Closed · dkobak closed this 1 year ago
I like this, but perhaps instead of "average" and "or", I would prefer "mean" and "max", which (to me) feel more standard. If we do go this route, I would also change the default parameter value from `symmetrize=True` to `symmetrize="max"`. Do you think it would make more sense to default to the binary affinity kernel, or to the current "mean" symmetrization?
Makes sense, I changed it to "mean" and "max". Regarding the default value, I talked to @sdamrich and he suggested we use "max" as the default, because it results in a conceptually simpler affinity matrix that can be seen as uniform on the "skNN" (symmetrized kNN) graph. I don't have a strong opinion here myself, so I set "max" as the default (but kept `symmetrize=True` as a possible input option that also behaves like "max"; `symmetrize=False` performs no symmetrization).
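For reference, here is a minimal sketch of the two symmetrizations on a toy binary kNN matrix (plain NumPy; the names and the matrix are illustrative, not the library's actual implementation):

```python
import numpy as np

# Toy binary kNN adjacency matrix: row i marks the neighbours of point i.
# (Illustrative only; real affinity matrices are sparse and much larger.)
P = np.array([
    [0, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
], dtype=float)

# "mean" symmetrization: average with the transpose. Edges present in only
# one direction end up with weight 0.5, mutual edges keep weight 1.
P_mean = (P + P.T) / 2

# "max" symmetrization: keep an edge if it exists in either direction, with
# weight 1 -- a uniform affinity on the symmetrized kNN ("skNN") graph.
P_max = np.maximum(P, P.T)
```

Both results are symmetric; the difference is that `P_mean` can contain two distinct non-zero values while `P_max` contains only one.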
Note that this is going to change the default behaviour of this class, but I guess not many people will be affected.
I hate to be nit-picky, but could you please add a check for the case where we get a string that is neither "mean" nor "max", and raise a `ValueError` with a message like `symmetrization {method} is not recognized`. Just so we don't fail silently in case the user makes a typo or something like that.
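A sketch of the suggested check (the helper name and surrounding plumbing are hypothetical; only the error message follows the wording above):

```python
def check_symmetrize(method):
    # Hypothetical validation helper illustrating the suggestion; the real
    # parameter handling in the class may look different.
    if isinstance(method, str) and method not in ("mean", "max"):
        raise ValueError(f"symmetrization {method!r} is not recognized")
    return method
```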
> Note that this is going to change the default behaviour of this class, but I guess not many people will be affected.
At first, I didn't really have any qualms about this, since probably not many people rely on it. But on second thought, it would probably be better to keep the same default as before, i.e. "mean", but emit a `FutureWarning` indicating that the default will change in the next version. So the roadmap might be: 0.7.2 with the same behaviour as now but with a warning, then I would change the default to "max" in the version after that. Even though this change probably wouldn't affect many people, I don't want to get into the habit of assuming people don't use something and then inadvertently breaking it.
Makes sense. I changed it so that the default is `True` (that's how it was before), which behaves like "mean" but triggers a `FutureWarning`.
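The deprecation path described above could look roughly like this (a sketch only; the function name and warning text are illustrative, not the actual code):

```python
import warnings

def resolve_symmetrize(symmetrize=True):
    # Hypothetical sketch: symmetrize=True keeps the old "mean" behaviour
    # for now, but warns that the default will become "max" later.
    if symmetrize is True:
        warnings.warn(
            'The default symmetrization will change from "mean" to "max" '
            "in a future version; pass symmetrize explicitly to keep the "
            "current behaviour.",
            FutureWarning,
        )
        return "mean"
    return symmetrize
```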
This is a very minor suggestion, but I realized some time ago that in some papers we implement "uniform affinities" slightly differently: not as `(P + P.T) / 2` but as `(P + P.T > 0).astype(float)`, where `P` is the kNN adjacency matrix. The only difference is that the former has two possible non-zero values, whereas the latter has only one possible non-zero value, which is kind of neat. This does not really make much of a difference in practice, but I thought it may be nice to implement it in `Uniform()` using `symmetrize="average"` and `symmetrize="or"`, in addition to the `symmetrize=True/False` which this class has anyway.
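The difference between the two variants is easy to see numerically (a small sketch with a random binary matrix standing in for `P`; none of this is the actual `Uniform()` code):

```python
import numpy as np

# Random binary "adjacency" matrix standing in for a kNN graph.
rng = np.random.default_rng(0)
P = (rng.random((6, 6)) < 0.4).astype(float)
np.fill_diagonal(P, 0)

# "average": asymmetric edges get 0.5, mutual edges get 1.0.
P_avg = (P + P.T) / 2

# "or": every edge that exists in either direction gets exactly 1.0.
P_or = (P + P.T > 0).astype(float)
```

Both matrices have the same sparsity pattern; only the non-zero values differ.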