Zethson opened this issue 10 months ago.
Seems reasonable. No problems that this could cause come to mind immediately. And having a function per metric would let us add more docs, including formulas, which I agree is nice.
Two issues come to mind:
First, the obvious one: calling a set of distances one after another would look uglier. For example, right now I can call
```python
import pertpy as pt

metrics = ["edistance", "wasserstein"]  # example metric names
for metric in metrics:
    distance = pt.tl.Distance(metric=metric)
```
but with the proposed change, I would call
```python
# func_dict: a mapping from metric name to the proposed per-metric function
for metric in metrics:
    distance = func_dict[metric](mode='onesided')
```
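To make the comparison concrete, here is a minimal sketch of the two call patterns with toy stand-ins. `DistanceByName`, `mean_euclidean`, and `mean_manhattan` are hypothetical names (not pertpy's API), and `func_dict` is assumed to map metric names to per-metric functions, as in the loop above.

```python
import numpy as np


def mean_euclidean(X: np.ndarray, Y: np.ndarray) -> float:
    """Toy metric: Euclidean distance between the mean profiles of X and Y."""
    return float(np.linalg.norm(X.mean(axis=0) - Y.mean(axis=0)))


def mean_manhattan(X: np.ndarray, Y: np.ndarray) -> float:
    """Toy metric: L1 distance between the mean profiles of X and Y."""
    return float(np.abs(X.mean(axis=0) - Y.mean(axis=0)).sum())


class DistanceByName:
    """Hypothetical stand-in for the current design: the metric is a constructor argument."""

    _metrics = {"euclidean": mean_euclidean, "manhattan": mean_manhattan}

    def __init__(self, metric: str):
        self.metric = metric
        self._fn = self._metrics[metric]

    def __call__(self, X: np.ndarray, Y: np.ndarray) -> float:
        return self._fn(X, Y)


# Proposed design: one function per metric, so iterating needs an explicit lookup table.
func_dict = {"euclidean": mean_euclidean, "manhattan": mean_manhattan}

metrics = ["euclidean", "manhattan"]
X = np.array([[0.0, 0.0], [2.0, 0.0]])  # mean (1, 0)
Y = np.array([[3.0, 4.0], [3.0, 4.0]])  # mean (3, 4)

current = {m: DistanceByName(metric=m)(X, Y) for m in metrics}
proposed = {m: func_dict[m](X, Y) for m in metrics}
```

Both loops produce the same numbers; the difference is only whether the name-to-function lookup lives inside the class or at the call site.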
Second: where does `from_precomputed` go? Right now, using the same distance object as above, you would call `precompute_distances` on an `adata`, and then the distance's `__call__`, `.onesided(X, Y)`, or `.pairwise(X, Y)` would make use of the precomputed distances:
```python
distance = pt.tl.Distance(metric='wasserstein')
distance.precompute_distances(adata)
df = distance(adata, groupby, ...)  # remaining arguments elided
```
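Precomputation fits the current object-based design because the cache lives on the instance between calls. A minimal sketch, assuming a toy metric defined as the mean cell-to-cell Euclidean distance between two groups; `CachedPairwiseMean` and `cell_distances` are hypothetical names that only mirror the idea of `precompute_distances`, not its actual implementation.

```python
import numpy as np


def cell_distances(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Euclidean distance between every row of A and every row of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))


class CachedPairwiseMean:
    """Hypothetical metric: mean cell-to-cell distance between two groups of cells."""

    def __init__(self):
        self.D = None  # cell x cell matrix, filled by precompute_distances()

    def precompute_distances(self, X: np.ndarray) -> None:
        # Pay the O(n^2) cost once; later calls only slice this matrix.
        self.D = cell_distances(X, X)

    def __call__(self, X: np.ndarray, idx_a, idx_b) -> float:
        if self.D is not None:
            # Reuse the cached matrix: take the block for the two groups.
            return float(self.D[np.ix_(idx_a, idx_b)].mean())
        return float(cell_distances(X[idx_a], X[idx_b]).mean())


X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 4.0], [3.0, 5.0]])
dist = CachedPairwiseMean()
direct = dist(X, [0, 1], [2, 3])  # computed from scratch
dist.precompute_distances(X)
cached = dist(X, [0, 1], [2, 3])  # read from the cached matrix
```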
In the proposed implementation, you would write:

```python
distance = pt.tl.Distance.compute_wasserstein(mode='precompute')
distance(adata)
distance = pt.tl.Distance.compute_wasserstein(mode='pairwise')  # pairwise as an example; also where it makes the most sense
df = distance(adata, groupby, ...)
```
In my opinion, this is considerably less readable and less intuitive. It also doesn't apply just to `precompute` but to any case in which you want to calculate a summary statistic beforehand, which we definitely want to support because it is a major speedup.
This only matters if you implement it this way, but the function should be named just `wasserstein`, not `compute_wasserstein`.
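A minimal sketch of the summary-statistic speedup, assuming the group mean as the statistic and Euclidean distance between means as a toy metric (both hypothetical choices for illustration): the means are computed once per group, so all pairwise distances come from a small groups-by-features matrix instead of revisiting every cell for every pair.

```python
import numpy as np


def group_means(X: np.ndarray, labels) -> tuple[list, np.ndarray]:
    """Compute each group's mean profile once (the cached summary statistic)."""
    labels = np.asarray(labels)
    groups = sorted(set(labels.tolist()))
    return groups, np.stack([X[labels == g].mean(axis=0) for g in groups])


def pairwise_from_means(means: np.ndarray) -> np.ndarray:
    """Euclidean distances between all pairs of cached group means."""
    diff = means[:, None, :] - means[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))


X = np.array([[0.0, 0.0], [2.0, 0.0], [3.0, 4.0], [3.0, 4.0], [0.0, 0.0]])
labels = ["a", "a", "b", "b", "c"]
groups, means = group_means(X, labels)  # computed once
D = pairwise_from_means(means)  # reused for every pair of groups
```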
And just to clarify, you're thinking of using it like

```python
distance = pt.tl.Distance.compute_wasserstein(mode='pairwise')
df = distance(adata, groupby, ...)
```

NOT

```python
distance = pt.tl.Distance()
df = distance.compute_wasserstein(mode='pairwise')(adata, groupby, ...)
```

Right?
Discussed a few things with @yugeji:

- `__call__` … and only take NumPy arrays.
- For future distances that do not use what is currently the standard `__call__` format of `(X, Y)`, implementing it in the new way would let you override `onesided` with a distance-specific one (for example, with classifier class projection or KNN distance).
- An important addition to this refactor (which would also allow `classifier_cp` to be used with `pairwise`) would be to make `pairwise` call `onesided` instead of using the copy-pasted code we have right now (which is also causing problems).
Description of feature
This is a continuation of https://github.com/theislab/pertpy/issues/405 but specific to `Distance`.

TL;DR: Currently, `Distance` does not adhere to the API design of the rest of pertpy, and I want to harmonize it. Right now, we pass a `metric` to the constructor, which then uses the appropriate distance function on `__call__`. This comes with two issues:

Currently, we also have the `onesided_distances`, `pairwise`, and `precompute_distances` functions.
Moving `metric` into these three functions wouldn't really help or solve any issue. The only option I see is having functions like `distance.compute_wasserstein(mode=Literal['onesided', 'pairwise', 'precompute'])` for all of the metrics. These would then show up in a table of functions and could be documented more easily. It would also probably correspond better with the current design.

What do you think? I'm especially interested in the opinions of @yugeji, @stefanpeidli, and @tessadgreen.
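A minimal sketch of the proposed shape, assuming a toy mean-difference metric in place of Wasserstein and reading `mode='precompute'` as caching group means; `ProposedDistance`, `mean_euclidean`, and `_groups` are hypothetical names, not pertpy's API. Each metric becomes one method with its own docstring and formula, and `mode` selects which callable it returns.

```python
from typing import Callable, Literal, get_args

import numpy as np

Mode = Literal["onesided", "pairwise", "precompute"]


def _groups(labels) -> list:
    return sorted(set(np.asarray(labels).tolist()))


class ProposedDistance:
    """Hypothetical sketch: one documented factory per metric, each taking a mode."""

    @staticmethod
    def mean_euclidean(mode: Mode) -> Callable:
        """Euclidean distance between group means: d(A, B) = ||mean(A) - mean(B)||_2."""
        if mode not in get_args(Mode):
            raise ValueError(f"mode must be one of {get_args(Mode)}, got {mode!r}")

        def precompute(X, labels):
            # Cache the summary statistic (one mean per group).
            labels = np.asarray(labels)
            return {g: X[labels == g].mean(axis=0) for g in _groups(labels)}

        def onesided(X, labels, selected_group):
            means = precompute(X, labels)
            ref = means[selected_group]
            return {g: float(np.linalg.norm(m - ref)) for g, m in means.items()}

        def pairwise(X, labels):
            # Built on onesided(), one call per group.
            return {g: onesided(X, labels, g) for g in _groups(labels)}

        return {"precompute": precompute, "onesided": onesided, "pairwise": pairwise}[mode]


X = np.array([[0.0, 0.0], [2.0, 0.0], [3.0, 4.0], [3.0, 4.0], [0.0, 0.0]])
labels = ["a", "a", "b", "b", "c"]
pairwise = ProposedDistance.mean_euclidean(mode="pairwise")
result = pairwise(X, labels)
```

This matches the usage pattern discussed in the thread (build the callable once, then call it on the data); looping over several metrics would need `getattr(ProposedDistance, name)` or a lookup dict.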