theislab / moscot

Multi-omic single-cell optimal transport tools
https://moscot-tools.org
BSD 3-Clause "New" or "Revised" License

Feat/lin spatial map #722

Open Marius1311 opened 3 days ago

Marius1311 commented 3 days ago

This introduces a LinearSpatialMapping problem, which is meant for mapping in spatially-aware latent spaces (e.g. NicheCompass latent spaces). In that setting, spatial information is already contained in the latent dimensions and does not need to be supplied externally via a quadratic term. This reduces time complexity, memory complexity, and the number of tunable parameters.
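For context, the difference between the two objectives can be sketched in standard notation. The symbols are assumptions for illustration and are not taken from the PR: $C$ is the cost between cells in the shared latent space, $D^X$ and $D^Y$ are within-dataset (e.g. spatial) distance matrices, $U(a, b)$ is the set of couplings with marginals $a$ and $b$, $H$ is the entropy, and a squared loss is assumed for the quadratic term.

```latex
% Linear (entropic) problem: only the cross-dataset cost C is needed
\min_{P \in U(a, b)} \; \langle C, P \rangle - \varepsilon H(P)

% Fused quadratic problem: adds within-dataset distances and the weight \alpha
\min_{P \in U(a, b)} \; (1 - \alpha) \, \langle C, P \rangle
  + \alpha \sum_{i,k} \sum_{j,l} \lvert D^X_{ik} - D^Y_{jl} \rvert^2 \, P_{ij} P_{kl}
  - \varepsilon H(P)
```

In the linear case the quadratic term, its inputs $D^X$ and $D^Y$, and the interpolation weight $\alpha$ all drop out, which is where the smaller parameter set comes from.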

The current implementation is naive: I just copied the SpatialMapping problem and adapted the prepare and solve methods. What would be a better abstraction in your opinion, @giovp? We could do it like you, @MUCDK, did for the Temporal/Lineage problems, i.e. have the quadratic problem inherit from the linear one, or we could do it the other way round. Do you prefer either of these options, or something else?
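To make the first option concrete, here is a minimal sketch of a quadratic problem inheriting from a linear one. All class names, attributes, and default values below are hypothetical and do not reflect moscot's actual classes or signatures; the point is only to show where the shared and the quadratic-only parameters would live.

```python
class LinearMappingSketch:
    """Hypothetical base class: only a linear (latent/feature) cost term."""

    default_epsilon = 1e-2  # illustrative default, not a moscot value

    def prepare(self, joint_attr):
        # The linear problem only needs the shared representation.
        self.joint_attr = joint_attr
        return self

    def solve_kwargs(self):
        # Parameters every mapping problem needs.
        return {"epsilon": self.default_epsilon}


class QuadraticMappingSketch(LinearMappingSketch):
    """Hypothetical subclass: adds the quadratic (spatial) term on top."""

    default_alpha = 0.5  # illustrative default, not a moscot value

    def prepare(self, joint_attr, spatial_key="spatial"):
        # Reuse the linear preparation, then register the spatial coordinates.
        super().prepare(joint_attr)
        self.spatial_key = spatial_key
        return self

    def solve_kwargs(self):
        # Extend the linear parameters with the quadratic-only ones.
        kwargs = super().solve_kwargs()
        kwargs["alpha"] = self.default_alpha
        return kwargs
```

With this layout, downstream analysis methods could live on the base class and be shared, while each class keeps its own defaults and exposes only the parameters it uses.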

This PR also introduces a groupby parameter for gene correlation computation, which is useful when you're interested in correlation over specific groups, like clusters etc.
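As a rough illustration of what a grouped correlation could compute, the sketch below evaluates the per-gene Pearson correlation between predicted and measured expression separately within each group. The function name and signature are hypothetical and are not the API added in this PR.

```python
import numpy as np


def correlate_by_group(predicted, measured, groups):
    """Per-gene Pearson correlation, computed separately within each group.

    predicted, measured: arrays of shape (n_cells, n_genes)
    groups: array of shape (n_cells,) with one label per cell
    Returns a dict mapping each label to an array of shape (n_genes,).
    """
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    groups = np.asarray(groups)
    result = {}
    for label in np.unique(groups):
        mask = groups == label
        # Center each gene within the group.
        p = predicted[mask] - predicted[mask].mean(axis=0)
        m = measured[mask] - measured[mask].mean(axis=0)
        denom = np.sqrt((p ** 2).sum(axis=0) * (m ** 2).sum(axis=0))
        with np.errstate(divide="ignore", invalid="ignore"):
            corr = (p * m).sum(axis=0) / denom
        # Genes that are constant within a group yield NaN.
        result[label.item()] = corr
    return result
```

Calling `correlate_by_group(predicted, measured, cluster_labels)` returns one correlation vector per cluster, so genes that are well reconstructed only in some clusters become visible.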

I'll add tests once I hear your general thoughts on this - I'm happy to adapt this to your requirements.

MUCDK commented 3 days ago

Thanks @Marius1311 ! Let's go step by step. To what extent is this necessary given that we have the SinkhornProblem? How common is this use case?

giovp commented 2 days ago

Agree with @MUCDK. What about allowing the mapping problem to also be used in a purely linear setting? I like the groupby addition! Could we maybe have that in a separate PR?

Marius1311 commented 2 days ago

> Thanks @Marius1311 ! Let's go step by step. To what extent is this necessary given that we have the SinkhornProblem? How common is this use case?

It's necessary as you don't get the relevant downstream analysis with the SinkhornProblem. I don't think this usage is very common yet, but I need this as part of a method I'm currently developing.

So far, I solved a generic Sinkhorn problem and then manually transferred the solution into a SpatialMapping Problem for the downstream analysis. But that's not very convenient, and I'm not sure how efficient this is, given that the solution to a LinearProblem can be queried batch-wise, so you have a much smaller memory footprint for downstream analysis, like imputation or correlation computation. If I'm understanding things correctly, then batch-wise evaluation of the mapping is not possible for a (full-rank) quadratic problem.
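The batch-wise point can be sketched as follows. For an entropic linear problem, the coupling is determined by two scaling vectors and the cost function, so any block of rows can be recomputed on demand. This is a plain NumPy illustration under stated assumptions (uniform marginals, squared Euclidean cost, dense kernel during the solve for brevity); the function names are hypothetical and this is not how moscot or its backend implement it.

```python
import numpy as np


def sq_dists(x, y):
    """Squared Euclidean distances between rows of x and rows of y."""
    return ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)


def sinkhorn_scalings(x, y, epsilon=1.0, n_iter=500):
    """Plain Sinkhorn iterations with uniform marginals; returns (u, v)."""
    n, m = len(x), len(y)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    kernel = np.exp(-sq_dists(x, y) / epsilon)  # dense here for brevity only
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u, v


def impute_batchwise(x, y, u, v, y_features, epsilon=1.0, batch_size=2):
    """Barycentric projection of y_features onto rows of x, one row batch at a time."""
    out = np.empty((len(x), y_features.shape[1]))
    for start in range(0, len(x), batch_size):
        rows = slice(start, start + batch_size)
        # Only a (batch_size, m) block of the coupling exists at any time.
        block = u[rows, None] * np.exp(-sq_dists(x[rows], y) / epsilon) * v[None, :]
        out[rows] = block @ y_features / block.sum(axis=1, keepdims=True)
    return out
```

Given `u, v = sinkhorn_scalings(x, y)`, calling `impute_batchwise(x, y, u, v, y_features)` gives the same result as materializing the full coupling first, while peak memory for the downstream step scales with `batch_size` rather than with the number of rows in `x`.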

Keep in mind that my intended use case here is full-rank, not low-rank (with a Linear Problem, you can map 0.5M cells in <2h on a 40GB A100, so you get very far with full rank, as we show in the TOME application).

Marius1311 commented 2 days ago

> Agree with @MUCDK. What about allowing the mapping problem to also be used in a purely linear setting? I like the groupby addition! Could we maybe have that in a separate PR?

Re 1, I agree, that would be another possibility, although I think it's cleaner to have two separate problems as you might want to have different default values etc. Also, I think it would be more user-friendly to have two separate problems as users of the Linear Problem would only be interested in a subset of the parameters required for the Quadratic problem.

Re 2, sure, I can move the groupby addition into a separate PR.