theislab / moscot

Multi-omic single-cell optimal transport tools
https://moscot-tools.org
BSD 3-Clause "New" or "Revised" License

Naming (there's another scot...) #4

Closed Marius1311 closed 3 years ago

Marius1311 commented 3 years ago

@michalk8 caught this: https://rsinghlab.github.io/SCOT/

It's called SCOT and it does Gromov-Wasserstein for data integration. We need to change the name of this package (let's discuss here) and see what we can take from their implementation. Theirs is POT-based.

zoepiran commented 3 years ago

Are we really only concerned with the naming? General directions: (i) we can add "multi omics", but moscot(t) may be too similar to scot and muscat; (ii) adding GW.

Marius1311 commented 3 years ago

This issue is about naming, yes. Re (i), I agree; re (ii), how would you add that?

Going through their preprint, I think they have a very nice summary of OT and GW-OT, potentially interesting for @michalk8 to get a quick overview. However, I think their actual code is a bit messy.

Methodologically, they include unbalanced GW-OT, which seems easy: they just change the Sinkhorn solver they use in the loop. Using unbalanced OT is important when you expect to see very different group (cluster) distributions across samples, see e.g. Fig. 10.1 in https://arxiv.org/abs/1803.00567
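
To make the balanced vs. unbalanced difference concrete, here is a minimal NumPy sketch of unbalanced Sinkhorn in its scaling form, with KL-relaxed marginals. It is not SCOT's code and not the POT or OTT API; the function name and the parameters `eps` (entropic regularization) and `tau` (marginal relaxation) are made up for illustration. The toy example assumes two clusters whose proportions differ between samples.

```python
import numpy as np


def unbalanced_sinkhorn(a, b, cost, eps=0.1, tau=1.0, n_iter=500):
    """Entropic OT with KL-relaxed marginals (illustrative sketch only).

    eps: entropic regularization strength.
    tau: marginal relaxation strength; a very large tau approaches
         balanced Sinkhorn, a small tau lets the marginals deviate.
    """
    K = np.exp(-cost / eps)        # Gibbs kernel
    power = tau / (tau + eps)      # exponent < 1 softens the marginal constraints
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = (a / (K @ v)) ** power
        v = (b / (K.T @ u)) ** power
    return u[:, None] * K * v[None, :]   # transport plan


# Toy setup (assumed): two clusters, moving mass between them costs 1.
cost = np.array([[0.0, 1.0], [1.0, 0.0]])
a = np.array([0.5, 0.5])   # cluster proportions in sample 1
b = np.array([0.9, 0.1])   # cluster proportions in sample 2

plan_balanced = unbalanced_sinkhorn(a, b, cost, eps=0.1, tau=1e6)
plan_unbalanced = unbalanced_sinkhorn(a, b, cost, eps=0.1, tau=0.1)

# Mass transported between different clusters (off-diagonal entries).
cross_balanced = plan_balanced[0, 1] + plan_balanced[1, 0]
cross_unbalanced = plan_unbalanced[0, 1] + plan_unbalanced[1, 0]
```

In the near-balanced run the plan has to push mass across clusters to match both marginals; with a small `tau` it mostly gives up on matching the marginals instead, so `cross_unbalanced` comes out much smaller than `cross_balanced`.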

Also, they make an argument about the scalability of their algorithm, see Fig. 1 below (taken from their preprint). It doesn't seem to scale particularly well to me: it takes ~half an hour on 5k cells (per metric space -> time point).

[Fig. 1: screenshot of the scalability figure from their preprint]

Marius1311 commented 3 years ago

Their preprint (see Alg. 1 below) doesn't actually consider the unbalanced case; they added this in a recent release. The algorithm is very basic GW-OT with a trick from http://proceedings.mlr.press/v48/peyre16.pdf to rewrite the 4th-order distance tensor for L2 distances, which I think novoSpaRc uses as well.

[Alg. 1: screenshot of the algorithm from their preprint]

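For reference, the tensor trick for the squared loss can be sketched in a few lines of NumPy. This is not their code: the function names are made up, and the check below only compares the naive 4th-order contraction against the factored form, which replaces the O(n²m²) tensor with a few matrix products.

```python
import numpy as np


def gw_cost_naive(C1, C2, T):
    """M[i, j] = sum_{k, l} (C1[i, k] - C2[j, l])**2 * T[k, l], via the full tensor."""
    diff = C1[:, None, :, None] - C2[None, :, None, :]   # shape (n, m, n, m)
    return np.einsum("ijkl,kl->ij", diff ** 2, T)


def gw_cost_factored(C1, C2, T):
    """Same quantity without building the 4th-order tensor.

    Expanding the square gives
        sum_k C1[i, k]**2 * p[k] + sum_l C2[j, l]**2 * q[l] - 2 * (C1 @ T @ C2.T)[i, j],
    where p and q are the row and column sums of T.
    """
    p = T.sum(axis=1)
    q = T.sum(axis=0)
    const = (C1 ** 2 @ p)[:, None] + (C2 ** 2 @ q)[None, :]
    return const - 2.0 * C1 @ T @ C2.T


# Small random check (assumed toy data: two point clouds of different size and dimension).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Y = rng.normal(size=(4, 2))
C1 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # intra-space distances, space 1
C2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # intra-space distances, space 2
T = rng.random((5, 4))
T /= T.sum()                                          # any non-negative coupling

same = np.allclose(gw_cost_naive(C1, C2, T), gw_cost_factored(C1, C2, T))
```

In a GW loop the first two terms depend only on the marginals, so with fixed marginals they can be computed once and only the `C1 @ T @ C2.T` product is redone per iteration.
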
zoepiran commented 3 years ago

Yes, novosparc uses it as well, and it indeed looks pretty simple. Re (ii): the problem is that "GW" is a tough one :( ... thinking ...

Marius1311 commented 3 years ago

Actually, I think it would be good to treat this package similarly to scVI-tools: it's the framework that defines the basic class structure and how we interact with AnnData objects and OTT (the backend). So we should give one name to the framework (something like XXX-tools) and then name individual models, e.g. the lineage tracing model we're working on.
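
A rough sketch of what such a framework/model split could look like. Everything here is hypothetical: `BaseSolver`, `BaseProblem` and `ProductCouplingSolver` are made-up names, not an existing API, and plain NumPy arrays stand in for AnnData objects and for the OTT backend so that the snippet runs on its own.

```python
from abc import ABC, abstractmethod

import numpy as np


class BaseSolver(ABC):
    """Backend interface (hypothetical); an OTT-backed solver would subclass this."""

    @abstractmethod
    def solve(self, a, b, cost):
        """Return a coupling of shape (len(a), len(b))."""


class ProductCouplingSolver(BaseSolver):
    """Placeholder backend: returns the independent coupling and ignores the cost."""

    def solve(self, a, b, cost):
        return np.outer(a, b)


class BaseProblem:
    """Framework-level class (hypothetical): prepares inputs, delegates to a backend.

    Individual models would subclass this and change how the cost and the
    marginals are built from the data.
    """

    def __init__(self, x, y, solver):
        self.x = np.asarray(x, dtype=float)   # stand-in for data pulled from AnnData
        self.y = np.asarray(y, dtype=float)
        self.solver = solver

    def solve(self):
        # Squared Euclidean cost and uniform marginals as a default choice.
        cost = ((self.x[:, None, :] - self.y[None, :, :]) ** 2).sum(-1)
        a = np.full(len(self.x), 1.0 / len(self.x))
        b = np.full(len(self.y), 1.0 / len(self.y))
        return self.solver.solve(a, b, cost)


problem = BaseProblem(np.zeros((3, 2)), np.ones((4, 2)), solver=ProductCouplingSolver())
coupling = problem.solve()   # shape (3, 4)
```

The point of the split is that the framework owns data handling and the solver interface, while each named model only decides what goes into the cost and the marginals.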

Marius1311 commented 3 years ago

So the framework shouldn't have GW in the name, our specific model can (but doesn't need to).

zoepiran commented 3 years ago

So the game is with: single cell (sc), multi omics (mo), optimal transport (ot) and ...?

Marius1311 commented 3 years ago

tools, framework, toolkit, python

Marius1311 commented 3 years ago

@zoepiran suggested moscot (= multi-omic single-cell optimal transport tools), which is a glasses brand, and I really like it! We could then use the glasses in our logo; I'm thinking of two piles of cells, one on each side of the glasses. What do you think @michalk8 ?

michalk8 commented 3 years ago

Like moscot the best, here's what I came up with:

Marius1311 commented 3 years ago

Thanks! BTW, best way to search for already existing tools is https://www.scrna-tools.org/table

Marius1311 commented 3 years ago

I like moscot the best because I can imagine a logo related to it

michalk8 commented 3 years ago

closed via #5