zsteve / StationaryOT

Dynamic inference from single-cell snapshots by optimal transport
MIT License

CellRank integration #1

Status: Closed (Marius1311 closed this issue 3 years ago)

Marius1311 commented 3 years ago

Hi there, I was really happy to see the CellRank interface you guys created! Does this work yet, and is there anything we can do on our side to push this integration further?

zsteve commented 3 years ago

Hi Marius, thanks for your interest! I'm still working on getting it fully functional, e.g. with some of the downstream analysis functionality offered by CellRank. At the moment I'm mostly occupied with other things, but I'm planning to look further in this direction in the coming weeks. It would be great if I could ask some questions about the CellRank API.

Stephen

Marius1311 commented 3 years ago

Hi Stephen, absolutely, I'm more than happy to help here, just let me know when something comes up.

Marius1311 commented 3 years ago

Congrats on the preprint!

Marius1311 commented 3 years ago

Regarding integration into CellRank, I had the following idea: we could create a cellrank.external module, similar to how this is handled in scanpy. In this external API, we could expose your statOT kernel so it can be used conveniently from within CellRank. We would of course explicitly reference statOT in the docstring and link to your package for documentation and support. Accessing statOT in this way would still require users to install your statOT package, so it would only benefit your download numbers.

From your side, the advantages would be increased visibility and download numbers, plus access to some of CellRank's efficient downstream functionality. For example, our implementation of absorption probabilities uses PETSc's GMRES implementation, parallelised across subproblems, which lets us compute absorption probabilities for 100k cells in 2 min without GPU acceleration.
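
For readers unfamiliar with the computation mentioned here, the following is a minimal sketch of absorption probabilities solved column by column with GMRES. It uses SciPy's `gmres` as a stand-in rather than PETSc, runs serially, and is not CellRank's implementation; the function name `absorption_probabilities` and the toy chain are assumptions made for illustration. With transient-to-transient block Q and transient-to-absorbing block R, each column of the result solves (I - Q) x = r.

```python
import numpy as np
from scipy.sparse import csr_matrix, identity
from scipy.sparse.linalg import gmres


def absorption_probabilities(T, absorbing):
    """Solve (I - Q) X = R one column at a time with GMRES.

    T is a row-stochastic matrix; `absorbing` lists the absorbing-state indices.
    Hypothetical helper for illustration only, not part of CellRank or StationaryOT.
    """
    T = csr_matrix(T)
    absorbing = np.asarray(absorbing)
    transient = np.setdiff1d(np.arange(T.shape[0]), absorbing)
    Q = T[transient][:, transient]            # transient -> transient
    R = T[transient][:, absorbing].toarray()  # transient -> absorbing
    A = (identity(len(transient), format="csr") - Q).tocsr()
    X = np.zeros_like(R, dtype=float)
    # Each column is an independent linear system, so the loop could be parallelised.
    for j in range(R.shape[1]):
        x, info = gmres(A, R[:, j])
        if info != 0:
            raise RuntimeError(f"GMRES did not converge for column {j}")
        X[:, j] = x
    return transient, X


# Toy example: symmetric random walk on 5 states with absorbing ends 0 and 4.
T_demo = np.array([
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0],
])
transient, X = absorption_probabilities(T_demo, [0, 4])
```

Because the columns are independent, this is the kind of structure that lends itself to the parallelisation across subproblems described above.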

zsteve commented 3 years ago

Hi Marius,

> Congrats on the preprint!

Thanks! Hopefully StationaryOT can offer some complementary functionality to CellRank.

> Regarding integration into CellRank, I had the following idea: we could create a cellrank.external module, similar to how this is handled in scanpy.

Yes, I agree that sounds like a good way to interface the two methods (at least tentatively; I have yet to mention this to my coauthors). I suppose the idea would be that we can use CellRank in cases where RNA velocity seems too noisy or may not be applicable. Would you suggest something along the lines of exposing the OTKernel in the current StationaryOT code to the CellRank internals?

Marius1311 commented 3 years ago

Hi @zsteve, exactly, that's it: it extends CellRank to situations where RNA velocity currently isn't applicable. At the same time, it makes it really easy for everyone already used to the CellRank API to use StationaryOT, i.e. it would also generate more users for StationaryOT.

Re integration, did you already check out how this is done in scanpy external? Basically, the idea would be to have something very similar where there is a shallow wrapper in CellRank that imports and calls the kernel that you already have implemented in StationaryOT.
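
As a rough illustration of the shallow-wrapper pattern described here, the sketch below imports an optional dependency lazily and fails with a helpful message when it is missing. The helper name `load_external_kernel` is hypothetical, and the module and class names a real wrapper would pass in (for StationaryOT, presumably its package name and `OTKernel`) are assumptions; this is not code from CellRank or scanpy.

```python
import importlib


def load_external_kernel(module_name, class_name):
    """Import `class_name` from an optional package, with a clear error if absent.

    Hypothetical helper: the wrapper only imports and returns the class; the
    actual implementation stays in the external package.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"This wrapper needs the optional package '{module_name}'. "
            "Install it and consult its own documentation for support."
        ) from err
    return getattr(module, class_name)


# Demonstration with a standard-library module so the snippet runs anywhere.
OrderedDict = load_external_kernel("collections", "OrderedDict")
```

The point of the design is that the host package carries no hard dependency: users who never touch the external kernel never need to install it.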

zsteve commented 3 years ago

Hi @Marius1311, makes sense! Yes, I've had a look at the integration in scanpy.external, and I agree. One distinction that might be relevant here is that our method doesn't work on a kNN graph. At least when using quadratic regularised OT (which in our opinion is the better choice in practice), we still get a sparse graph, one that is uncovered by the OT problem itself. For entropic regularisation, the resulting graph is dense (and hence less ideal in practice, both in terms of storage and compute time).

I know that CellRank uses sparse matrices throughout, and so it might be most relevant to have compatibility for quadratic regularisation. In that case, the StationaryOT kernel could provide the transition matrix in sparse format to CellRank?
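
To make the question concrete, here is a minimal sketch of turning a sparse, non-negative coupling into a row-stochastic transition matrix in CSR format. The function name `coupling_to_transition_matrix` and the toy coupling are assumptions for illustration; this is not StationaryOT's code, and it does not solve the OT problem itself.

```python
import numpy as np
from scipy import sparse


def coupling_to_transition_matrix(gamma):
    """Row-normalise a non-negative coupling into a sparse CSR transition matrix.

    Hypothetical helper: zeros in the coupling stay zeros, so the sparsity
    pattern found by the OT problem is preserved.
    """
    gamma = sparse.csr_matrix(gamma, dtype=float)
    row_sums = np.asarray(gamma.sum(axis=1)).ravel()
    if np.any(row_sums <= 0):
        raise ValueError("every row needs some outgoing mass")
    # Left-multiplying by diag(1 / row_sums) rescales each row to sum to one.
    return (sparse.diags(1.0 / row_sums) @ gamma).tocsr()


# Toy coupling with exact zeros, standing in for a quadratically regularised plan.
gamma_demo = np.array([
    [0.2, 0.1, 0.0],
    [0.0, 0.3, 0.1],
    [0.1, 0.0, 0.2],
])
P = coupling_to_transition_matrix(gamma_demo)
```

An entropically regularised plan has strictly positive entries, so the same conversion would yield a matrix with no structural zeros, which is the storage concern raised above.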

The current .cr submodule is a little out of date, but I'll have it on my todo list to revisit it in the next day or so. The code at the moment is based off my attempt to unravel CellRank's internals and probably doesn't do everything in the best way, so feel free to take a look and let me know if parts can be improved or rewritten.

Marius1311 commented 3 years ago

Hi @zsteve, that makes sense - let's prioritize the quadratic regularized OT for the integration then. We'll set up a prototype for the integration and link the PR here so you can review it and give us feedback! Does that sound good to you?

zsteve commented 3 years ago

Yes, that sounds good! Looking forward to hearing back.

Marius1311 commented 3 years ago

Fantastic work, thanks everyone.