scverse / anndata

Annotated data.
http://anndata.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Feature request: Trajectory fits #121

Open LuckyMD opened 5 years ago

LuckyMD commented 5 years ago

Hi,

I was just discussing the AnnData-to-R conversion that @flying-sheep is developing, and we noticed that AnnData has no support for trajectory fits. For conversions between R and Python, it would be helpful to have a standard way to store, e.g., the spline fits in reduced-dimensional representations that Slingshot produces.

One of the main shortcomings of scanpy compared to what is available in R is the limited choice of trajectory inference (TI) methods. DPT and PAGA are available, but neither uses fits to visualize trajectories. To let Python users integrate better with TI methods from R, it would be good to agree on a standard for storing this data.

falexwolf commented 5 years ago

Right, we don't have this! I'd say it's not urgently needed (this is the first time it's been requested), but it would be nice to have. Do you have a suggestion, and would you potentially go forward with a pull request? The main problem I see is that implementing, say, a plotting function within Scanpy for a result obtained with Slingshot is a bit strange. So what would one do with the information if it could be carried over from R to AnnData?

LuckyMD commented 5 years ago

Agreed... this is a nice-to-have at the moment. I reckon it will become more important as TI methods improve and stick with R as the language of choice... and as moving between R and Python gets easier.

We came across this problem because, with @flying-sheep's anndata2ri package, information is lost when converting between R and Python if slots of the R object are not (and cannot be) converted to the AnnData object. That means if I have an AnnData object, fit a Slingshot trajectory, and then store the data as an AnnData object again, the fit information is lost. On top of that, plotting fits from various TI methods in scanpy would be pretty cool.

Unfortunately, I don't really know how model fits are normally stored in Python, as I've only ever done this in R. If there is a canonical way to store fit information, one could convert whatever glm object R outputs into a Python fit object. That would let us convert any type of GLM from R into Python without handling method-specific data structures once the data is in AnnData (those would, however, still have to be dealt with during the conversion). Basically, together with anndata2ri, this would unlock all TI methods in R for scanpy users.

The simplest idea I had was to store, in .uns, the representation used for the fit together with a 2D matrix of points sampled along the curve at very small intervals. Ideally, though, we'd also have the residuals, the formula used, and the coefficients with their standard errors and p-values.
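A minimal sketch of that simplest idea, using only NumPy. It assumes a pseudotime ordering is already known and fits each embedding coordinate as a cubic polynomial in pseudotime; this is a stand-in for Slingshot's principal curves, not what Slingshot actually does. The function name fit_trajectory_curve and the dictionary keys (basis, curve, coefficients, ...) are hypothetical. The result is just a dict of arrays and strings, so it could be assigned to something like adata.uns["trajectory_fit"].

```python
import numpy as np


def fit_trajectory_curve(embedding, pseudotime, basis="X_umap", degree=3, n_points=200):
    """Fit a smooth curve through an embedding, parameterised by pseudotime.

    A per-coordinate polynomial fit stands in for a real TI method's curve.
    Key names in the returned dict are hypothetical, not an AnnData standard.
    """
    embedding = np.asarray(embedding, dtype=float)
    pseudotime = np.asarray(pseudotime, dtype=float)
    n_dims = embedding.shape[1]

    # One polynomial per embedding dimension, coefficients highest power first
    coefs = np.stack([np.polyfit(pseudotime, embedding[:, d], degree) for d in range(n_dims)])

    # Densely sample the curve between the smallest and largest pseudotime
    grid = np.linspace(pseudotime.min(), pseudotime.max(), n_points)
    curve = np.stack([np.polyval(c, grid) for c in coefs], axis=1)

    # Fitted positions of the observed cells, used for the residuals
    fitted = np.stack([np.polyval(c, pseudotime) for c in coefs], axis=1)

    return {
        "basis": basis,                          # which .obsm representation was used
        "curve": curve,                          # (n_points, n_dims) points on the curve
        "curve_pseudotime": grid,                # (n_points,) parameter of each curve point
        "coefficients": coefs,                   # (n_dims, degree + 1)
        "residuals": embedding - fitted,         # (n_cells, n_dims)
        "formula": f"coord ~ poly(pseudotime, {degree})",
    }


# Usage with simulated data: cells lying near the parabola y = x^2
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 1, 100))
emb = np.column_stack([t, t ** 2]) + rng.normal(scale=0.01, size=(100, 2))
fit = fit_trajectory_curve(emb, t)
# With anndata installed, one could then do: adata.uns["trajectory_fit"] = fit
```

Keeping everything as plain arrays and strings is the design choice here: it avoids pickling a model object and should survive writing to h5ad, at the cost of not being able to re-evaluate the fit without knowing the formula.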

falexwolf commented 5 years ago

Thanks for the explanation! Your simplest idea makes sense, but you're implying that a comprehensive solution would be a lot of work. :) So this is not a small project, and we'd need someone to take it on.

LuckyMD commented 5 years ago

Exactly... Unless you know how model fits are canonically stored in Python (if there is a canonical way, like R's glm objects), it's a bigger project. Would this qualify as a Master's project for a computer science student? Maybe together with a comparison of TI methods like the Saelens review? Or is this more HiWi stuff?

falexwolf commented 5 years ago

Hm, no, I don't know how this is canonically stored in Python. statsmodels might have a solution, but I don't know it very well.

Yes, it could be either a HiWi or a Master's project, depending on how deep you go. There are definitely interesting things one can say about it.