scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License
1.91k stars, 601 forks

Is sctransform available ? #1068

Closed benstaf closed 4 years ago

benstaf commented 4 years ago

Hi, I would like to use sctransform, but I didn't find it in Python: https://github.com/ChristophH/sctransform

LuckyMD commented 4 years ago

Hi, It's not available in scanpy at the moment, but I wrote a wrapper for it via rpy2 and anndata2ri which is available here: https://github.com/normjam/benchmark/blob/master/normbench/methods/ad2seurat.py

SamueleSoraggi commented 4 years ago

scTransform is easy to use via rpy2 and anndata2ri. I call the vst R function directly to make it work: https://github.com/ChristophH/sctransform/blob/master/R/vst.R

LuckyMD commented 4 years ago

Yup. With the function I linked to it's even easier, as there's no need to specify the commands at all ;).

SamueleSoraggi commented 4 years ago

Smart way to do it. Now you inspired me to reorganize my other rpy2 scripts :)

LuckyMD commented 4 years ago

Maybe it would be a good idea to have a separate repo of rpy2 and anndata2ri wrappers for R methods that we want to run in scanpy workflows. Would you be interested in something like that? I could create a separate repo under the theislab GitHub organization, something like www.github.com/theislab/Rforscanpy?

ivirshup commented 4 years ago

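@LuckyMD, that sounds like it would be really useful. A few questions:

* What methods/tools?
* How would you handle R dependencies?
* And (probably hard and definitely not necessary at first) could we use [arrow](https://arrow.apache.org/docs/python/) to speed up data transfer?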

LuckyMD commented 4 years ago

Hey!

* What methods/tools?

I am mainly thinking about normalization and data integration methods, for example scran pooling, sctransform, SCnorm, Seurat data integration, LIGER, etc. I have most of those already, but anyone is welcome to contribute wrappers for anything they regularly use.

* How would you handle R dependencies?

So far I've been ignoring this problem and just assuming people have an R environment installed that has the relevant packages. You could just stick a require(package) in the function called by rpy2, and then it would give you an R error you can interpret. The plan would be to make this a set of convenience functions, but not a cleanly installable module, I guess. I'm not sure how you could get any Python setup to install R dependencies for you.
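The same check can also be done from the Python side before any R code runs. The following is a minimal sketch, not part of any wrapper mentioned in this thread: the helper names `missing_r_packages` and `require_r_packages` are hypothetical, and the only rpy2 call assumed is `rpy2.robjects.packages.isinstalled`. The checker is injectable so the logic can be tried without an R installation.

```python
from typing import Callable, Iterable, List, Optional


def missing_r_packages(
    required: Iterable[str],
    is_installed: Optional[Callable[[str], bool]] = None,
) -> List[str]:
    """Return the names in `required` that the checker reports as not installed.

    Hypothetical helper. By default it asks R through rpy2; a custom
    `is_installed` callable can be passed instead (e.g. for testing).
    """
    if is_installed is None:
        # Imported lazily so this module loads even without rpy2 or R.
        from rpy2.robjects.packages import isinstalled as is_installed
    return [name for name in required if not is_installed(name)]


def require_r_packages(
    required: Iterable[str],
    is_installed: Optional[Callable[[str], bool]] = None,
) -> None:
    """Raise a readable Python error if any required R package is missing."""
    missing = missing_r_packages(required, is_installed)
    if missing:
        raise ImportError(
            "Missing R packages: " + ", ".join(missing)
            + ". Install them in the R environment that rpy2 uses."
        )
```

With rpy2 and R available, a wrapper could call `require_r_packages(["sctransform", "Seurat"])` as its first line, so users see a Python error naming the missing packages rather than an R traceback.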

* And (probably hard and definitely not necessary at first) could we use [arrow](https://arrow.apache.org/docs/python/) to speed up data transfer?

This looks interesting, but I don't entirely understand it. You'd have to have a separate data structure that can move between languages and be interpreted as an R data structure or AnnData depending on where it's used? Most methods are designed to run on a particular class of object. How would this help if you always have to convert to that type of object? So far I've just been using anndata2ri to ensure we have an SCE object, which can be converted to other R data structures.

ivirshup commented 4 years ago

I'm not sure how you could get any python setup to install R dependencies for you

Maybe a conda package could include dependencies? I think getting a working environment would alleviate a large pain point for this stuff (for example, I currently have no working Seurat install). It would also help make sure packages are up to date for the wrapped functionality.

you'd have to have a separate data structure that can move between languages

Sort of. The idea is that you could move arrays to R from Python without making any copies; they'd just point to the same memory. This is already possible when passing data from R to Python. The main goal is to make these wrappers faster and use less memory.
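To make "point to the same memory" concrete, here is a small illustration using only NumPy. It does not involve R, rpy2, or Arrow; it only contrasts a zero-copy view with a copy. The variable names are made up for the example.

```python
import numpy as np

# A small stand-in for a block of expression values.
counts = np.arange(6, dtype=np.float64)

# Zero-copy: a second array object that wraps the same underlying buffer.
shared = np.frombuffer(counts, dtype=np.float64)

# Copy: a second array object with its own, separately allocated buffer.
copied = counts.copy()

# Changing the original is visible through the shared view, not the copy.
counts[0] = 99.0

print(np.shares_memory(counts, shared))  # same buffer
print(np.shares_memory(counts, copied))  # different buffer
```

A zero-copy transfer between languages would behave like `shared`: no second allocation, and no time spent duplicating the data. A converting transfer behaves like `copied`, which costs both memory and time for large matrices.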

LuckyMD commented 4 years ago

The issue with going through conda is that not all R packages are on bioconda (e.g. Conos). And I'm not keen to create and maintain a conda R package. Therefore I'm using a conda environment with some python packages installed on top via pip and some R packages installed via install.packages().

The idea is that you could move arrays to R from python without making any copies, they'd just point to the same memory.

I'm guessing this is not what already happens in rpy2?

ivirshup commented 4 years ago

I'm not keen to create and maintain a conda R package.

That's fair. Might be worth asking the conos developers in this case?

Also, does using install.packages within a conda environment work for you? I recall that not working well for me in the past.

I'm guessing this is not what already happens in rpy2?

Nah, rpy2 even copies the data in a particularly slow way by default.

LuckyMD commented 4 years ago

That's fair. Might be worth asking the conos developers in this case?

Yes, could and should do this... but would slow down the process for now I guess.

Also, does using install.packages within a conda environment work for you? I recall that not working well for me in the past.

It works if you install the R packages last and don't install anything else over the top via conda.

gokceneraslan commented 4 years ago

Hi all,

Sorry, I sent a PR (https://github.com/theislab/scanpy/pull/1271) without reading any of this; my bad. Some thoughts are as follows:

jnmark commented 3 years ago

Hi, It's not available in scanpy at the moment, but I wrote a wrapper for it via rpy2 and anndata2ri which is available here: https://github.com/normjam/benchmark/blob/master/normbench/methods/ad2seurat.py

Hi,

I have been trying to use this wrapper, but it seems there's an error during the conversion process:

RRuntimeError: Error in validObject(.Object) :
  invalid class “dgCMatrix” object: 1: invalid object for slot "i" in class "dgCMatrix": got class "array", should be or extend class "integer"
  invalid class “dgCMatrix” object: 2: invalid object for slot "p" in class "dgCMatrix": got class "array", should be or extend class "integer"
  invalid class “dgCMatrix” object: 3: invalid object for slot "Dim" in class "dgCMatrix": got class "array", should be or extend class "integer"
  invalid class “dgCMatrix” object: 4: invalid object for slot "x" in class "dgCMatrix": got class "array", should be or extend class "numeric"

Any pointers to get around this?

LuckyMD commented 3 years ago

Hey!

I think this is probably related to https://github.com/theislab/anndata2ri/issues/63. Maybe try downgrading your anndata2ri version.
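Before downgrading, it can help to record which versions are currently installed, for instance to compare against the linked issue. A minimal sketch using only the standard library follows; the helper name `installed_versions` and the package list are illustrative choices, not something prescribed in this thread.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Packages involved in the AnnData-to-R conversion path (illustrative list).
    for name, ver in installed_versions(["anndata2ri", "rpy2", "anndata", "scipy"]).items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```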

Mayank0512 commented 2 years ago

Hello, I am trying to use the wrapper and I am getting this error:

RRuntimeError: Error in [.data.frame(meta.data, , ii, drop = FALSE) : undefined columns selected

It occurs on the line ro.r('seurat_obj = as.Seurat(adata, counts="X",data=NULL)'). Could you please suggest what I should do? Thank you.