Closed: benstaf closed this issue 4 years ago
Hi, it's not available in scanpy at the moment, but I wrote a wrapper for it via `rpy2` and `anndata2ri`, which is available here:
https://github.com/normjam/benchmark/blob/master/normbench/methods/ad2seurat.py
scTransform is easy to use if you have `rpy2` and `anndata2ri`. I call the `vst` R function directly to make it work: https://github.com/ChristophH/sctransform/blob/master/R/vst.R
Yup, with the function I linked to it's even easier, as there's no need to specify the commands at all ;).
Smart way to do it. Now you inspired me to reorganize my other rpy2 scripts :)
Maybe it would be a good idea to have a separate repo of `rpy2` and `anndata2ri` wrappers for `R` methods that we want to run in scanpy workflows. Would you be interested in something like that? I could create a separate repo in the theislab GitHub, something like www.github.com/theislab/Rforscanpy?
@LuckyMD, that sounds like it would be really useful. A few questions:
Hey!
* What methods/tools?
I am mainly thinking about normalization and data integration methods. For example scran pooling, sctransform, scNorm, Seurat data integration, LIGER... etc. I have most of those already... But anyone is welcome to contribute for anything they regularly use.
* How would you handle R dependencies?
So far I've been ignoring this problem and just assuming people have an R environment installed that has the relevant packages. You could just stick a `require(package)` in the function called by `rpy2`, and then it would give you an `R` error you can interpret. The plan would be to make this a set of convenience functions, but not a cleanly installable module I guess... I'm not sure how you could get any python setup to install R dependencies for you...
* And (probably hard and definitely not necessary at first) could we use [arrow](https://arrow.apache.org/docs/python/) to speed up data transfer?
This looks interesting... but I don't entirely understand it... you'd have to have a separate data structure that can move between languages, and be interpreted as an R data structure or `AnnData` depending on where it's used? Most methods are designed to run on a particular class of object. How would this help if you always have to convert to that type of object? So far I've just been using `anndata2ri` to ensure we have an `SCE` object, which can be converted to other `R` data structures.
I'm not sure how you could get any python setup to install R dependencies for you
Maybe a conda package could include dependencies? I think getting a working environment would alleviate a large pain point for this stuff (for example, I currently have no working Seurat install.). Plus making sure packages are up to date for the wrapped functionality.
you'd have to have a separate data structure that can move between languages
Sort of. The idea is that you could move arrays to R from python without making any copies, they'd just point to the same memory. This is already possible when passing data from R to python. The main idea is making these wrappers faster and take less memory.
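To make the copy versus shared-memory distinction concrete, here is a Python-only sketch. It does not use arrow or `rpy2` at all; it only assumes the standard library's `array` and `memoryview`, and stands in for the general idea of two handles pointing at the same buffer.

```python
from array import array

# A small buffer of doubles, standing in for a count matrix.
counts = array("d", [1.0, 0.0, 3.0])

# Zero-copy: the memoryview points at the same memory as `counts`.
shared = memoryview(counts)

# Copy: a second array with its own memory.
copied = array("d", counts)

# Changing the original is visible through the view but not in the copy.
counts[0] = 42.0
view_sees_change = shared[0] == 42.0
copy_sees_change = copied[0] == 42.0
```

Here `view_sees_change` is `True` and `copy_sees_change` is `False`. The copy costs time and memory proportional to the data, which is the overhead a zero-copy transfer would avoid.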
The issue with going through conda is that not all `R` packages are on `bioconda` (e.g. Conos). And I'm not keen to create and maintain a conda `R` package. Therefore I'm using a conda environment with some python packages installed on top via `pip` and some `R` packages installed via `install.packages()`.
The idea is that you could move arrays to R from python without making any copies, they'd just point to the same memory.
I'm guessing this is not what already happens in `rpy2`?
I'm not keen to create and maintain a conda R package.
That's fair. Might be worth asking the `conos` developers in this case?
Also, does using `install.packages` within a conda environment work for you? I recall that not working well for me in the past.
I'm guessing this is not what already happens in rpy2?
Nah, `rpy2` even copies the data in a particularly slow way by default.
That's fair. Might be worth asking the `conos` developers in this case?
Yes, could and should do this... but would slow down the process for now I guess.
Also, does using `install.packages` within a conda environment work for you? I recall that not working well for me in the past.
It works if you install the R packages last and don't install anything else over the top via conda.
Hi all,
Sorry, I sent a PR (https://github.com/theislab/scanpy/pull/1271) without reading any of this, my bad. Some thoughts:
I think it's fairly straightforward to check for R dependencies at runtime; please see the PR for more info.
For Travis, I used Ubuntu packages for the base R installation, and the rest of the R deps are installed by the Travis user in the home directory, which is cached. Installing R via apt takes around a minute. This is really hard to reduce, I think.
After caching, the installation of sctransform itself takes around 15-20 sec. This can even be reduced to zero if I check whether it's already installed. See https://travis-ci.org/github/theislab/scanpy/jobs/697070834 for a better breakdown. You can compare this with an existing test run, e.g. https://travis-ci.org/github/theislab/scanpy/jobs/696758553.
The sctransform test overhead is around 30 sec, which can also be reduced. Overall, it adds 4 minutes to the Travis test time. I don't know exactly where the remaining difference comes from.
However, if we keep adding more Ubuntu and/or R packages in the scanpy travis, it can get a bit bloated. Even if things are cached, for some reason, there is a 45-50 second cache upload overhead which is not negligible.
Hi, it's not available in scanpy at the moment, but I wrote a wrapper for it via `rpy2` and `anndata2ri`, which is available here: https://github.com/normjam/benchmark/blob/master/normbench/methods/ad2seurat.py
Hi,
I have been trying to use this wrapper, but it seems there's an error during the conversion process:
RRuntimeError: Error in validObject(.Object) :
  invalid class “dgCMatrix” object: 1: invalid object for slot "i" in class "dgCMatrix": got class "array", should be or extend class "integer"
  invalid class “dgCMatrix” object: 2: invalid object for slot "p" in class "dgCMatrix": got class "array", should be or extend class "integer"
  invalid class “dgCMatrix” object: 3: invalid object for slot "Dim" in class "dgCMatrix": got class "array", should be or extend class "integer"
  invalid class “dgCMatrix” object: 4: invalid object for slot "x" in class "dgCMatrix": got class "array", should be or extend class "numeric"
Any pointers to get around this?
Hey!
I think this is probably related to https://github.com/theislab/anndata2ri/issues/63. Maybe try downgrading your `anndata2ri` version.
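For anyone reading the error above: the four slots it names are the pieces of a compressed sparse column matrix, which is what an R `dgCMatrix` stores. The pure-Python sketch below only illustrates what each slot holds; `to_csc_slots` is a hypothetical name, and this is not how `anndata2ri` or `rpy2` build the object.

```python
def to_csc_slots(dense):
    """Split a row-major list-of-lists matrix into dgCMatrix-style slots."""
    n_rows = len(dense)
    n_cols = len(dense[0]) if dense else 0
    i, p, x = [], [0], []
    for col in range(n_cols):
        for row in range(n_rows):
            value = dense[row][col]
            if value != 0:
                i.append(row)           # 0-based row index of a non-zero (integer)
                x.append(float(value))  # the non-zero value itself (numeric)
        p.append(len(i))                # non-zeros seen up to the end of this column
    return {"i": i, "p": p, "x": x, "Dim": [n_rows, n_cols]}


slots = to_csc_slots([[1, 0, 2],
                      [0, 0, 3]])
# slots == {"i": [0, 0, 1], "p": [0, 1, 1, 3], "x": [1.0, 2.0, 3.0], "Dim": [2, 3]}
```

The error says R received each of these as class "array" rather than as plain integer or numeric vectors, so the problem is in how the vectors were converted, not in the matrix contents.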
Hello,
I am trying to use the wrapper class and I am getting this error:
RRuntimeError: Error in `[.data.frame`(meta.data, , ii, drop = FALSE) : undefined columns selected
Could you please suggest what I should do?
It's on the line `ro.r('seurat_obj = as.Seurat(adata, counts="X",data=NULL)')`.
Thank you
Hi, I would like to use sctransform, but I didn't find it in Python: https://github.com/ChristophH/sctransform