Closed LTLA closed 4 years ago
Yeah I'm not entirely sold on the name either. It's German for "cell converter" which is the best I could come up with. Couldn't think of any good Python/cell puns.
Thanks for the comments, that's really helpful! I think I understand most of them. I will try and make those changes and see how that goes.
I think I addressed the first two points but haven't quite fixed the part in read.R
. I'm getting an error from parallel::makePSOCKcluster()
which doesn't seem related to anything else so I guess I'll have to try and sort that out first.
Just to be clear, this was what I was proposing for read.R
:
readH5AD <- function(file) {
proc <- basilisk::basiliskStart(anndata_env)
on.exit(basilisk::basiliskStop(proc))
file <- path.expand(file)
basilisk::basiliskRun(proc, fun=.converter, file=file)
}
.converter <- function(file) {
anndata <- reticulate::import("anndata")
adata <- anndata$read_h5ad(file)
AnnData2SCE(adata)
}
I would help in testing but I'd need a SCE2AnnData
utility to go the other way. Probably slap some examples together with some of the smaller datasets in scRNAseq and I can play around with it.
I have written basic SCE2AnnData()
and WriteH5AD()
functions but no examples. Should probably start a vignette at some point and need some test datasets...
I have written the reader/writer functions to call an internal function but not sure that is enough for things to work with environment sharing turned off. Need to play around with that a bit more.
It seems to work well but we should add some tests that run with those flags on.
Can't say I get the name, but whatever. Anyway, here's my comments:
basilisk.R
You should consider exporting a character vector that defines the set of Python packages you're using. This will allow downstream R packages to easily specify the same dependencies in their calls to
BasiliskEnvironment
, thus using your package to pin the versions.Note that, for
configure
to work, this variable needs to be defined inbasilisk.R
itself.Also, while not strictly necessary, it is probably good practice to pin the versions of as many dependencies as you can. Probably a quick
conda env list
will tell you what to add.konverter.R
This is what I wanted but you should note that this is not a function that is intended to be run by end-users. In general, it is not valid to pass reticulate objects into
basiliskRun
calls as there is no guarantee that basilisk will use the same Python process as the one containing those objects. This limitation is necessary for basilisk to guarantee that the code will work, even if you have an incompatible Python process loaded into your current R session.Rather, the intended use case is for other developers to call
zellkonverter:: adata2SCE
inside theirbasiliskRun
function bodies, on theadata
objects they created. This would save everyone from having to re-define the conversion function. It is also the motivation behind the version control suggested forbasilisk.R
as it ensures we are all talking about the same version of AnnData.For this purpose,
adata2SCE
does not need to protect itself withbasiliskStart
, as this should be handled by the calling context. Might also want to slap a dot in front to indicate it's for devs.read.R
This looks fine but note that
adata2SCE
may not be available if a separate process is loaded. The simple solution is to callzellkonverter:: adata2SCE
but this will trigger CHECK warnings about not needing to namespace functions from the same package. The better solution is to movefun
into an internal function and then call that internal function frombasiliskRun
, in which case the package namespace is automatically loaded in the child process.A good smoke test is to set
setBasiliskShared(FALSE)
andsetBasiliskFork(FALSE)
and see whether any of your functions still work. This will trigger the creation of entirely independent processes to execute the function passed tobasiliskRun
, so if your code relied on shared memory or contexts, it will fail hard.