Closed GabrielHoffman closed 2 years ago
I implemented a user-defined `target` argument that seems to work well. I'm doing some more testing, and I can push when it's finished.
The general idea sounds sensible to me. However, I'm curious whether the `raw` things are part of the H5AD standard or not, because that has implications for a few things beyond zellkonverter. For example, the file-backed `H5ADMatrix` classes assume that the matrix is either `X` or `layers/*`, and they may refuse to play ball if the matrix is somewhere else.
Hi Aaron,
1) Is `raw` a "standard" attribute in H5AD files?
It looks like the `raw` field is supported by anndata here, but I don't have much Python experience.
Pegasus uses the `raw/X` field to store unnormalized raw counts, and uses `X` to store normalized counts. See example. This is how I ran into this issue: my colleague processed our single-cell RNA-seq data with Pegasus and I'm trying to do some downstream analysis using the raw counts.
Given the wide adoption of Pegasus and its use of `raw/X` in H5AD, it seems important to support this field for downstream analyses.
2) Support of the `raw/X` field by other tools:
Thanks for pointing out that `H5ADMatrix` only supports `X` or `layers/*`. Wider support of `raw/X` is certainly important.
However, `zellkonverter::readH5AD()` doesn't depend on this class, so that limitation doesn't prevent an isolated improvement. I implemented a new argument, `zellkonverter::readH5AD(..., target = "X")`, that is passed to `AnnData2SCE()`.
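To make the idea of a `target` path concrete, here is a minimal Python sketch of resolving a slash-separated location such as `raw/X` inside a nested hierarchy. It is not the code from the fork and not zellkonverter's or anndata's API: the function name `resolve_target` and the `toy_h5ad` dictionary (a plain-dict stand-in for an H5AD file's groups, with made-up values) are hypothetical and exist only for illustration.

```python
from collections.abc import Mapping
from typing import Any


def resolve_target(store: Mapping, target: str = "X") -> Any:
    """Follow a slash-separated path (e.g. "raw/X") through nested mappings."""
    node: Any = store
    for key in target.split("/"):
        # Stop with a clear error if the path leaves the hierarchy.
        if not isinstance(node, Mapping) or key not in node:
            raise KeyError(f"no entry {target!r} in store (failed at {key!r})")
        node = node[key]
    return node


# Hypothetical stand-in for the group layout of an H5AD file.
toy_h5ad = {
    "X": [[0.0, 1.1], [0.7, 0.0]],          # normalized values
    "raw": {"X": [[0, 3], [2, 0]]},         # unnormalized counts
    "layers": {"scaled": [[-1.0, 1.0], [1.0, -1.0]]},
}

normalized = resolve_target(toy_h5ad)           # default location "X"
raw_counts = resolve_target(toy_h5ad, "raw/X")  # alternative location
```

Defaulting `target` to `"X"` keeps the existing behaviour unchanged, while a path such as `"raw/X"` or `"layers/scaled"` selects a different matrix without touching the rest of the conversion.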
Here is my fork with the change to support other paths.
Best, Gabriel
Thanks for the suggestion and the code! Issue #53 is also about supporting the `raw` slot and I'm hoping to squeeze this into the next release.
You should be able to do this now with the devel version; you will just need to set `raw = TRUE` in `readH5AD()`. This will add an `altExp` called `"raw"` to the returned `SingleCellExperiment` object. See `?altExps` for details about how to use alternative experiments.
@lazappi Thanks for this fix. It works great!
Thanks for developing this package, it's been super useful for handling large datasets in R.
I have an H5AD file where the `X` slot stores normalized counts while raw counts are stored in `raw/X`. I would like to use `readH5AD()` to read in `raw/X`.
However, it looks like `X` is hard coded in `AnnData2SCE`: https://github.com/theislab/zellkonverter/blob/5e928bfa9b205ab1d507fc3893123394a2769f97/R/konverter.R#L106
It seems easy enough to add an argument to make this more flexible. Is it more complicated than that? If I make the change would you want to incorporate it into the main branch?
From a user perspective, I had thought that the `X_name` argument would do this, but it names the assay rather than specifying where the data is.
Cheers, Gabriel