Closed: GabrielHoffman closed this issue 2 years ago
Hi @GabrielHoffman
I think I might know what's going on here but I need to look into it more. Do you have an example file you would be happy to share? A small subset of this data would be perfect.
After a lot of digging and working with my colleague, I figured out it's an issue with the way the H5AD was generated with pegasus. Instead of using the standard raw field, using a custom field name causes the raw counts to be saved to layers/customField. The custom field was named raw_new, so I didn't realize that it was non-standard.
Here is some code to reproduce the H5AD file with this issue.
import pegasus as pg

# wget https://storage.googleapis.com/terra-featured-workspaces/Cumulus/MantonBM_nonmix_subset.zarr.zip
data = pg.read_input('MantonBM_nonmix_subset.zarr.zip')
pg.identify_robust_genes(data)

# Transform counts, but retain the original in backup_matrix.
# The default is raw.X, which saves to h5ad_file/raw as expected.
# When backup_matrix is set to a custom value, the result is saved in layers/customField.
# This is what causes the problem, since h5ad_file/raw is no longer written.
# https://pegasus.readthedocs.io/en/stable/api/pegasus.log_norm.html#pegasus.log_norm
pg.log_norm(data, backup_matrix='raw_new')
pg.write_output(data, "out.h5ad")
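As a quick sanity check (a minimal sketch with anndata, assuming the out.h5ad written above), the counts end up under layers rather than in the standard raw slot:

import anndata as ad

adata = ad.read_h5ad("out.h5ad")
print(adata.raw)                  # None -- the standard /raw slot was never written
print(list(adata.layers.keys()))  # 'raw_new' shows up here instead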
Based on this, it's not a zellkonverter issue. It can be resolved by using standard field names, since only standard fields are read as a DelayedMatrix.
Agree?
Best, Gabriel
Thanks! I still need to test things, but I think that makes sense. We should actually be able to support DelayedMatrix in any layers item, but there was an upstream issue which meant we turned it off. That should be fixed now, and #50 is about turning it back on, but we didn't get around to it before the last release. Now that it looks like at least some people want to use it, it should be more of a priority, though.
Hi!
This is just a longer 👀 message ;)
@nick-eagles @abspangler13 and I are going to be using some of the same files Gabriel is using, so we will have the same issues Gabriel described. Thank you Gabriel et al. for spearheading this, and thanks Luke for your support! From our side, Nick is the one who has R and Python experience, and has thus been our in-house zellkonverter expert =) Please let us know if we can help in any way.
Best, Leo
The main thing that would be helpful would be an example .h5ad file where this is an issue (I don't use pegasus so can't run the code above). Apart from that, I need to look into turning the DelayedArray support for layers other than X back on.
We have worked out the issue on our end by writing to raw, so readH5AD() works as expected.
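For reference, a minimal sketch of the change on our side (based on the reproduction code above; out_fixed.h5ad is just an illustrative name) is to keep the default backup_matrix, raw.X, so the counts are written to the standard raw slot:

import pegasus as pg

data = pg.read_input('MantonBM_nonmix_subset.zarr.zip')
pg.identify_robust_genes(data)
pg.log_norm(data)  # default backup_matrix='raw.X' writes the counts to h5ad_file/raw
pg.write_output(data, 'out_fixed.h5ad')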
Thanks!
Thanks Gabriel, Prashant, @nick-eagles et al for figuring this one out!
Thanks again Luke for the support ^^.
I have an H5AD file that stores both normalized data and raw counts produced by pegasus. I can use zellkonverter to read the default normalized counts as a DelayedMatrix, but the raw counts are imported as a dgCMatrix. How can I use a DelayedMatrix instead? This follows up on our conversation in #57, but applied to the new H5AD format.
As for the details, I have an H5AD file where X stores the normalized data and layers/raw_new stores the raw counts. I read the data in using readH5AD().
The raw_new field is a 12 GB dgCMatrix.

I have zellkonverter v1.7.0, using anndata version 0.8.0.
Cheers, Gabriel