Closed haratoru closed 1 year ago
In either the cell or the feature annotations, one of the columns has an incorrect length. v3 catches these inconsistencies (since we use Bioc data frames). Can you check what these look like in Python?
As @jkanche says, it would be helpful to see what the structure of your H5AD file is. If you're using R, you could report:
rhdf5::h5ls("my_file.h5ad")
The Python equivalent would be something like:
import h5py
handle = h5py.File("my_file.h5ad")
handle.visititems(print)
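To act on the suggestion that one annotation column may have the wrong length, the listing can be reduced to a length check. The following is a minimal sketch, assuming `h5py` is installed and that the file has an `obs/_index` (or `var/_index`) dataset as in recent H5AD files. The helper names `mismatched_columns` and `column_lengths` are hypothetical; they are not part of kana or bakana, and they only approximate whatever validation kana v3 actually performs.

```python
def mismatched_columns(lengths, expected):
    """Return {column: length} for every column whose length differs from `expected`."""
    return {name: n for name, n in lengths.items() if n != expected}


def column_lengths(path, group):
    """Collect the per-row length of each column stored under `group` ("obs" or "var")."""
    import h5py  # imported lazily so the pure-Python helper above works without it

    lengths = {}
    with h5py.File(path, "r") as handle:
        for name, item in handle[group].items():
            if isinstance(item, h5py.Dataset):
                if item.shape:  # skip scalar datasets, which have shape ()
                    lengths[name] = item.shape[0]
            elif "codes" in item:
                # Categorical columns are groups holding 'categories' and 'codes';
                # the per-row length is the length of 'codes'.
                lengths[name] = item["codes"].shape[0]
    return lengths


# Example usage (assumes the file exists and contains an '_index' dataset):
# lengths = column_lengths("my_file.h5ad", "obs")
# print(mismatched_columns(lengths, expected=lengths["_index"]))
```

An empty dictionary would mean that every column in that group matches the length of the index, in which case the problem probably lies elsewhere.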
Thanks for your comment, I really appreciate your help. I have checked my data. I now think that "var/gene_ids" and/or "indptr": shape (6643,) may be causing the error. I am changing those parts and will try again.
X <HDF5 group "/X" (3 members)>
X/data <HDF5 dataset "data": shape (21301761,), type "<f4">
X/indices <HDF5 dataset "indices": shape (21301761,), type "<i4">
X/indptr <HDF5 dataset "indptr": shape (6643,), type "<i4">
layers <HDF5 group "/layers" (2 members)>
layers/counts <HDF5 group "/layers/counts" (3 members)>
layers/counts/data <HDF5 dataset "data": shape (21301761,), type "<f4">
layers/counts/indices <HDF5 dataset "indices": shape (21301761,), type "<i4">
layers/counts/indptr <HDF5 dataset "indptr": shape (6643,), type "<i4">
layers/scvi_normalized <HDF5 dataset "scvi_normalized": shape (6642, 18408), type "<f4">
obs <HDF5 group "/obs" (10 members)>
obs/_index <HDF5 dataset "_index": shape (6642,), type "|O">
obs/_scvi_batch <HDF5 dataset "_scvi_batch": shape (6642,), type "|i1">
obs/_scvi_labels <HDF5 dataset "_scvi_labels": shape (6642,), type "|i1">
obs/batch <HDF5 group "/obs/batch" (2 members)>
obs/batch/categories <HDF5 dataset "categories": shape (2,), type "|O">
obs/batch/codes <HDF5 dataset "codes": shape (6642,), type "|i1">
obs/n_counts <HDF5 dataset "n_counts": shape (6642,), type "<f4">
obs/n_genes <HDF5 dataset "n_genes": shape (6642,), type "<i8">
obs/n_genes_by_counts <HDF5 dataset "n_genes_by_counts": shape (6642,), type "<i4">
obs/pct_counts_mt <HDF5 dataset "pct_counts_mt": shape (6642,), type "<f4">
obs/total_counts <HDF5 dataset "total_counts": shape (6642,), type "<f4">
obs/total_counts_mt <HDF5 dataset "total_counts_mt": shape (6642,), type "<f4">
obsm <HDF5 group "/obsm" (1 members)>
obsm/X_scVI <HDF5 dataset "X_scVI": shape (6642, 10), type "<f4">
obsp <HDF5 group "/obsp" (0 members)>
raw <HDF5 group "/raw" (3 members)>
raw/X <HDF5 group "/raw/X" (3 members)>
raw/X/data <HDF5 dataset "data": shape (21301761,), type "<f4">
raw/X/indices <HDF5 dataset "indices": shape (21301761,), type "<i4">
raw/X/indptr <HDF5 dataset "indptr": shape (6643,), type "<i4">
raw/var <HDF5 group "/raw/var" (8 members)>
raw/var/_index <HDF5 dataset "_index": shape (18408,), type "|O">
raw/var/gene_ids <HDF5 dataset "gene_ids": shape (18408,), type "|O">
raw/var/mean_counts <HDF5 dataset "mean_counts": shape (18408,), type "<f4">
raw/var/mt <HDF5 dataset "mt": shape (18408,), type "|b1">
raw/var/n_cells <HDF5 dataset "n_cells": shape (18408,), type "<i8">
raw/var/n_cells_by_counts <HDF5 dataset "n_cells_by_counts": shape (18408,), type "<i8">
raw/var/pct_dropout_by_counts <HDF5 dataset "pct_dropout_by_counts": shape (18408,), type "<f8">
raw/var/total_counts <HDF5 dataset "total_counts": shape (18408,), type "<f4">
raw/varm <HDF5 group "/raw/varm" (0 members)>
uns <HDF5 group "/uns" (4 members)>
uns/_scvi_manager_uuid <HDF5 dataset "_scvi_manager_uuid": shape (), type "|O">
uns/_scvi_uuid <HDF5 dataset "_scvi_uuid": shape (), type "|O">
uns/hvg <HDF5 group "/uns/hvg" (1 members)>
uns/hvg/flavor <HDF5 dataset "flavor": shape (), type "|O">
uns/log1p <HDF5 group "/uns/log1p" (0 members)>
var <HDF5 group "/var" (13 members)>
var/_index <HDF5 dataset "_index": shape (18408,), type "|O">
var/gene_ids <HDF5 dataset "gene_ids": shape (18408,), type "|O">
var/highly_variable <HDF5 dataset "highly_variable": shape (18408,), type "|b1">
var/highly_variable_rank <HDF5 dataset "highly_variable_rank": shape (18408,), type "<f4">
var/mean_counts <HDF5 dataset "mean_counts": shape (18408,), type "<f4">
var/means <HDF5 dataset "means": shape (18408,), type "<f8">
var/mt <HDF5 dataset "mt": shape (18408,), type "|b1">
var/n_cells <HDF5 dataset "n_cells": shape (18408,), type "<i8">
var/n_cells_by_counts <HDF5 dataset "n_cells_by_counts": shape (18408,), type "<i8">
var/pct_dropout_by_counts <HDF5 dataset "pct_dropout_by_counts": shape (18408,), type "<f8">
var/total_counts <HDF5 dataset "total_counts": shape (18408,), type "<f4">
var/variances <HDF5 dataset "variances": shape (18408,), type "<f8">
var/variances_norm <HDF5 dataset "variances_norm": shape (18408,), type "<f8">
varm <HDF5 group "/varm" (0 members)>
varp <HDF5 group "/varp" (0 members)>
Everything actually looks fine to me; the length of indptr is not a problem here, it's meant to be +1 on the number of cells.
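The "+1" comes from how compressed sparse matrices are laid out. The following is a small pure-Python sketch of the row-compressed (CSR) layout; it assumes that X in this file is stored with cells as rows, which is consistent with 6643 = 6642 + 1 in the listing but is not confirmed by it. The function name `dense_to_csr` is made up for illustration.

```python
def dense_to_csr(rows):
    """Convert a dense list-of-lists matrix into CSR (data, indices, indptr) lists."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for j, value in enumerate(row):
            if value != 0:
                data.append(value)   # the non-zero value itself
                indices.append(j)    # its column index
        # After each row, record where that row's entries end in `data`.
        indptr.append(len(data))
    return data, indices, indptr


matrix = [
    [0, 5, 0],
    [0, 0, 0],
    [7, 0, 2],
]
data, indices, indptr = dense_to_csr(matrix)
print(indptr)                     # [0, 1, 1, 3]
print(len(indptr), len(matrix))   # 4 3
```

Row i occupies `data[indptr[i]:indptr[i + 1]]`, so indptr needs one more entry than there are rows, and its last entry equals the length of data.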
The only thing that I can think of is the presence of H5AD's new factor encodings in obs/batch. We should have been able to handle it, but perhaps you can make a copy of the file without obs/batch and see if it works.
Thank you for your prompt reply.
I made a new file without scvi-tools, and my data does work in kana v3. I deeply appreciate your help.
kana is a really amazing tool! It is epoch-making.
I shouldn't have closed this issue. Anyway, I ran scvi-tools on one of the test datasets and I cannot reproduce this issue. @haratoru is it possible for you to share what obs/batch looks like in this file? That can help us debug this issue.
Thank you, jkanche.
I misunderstood my data. scvi-tools is not the cause of the error.
I used a different dataset in kana v3 yesterday; my data does not work in kana v3. The data is public, so it is not a problem to share.
The following are the details of obs/batch (adata.obs['batch']):
AAACCCAGTAGAGGAA-1-SSRRxxxxx1 SSRRxxxxxx1
AAACCCATCATCGCAA-1-SSRRxxxxx1 SSRRxxxxxx1
...
TTTGTTGCACGCAAAG-1-SSRRxxxxx2 SSRRxxxxxx2
TTTGTTGTCCCGAGAC-1-SSRRxxxxx2 SSRRxxxxxx2
Name: batch, Length: xxxx, dtype: category
Categories (2, object): ['SSRRxxxxx1', 'SSRRxxxx21']
I used kb-python with the cellranger option, and I checked the result with scanpy.
I concatenated the two datasets with the following code. I think this may be the cause of the error, or kb-python's option may not be suitable for kana.
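import scanpy as sc

# Read each sample's cellranger-style output (paths as given in the original post).
adata_SSRRxxxxxx1 = sc.read_10x_mtx(path='/counts_unfiltered/cellranger/', var_names='gene_symbols', cache=True)
adata_SSRRxxxxxx2 = sc.read_10x_mtx(path='/counts_unfiltered/cellranger/', var_names='gene_symbols', cache=True)
# Concatenate the two samples, labelling each cell with its batch.
adata = adata_SSRRxxxxxx1.concatenate(adata_SSRRxxxxxx2, batch_categories=['SSRRxxxxxx1', 'SSRRxxxxxx2'])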
@haratoru since the data is public, can you give us the link to the H5AD?
I think I have finally solved this problem.
When reading the cellranger-type data, I changed the cache option to False, and the error no longer occurs. So I think the cache option was the cause of the error.
Before → error
adata = sc.read_10x_mtx(path='/out_SRR17375059_sra/counts_unfiltered/cellranger', var_names='gene_symbols',cache=True)
After → OK
adata = sc.read_10x_mtx(path='/out_SRR17375059_sra/counts_unfiltered/cellranger', var_names='gene_symbols',cache=False)
I sincerely appreciate your time and consideration.
This is the code I used to make the H5AD data. I am not good at programming, so there may be many bugs.
Thanks for the great tool, but kana v3 does not work with my data.
kana v3 stops with this error.
Error message: PREFLIGHT INPUT_DATA
0:09:47: (experimenthub) store initialized
0:09:47: (kanadb) store initialized
0:09:47: (downloadsdb) store initialized
0:09:51: analysis state created
0:09:51: bakana initialized
0:10:26: preflight_input finished
0:10:26: Error: expected all arrays in 'columns' to have equal length
kana v2 works, but kana v3 does not. I do not understand this error, so I do not know what to do.