kanaverse / kana

Single cell analysis in the browser
https://kanaverse.org/kana/
MIT License

My H5AD data works in kana v2, but not in kana v3. #201

Closed: haratoru closed this issue 1 year ago

haratoru commented 1 year ago

Thanks for the great tool, but kana v3 does not work with my data.

kana v3 stops with this error:

Error message: PREFLIGHT INPUT_DATA

0:09:47: (experimenthub) store initialized
0:09:47: (kanadb) store initialized
0:09:47: (downloadsdb) store initialized
0:09:51: analysis state created
0:09:51: bakana initialized
0:10:26: preflight_input finished
0:10:26: Error: expected all arrays in 'columns' to have equal length

kana v2 works, but kana v3 does not. I do not understand this error, so I do not know what to do.

jkanche commented 1 year ago

In either the cell or the feature annotations, one of the columns has an incorrect length. v3 catches these inconsistencies (since we use bioc DataFrames). Can you check what these look like in Python?
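For anyone who wants to check this directly, here is a minimal sketch (not kana's own validation code) that opens an H5AD file with h5py and reports any obs/var column whose length differs from the row index. It assumes the usual AnnData layout: plain columns are 1-D datasets, the newer categorical columns are groups with a `codes` dataset, and the `_index` attribute names the index dataset. The function names are hypothetical.

```python
import h5py


def column_lengths(group):
    """Map each column in an H5AD obs/var group to its length (None if unknown)."""
    lengths = {}
    for name, obj in group.items():
        if name == "__categories":
            continue  # legacy layout: category labels only, not a column
        if isinstance(obj, h5py.Dataset):
            # Plain column: a 1-D dataset with one entry per row.
            lengths[name] = obj.shape[0] if obj.shape else None
        elif isinstance(obj, h5py.Group) and "codes" in obj:
            # Newer categorical encoding: 'codes' has one entry per row.
            lengths[name] = obj["codes"].shape[0]
        else:
            lengths[name] = None  # unrecognized layout; inspect by hand
    return lengths


def find_mismatched_columns(path):
    """Return (column, length, expected) for obs/var columns of the wrong length."""
    problems = []
    with h5py.File(path, "r") as handle:
        for df_name in ("obs", "var"):
            if df_name not in handle:
                continue
            group = handle[df_name]
            # AnnData stores the name of the row-index dataset in the '_index' attribute.
            index_name = group.attrs.get("_index", "_index")
            if isinstance(index_name, bytes):
                index_name = index_name.decode()
            expected = group[index_name].shape[0]
            for column, length in column_lengths(group).items():
                if length != expected:
                    problems.append((f"{df_name}/{column}", length, expected))
    return problems
```

Usage: `print(find_mismatched_columns("my_file.h5ad"))`; an empty list means every column matches its index length.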

LTLA commented 1 year ago

As @jkanche says, it would be helpful to see what the structure of your H5AD file is. If you're using R, you could report:

rhdf5::h5ls("my_file.h5ad")

The Python equivalent would be something like:

import h5py
handle = h5py.File("my_file.h5ad", "r")
handle.visititems(print)

haratoru commented 1 year ago

Thanks for your comment; I really appreciate your help. I have checked my data, and I now think that "var/gene_ids" and/or "indptr" (shape (6643,)) might be causing the error. I will change those parts and try again.


The following is the structure of my H5AD file:

X <HDF5 group "/X" (3 members)>
X/data <HDF5 dataset "data": shape (21301761,), type "<f4">
X/indices <HDF5 dataset "indices": shape (21301761,), type "<i4">
X/indptr <HDF5 dataset "indptr": shape (6643,), type "<i4">
layers <HDF5 group "/layers" (2 members)>
layers/counts <HDF5 group "/layers/counts" (3 members)>
layers/counts/data <HDF5 dataset "data": shape (21301761,), type "<f4">
layers/counts/indices <HDF5 dataset "indices": shape (21301761,), type "<i4">
layers/counts/indptr <HDF5 dataset "indptr": shape (6643,), type "<i4">
layers/scvi_normalized <HDF5 dataset "scvi_normalized": shape (6642, 18408), type "<f4">
obs <HDF5 group "/obs" (10 members)>
obs/_index <HDF5 dataset "_index": shape (6642,), type "|O">
obs/_scvi_batch <HDF5 dataset "_scvi_batch": shape (6642,), type "|i1">
obs/_scvi_labels <HDF5 dataset "_scvi_labels": shape (6642,), type "|i1">
obs/batch <HDF5 group "/obs/batch" (2 members)>
obs/batch/categories <HDF5 dataset "categories": shape (2,), type "|O">
obs/batch/codes <HDF5 dataset "codes": shape (6642,), type "|i1">
obs/n_counts <HDF5 dataset "n_counts": shape (6642,), type "<f4">
obs/n_genes <HDF5 dataset "n_genes": shape (6642,), type "<i8">
obs/n_genes_by_counts <HDF5 dataset "n_genes_by_counts": shape (6642,), type "<i4">
obs/pct_counts_mt <HDF5 dataset "pct_counts_mt": shape (6642,), type "<f4">
obs/total_counts <HDF5 dataset "total_counts": shape (6642,), type "<f4">
obs/total_counts_mt <HDF5 dataset "total_counts_mt": shape (6642,), type "<f4">
obsm <HDF5 group "/obsm" (1 members)>
obsm/X_scVI <HDF5 dataset "X_scVI": shape (6642, 10), type "<f4">
obsp <HDF5 group "/obsp" (0 members)>
raw <HDF5 group "/raw" (3 members)>
raw/X <HDF5 group "/raw/X" (3 members)>
raw/X/data <HDF5 dataset "data": shape (21301761,), type "<f4">
raw/X/indices <HDF5 dataset "indices": shape (21301761,), type "<i4">
raw/X/indptr <HDF5 dataset "indptr": shape (6643,), type "<i4">
raw/var <HDF5 group "/raw/var" (8 members)>
raw/var/_index <HDF5 dataset "_index": shape (18408,), type "|O">
raw/var/gene_ids <HDF5 dataset "gene_ids": shape (18408,), type "|O">
raw/var/mean_counts <HDF5 dataset "mean_counts": shape (18408,), type "<f4">
raw/var/mt <HDF5 dataset "mt": shape (18408,), type "|b1">
raw/var/n_cells <HDF5 dataset "n_cells": shape (18408,), type "<i8">
raw/var/n_cells_by_counts <HDF5 dataset "n_cells_by_counts": shape (18408,), type "<i8">
raw/var/pct_dropout_by_counts <HDF5 dataset "pct_dropout_by_counts": shape (18408,), type "<f8">
raw/var/total_counts <HDF5 dataset "total_counts": shape (18408,), type "<f4">
raw/varm <HDF5 group "/raw/varm" (0 members)>
uns <HDF5 group "/uns" (4 members)>
uns/_scvi_manager_uuid <HDF5 dataset "_scvi_manager_uuid": shape (), type "|O">
uns/_scvi_uuid <HDF5 dataset "_scvi_uuid": shape (), type "|O">
uns/hvg <HDF5 group "/uns/hvg" (1 members)>
uns/hvg/flavor <HDF5 dataset "flavor": shape (), type "|O">
uns/log1p <HDF5 group "/uns/log1p" (0 members)>
var <HDF5 group "/var" (13 members)>
var/_index <HDF5 dataset "_index": shape (18408,), type "|O">
var/gene_ids <HDF5 dataset "gene_ids": shape (18408,), type "|O">
var/highly_variable <HDF5 dataset "highly_variable": shape (18408,), type "|b1">
var/highly_variable_rank <HDF5 dataset "highly_variable_rank": shape (18408,), type "<f4">
var/mean_counts <HDF5 dataset "mean_counts": shape (18408,), type "<f4">
var/means <HDF5 dataset "means": shape (18408,), type "<f8">
var/mt <HDF5 dataset "mt": shape (18408,), type "|b1">
var/n_cells <HDF5 dataset "n_cells": shape (18408,), type "<i8">
var/n_cells_by_counts <HDF5 dataset "n_cells_by_counts": shape (18408,), type "<i8">
var/pct_dropout_by_counts <HDF5 dataset "pct_dropout_by_counts": shape (18408,), type "<f8">
var/total_counts <HDF5 dataset "total_counts": shape (18408,), type "<f4">
var/variances <HDF5 dataset "variances": shape (18408,), type "<f8">
var/variances_norm <HDF5 dataset "variances_norm": shape (18408,), type "<f8">
varm <HDF5 group "/varm" (0 members)>
varp <HDF5 group "/varp" (0 members)>

LTLA commented 1 year ago

Everything actually looks fine to me; the length of indptr is not a problem here, as it's meant to be one more than the number of cells.
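To illustrate why an indptr of length 6643 is expected for 6642 cells, here is a small sketch using scipy's CSR matrix on made-up toy counts (cells as rows, as in X above). indptr starts at 0 and has one boundary per row, so its length is always n_rows + 1, and its last value equals the number of stored non-zeros, i.e. the length of data and indices. A CSC matrix would instead have n_cols + 1 entries.

```python
import numpy as np
from scipy import sparse

# Toy 3-cell x 4-gene count matrix (made-up values for illustration only).
counts = np.array([
    [0, 2, 0, 1],
    [0, 0, 0, 0],
    [5, 0, 3, 0],
])
csr = sparse.csr_matrix(counts)

# indptr has a leading 0 plus one entry per row, so len == n_rows + 1.
print(csr.indptr)       # [0 2 2 4]
print(len(csr.indptr))  # 4

# Row i's non-zeros are data[indptr[i]:indptr[i + 1]], at columns indices[...].
for i in range(csr.shape[0]):
    start, end = csr.indptr[i], csr.indptr[i + 1]
    print(i, csr.indices[start:end], csr.data[start:end])
```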

The only thing that I can think of is the presence of H5AD's new factor encodings in obs/batch. We should have been able to handle it, but perhaps you can make a copy of the file without obs/batch and see if it works.

haratoru commented 1 year ago

Thank you for your prompt reply.

I made a new file without scvi-tools, and my data does work in kana v3. I deeply appreciate your help.

kana is a really amazing tool! It is epoch-making.

jkanche commented 1 year ago

I shouldn't have closed this issue. Anyway, I ran scvi-tools on one of the test datasets and I cannot reproduce this issue. @haratoru, is it possible for you to share what obs/batch looks like in this file? That can help us debug this issue.

haratoru commented 1 year ago

Thank you, jkanche.

I misunderstood my data. scvi-tools is not the cause of the error.

Yesterday I used another dataset in kana v3, and it does not work either. The data is public, so there is no problem sharing it.


The following are the details of obs/batch (adata.obs['batch']):

AAACCCAGTAGAGGAA-1-SSRRxxxxx1    SSRRxxxxxx1
AAACCCATCATCGCAA-1-SSRRxxxxx1    SSRRxxxxxx1
...
TTTGTTGCACGCAAAG-1-SSRRxxxxx2    SSRRxxxxxx2
TTTGTTGTCCCGAGAC-1-SSRRxxxxx2    SSRRxxxxxx2
Name: batch, Length: xxxx, dtype: category
Categories (2, object): ['SSRRxxxxx1', 'SSRRxxxx21']
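On disk, a categorical column like obs/batch is stored as a group holding `codes` (one small integer per cell) and `categories` (the labels), as in the file structure listed earlier in this thread. Here is a minimal sketch that rebuilds the column from those two datasets with h5py and pandas, which can help show whether the on-disk encoding is what you expect. It assumes the newer AnnData categorical layout and an optional boolean `ordered` attribute; the function name is hypothetical.

```python
import h5py
import pandas as pd


def read_categorical(path, name="obs/batch"):
    """Rebuild a pandas Categorical from an H5AD categorical group."""
    with h5py.File(path, "r") as handle:
        group = handle[name]
        codes = group["codes"][:]
        categories = group["categories"][:]
        # h5py returns stored strings as bytes, so decode them to str.
        categories = [c.decode() if isinstance(c, bytes) else c for c in categories]
        ordered = bool(group.attrs.get("ordered", False))
    # A code of -1 marks a missing value, following the pandas convention.
    return pd.Categorical.from_codes(codes, categories=categories, ordered=ordered)
```

Usage: `read_categorical("my_file.h5ad").value_counts()` shows how many cells fall into each batch.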


I used kb-python with the cellranger option, and I checked the result with scanpy.

I concatenated the two datasets with the following code. I think this may be the cause of the error, but it is also possible that kb-python's cellranger option produces output that does not work well with kana.

import scanpy as sc
adata_SSRRxxxxxx1 = sc.read_10x_mtx(path='/counts_unfiltered/cellranger/', var_names='gene_symbols', cache=True)
adata_SSRRxxxxxx2 = sc.read_10x_mtx(path='/counts_unfiltered/cellranger/', var_names='gene_symbols', cache=True)
adata = adata_SSRRxxxxxx1.concatenate(adata_SSRRxxxxxx2, batch_categories=['SSRRxxxxxx1', 'SSRRxxxxxx2'])

jkanche commented 1 year ago

@haratoru since the data is public, can you give us the link to the H5AD?

haratoru commented 1 year ago

I think I have finally solved this problem.

When reading the cellranger-format data, I changed the option to cache=False, and the error no longer happens. So I think the cache setting was the cause of the error.

Before → error

adata = sc.read_10x_mtx(path='/out_SRR17375059_sra/counts_unfiltered/cellranger', var_names='gene_symbols',cache=True)

After → OK

adata = sc.read_10x_mtx(path='/out_SRR17375059_sra/counts_unfiltered/cellranger', var_names='gene_symbols',cache=False)

I sincerely appreciate your time and consideration.

haratoru commented 1 year ago

KB-Copy1.zip

This is the code I used to make the H5AD data. I am not good at programming, so there may be many bugs.