theislab / scib

Benchmarking analysis of data integration tools
MIT License

Question on publicly available reprocessed datasets #378

Closed kostaslazaros closed 1 year ago

kostaslazaros commented 1 year ago

Hello there,

I'm interested in using the reprocessed benchmark datasets from the scIB paper. I found the .h5ad files at the following link, which was provided in the paper: https://figshare.com/articles/dataset/Benchmarking_atlas-level_data_integration_in_single-cellgenomics-_integration_task_datasets_Immune_andpancreas/12420968

I read the pancreas .h5ad into an AnnData object using scanpy and noticed that the count matrix (anndata.X) is different from the counts matrix stored in the layers['counts'] slot (i.e., anndata.layers['counts']).
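For reference, this is roughly how I'm loading and comparing the two matrices (a minimal sketch; the filename is just what I called the downloaded pancreas .h5ad locally):

```python
import numpy as np
import scipy.sparse as sp
import scanpy as sc

# Assumed local filename for the pancreas dataset downloaded from figshare
adata = sc.read_h5ad("pancreas.h5ad")

def dense_block(m):
    """Return a small dense block regardless of sparse/dense storage."""
    return m.toarray() if sp.issparse(m) else np.asarray(m)

print(dense_block(adata.X[:3, :3]))                 # non-integer values
print(dense_block(adata.layers["counts"][:3, :3]))  # looks like counts
```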

What is the difference between the two? Is one the raw counts matrix and the other a processed version of it (e.g., scaled and logarithmized)?

Thanks in advance!!

LuckyMD commented 1 year ago

Copied from the e-mail:

Hi @kostaslazaros,

You can check the reproducibility notebooks for each dataset to see what was done to generate the figshare data. These are at github.com/theislab/scib-reproducibility. It should also be explained in the methods section of the paper. In short, .layers['counts'] holds count data, or as close to count data as we can generate from full-length protocols (e.g., CEL-seq); for full-length data, TPMs are usually stored there. In adata.X you will find the log-normalized data: normalization is done via scran, and then scanpy's log1p function is applied.
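For intuition, here is a minimal sketch of those two steps on the scanpy side, assuming scran-derived size factors (computed in R) are already stored in adata.obs["size_factors"]; the exact code used for each dataset is in the reproducibility notebooks, not this snippet.

```python
import numpy as np
import scipy.sparse as sp
import scanpy as sc

def scran_like_normalize(adata, size_factor_key="size_factors"):
    """Divide counts by per-cell size factors, then log1p.

    Assumes scran size factors were computed beforehand (e.g., via R/rpy2)
    and stored in adata.obs[size_factor_key]. This is only a sketch of the
    steps described above.
    """
    adata.layers["counts"] = adata.X.copy()        # preserve the count data
    sf = np.asarray(adata.obs[size_factor_key], dtype=float)
    if sp.issparse(adata.X):
        adata.X = sp.diags(1.0 / sf) @ adata.X     # per-cell scaling
    else:
        adata.X = adata.X / sf[:, None]
    sc.pp.log1p(adata)                             # log-normalized data ends up in adata.X
```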