Closed kostaslazaros closed 1 year ago
Copied from the e-mail:
Hi @kostaslazaros,
You can check the reproducibility notebooks for each dataset to see what was done to generate the figshare data. These are at github.com/theislab/scib-reproducibility. It should also be explained in the methods section of the paper. In short, the .layers['counts'] data is count data, or as close to count data as we can generate from full-length protocols (e.g., CEL-seq). Full-length datasets usually store TPMs there. In adata.X you will find the log-normalized data. Normalization is done via scran and then scanpy's log1p function is used.
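To make the relationship between the two matrices concrete, here is a minimal NumPy sketch of size-factor normalization followed by log1p. It is an illustration under assumptions, not the pipeline from the reproducibility notebooks: the toy matrix is made up, the size factors are simple library-size factors standing in for scran's pooling-based estimates, and `looks_like_counts` is a hypothetical helper, not part of scanpy or scIB.

```python
import numpy as np

# Toy stand-in for adata.layers['counts'] (cells x genes); values are invented.
counts = np.array(
    [[0, 3, 1, 6],
     [2, 0, 0, 8],
     [5, 5, 0, 10]],
    dtype=float,
)

# ASSUMPTION: library-size factors scaled to mean 1. scran estimates size
# factors differently (by pooling cells); this is only a simple stand-in.
lib_size = counts.sum(axis=1)
size_factors = lib_size / lib_size.mean()

# Stand-in for adata.X: divide each cell by its size factor, then log1p.
log_norm = np.log1p(counts / size_factors[:, None])


def looks_like_counts(matrix, tol=1e-8):
    """Hypothetical check: non-negative and integer-valued entries."""
    return bool(np.all(matrix >= 0) and np.allclose(matrix, np.round(matrix), atol=tol))


# With the size factors in hand, the transform can be inverted.
recovered = np.expm1(log_norm) * size_factors[:, None]

print(looks_like_counts(counts))    # integer-valued matrix
print(looks_like_counts(log_norm))  # log-normalized matrix
```

Running the same kind of check on adata.layers['counts'] and adata.X is a quick way to see which one holds counts. For the full-length datasets that store TPMs in the counts layer, an integer check like this would not be expected to pass.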
Hello there,
I'm interested in using the reprocessed benchmark datasets that were used in the scIB paper. I found the .h5ad files at the following link, which was provided in the paper: https://figshare.com/articles/dataset/Benchmarking_atlas-level_data_integration_in_single-cellgenomics-_integration_task_datasets_Immune_andpancreas/12420968
I have read the pancreas .h5ad file into an anndata object using scanpy. I noticed that the matrix in anndata.X is different from the counts matrix stored in anndata.layers['counts'].
What is the difference between the two? Is one the raw counts matrix and the other a processed version of it (scaled and log-transformed)?
Thanks in advance!!