Open barbareyex opened 1 year ago
Could you please provide a minimal working example of this bug? Can you subsample your AnnData objects and maybe create an artificial recreation of it here, please?
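A reproducible random subsample is often enough to share a small example. The sketch below only picks which cells to keep; `subsample_indices` is a hypothetical helper name, and the cell counts are taken from the object printed in this issue.

```python
import numpy as np

def subsample_indices(n_obs, n_keep, seed=0):
    """Return a sorted, reproducible random subset of row indices."""
    rng = np.random.default_rng(seed)
    n_keep = min(n_keep, n_obs)  # never ask for more rows than exist
    return np.sort(rng.choice(n_obs, size=n_keep, replace=False))

idx = subsample_indices(n_obs=17217, n_keep=500)
# Assumption: for an AnnData object named `adata`, `adata[idx].copy()`
# would then give the smaller object to attach to the issue.
```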
This is the AnnData object that works properly:
AnnData object with n_obs × n_vars = 24759 × 29612
obs: 'sample', 'batch', 'n_counts'
var: 'ensembl_id', 'n_cells'
layers: 'counts'
After normalizing and log-transforming it with:
adata.X = sc.pp.normalize_total(adata, inplace=False)['X']
adata.X = sc.pp.log1p(adata.X)
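For reference, the two steps above amount roughly to the following, written in plain NumPy on a dense toy matrix. This is a sketch, not scanpy's implementation: `normalize_and_log` is a hypothetical name, and using the median of the per-cell totals as the default target is an assumption about the default behavior.

```python
import numpy as np

def normalize_and_log(counts, target_sum=None):
    """Scale each cell (row) to the same total, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=np.float32)
    totals = counts.sum(axis=1)
    if target_sum is None:
        # Assumption: default target is the median total of non-empty cells.
        target_sum = np.median(totals[totals > 0])
    # Per-cell scale factor; empty cells get a factor of 0 instead of inf.
    scale = np.divide(target_sum, totals,
                      out=np.zeros_like(totals), where=totals > 0)
    return np.log1p(counts * scale[:, None])

counts = np.array([[0, 2, 2], [10, 0, 30], [1, 1, 0]])
logged = normalize_and_log(counts)
```

Zeros stay zero and non-zero counts stay non-zero under this transform, so a gene with raw counts in a dataset should still have non-zero values afterwards.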
After computing PCA, neighbors and UMAP coordinates, this is a plot showing the expression of GRIK1, for example:
This is the AnnData object that does not show gene expression on the UMAP after normalization:
AnnData object with n_obs × n_vars = 17217 × 33704
obs: 'Age', 'Condition', 'Origin', 'Region', 'Sex', 'Subject', 'louvain', 'louvain6', 'obs_names', 'sample', 'batch', 'dataset'
var: 'dispersions', 'dispersions_norm', 'gene_ids', 'highly_variable', 'means', 'n_cells', 'var_names'
obsm: 'X_umap'
layers: 'counts'
This AnnData object has pre-computed UMAP coordinates, and its raw data was normalized with size factors in R, so adata.X is already normalized when the file is read. If I plot the UMAP for SLC5A11, for example, this is the result:
However, if I take the raw counts of this AnnData object (stored in layers['counts']), normalize them with the sc.pp.normalize function and log-transform them, this is the output of sc.pl.umap (re-computing PCA, neighbors and UMAP makes no difference):
UMAP after recomputing PCAs, etc:
What happened?
Hi,
I have two different datasets, both with raw counts. After applying sc.pp.normalize_total and sc.pp.log1p and plotting the data on UMAP coordinates, no gene expression is shown for cells from one of the datasets. I ran the analysis on each dataset separately (without concatenating) and the same thing happens. I thought it might be a data type problem, but when I checked, the .X of both AnnData objects was a sparse.csr_matrix with dtype np.float32 ( #1612 ). I also made sure that the matrix of the problematic dataset has acceptable values and that they are not all zeros.
Any idea about this problem?
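One way to narrow this down is to look directly at the values of the gene being plotted, before and after normalization. The sketch below works on a dense matrix and a list of gene names; `summarize_gene` is a hypothetical helper, and the toy values are made up for illustration only.

```python
import numpy as np

def summarize_gene(matrix, gene_names, gene):
    """Summarize one gene's column: how many cells express it, and how strongly."""
    col = list(gene_names).index(gene)  # raises ValueError if the gene is missing
    values = np.asarray(matrix)[:, col].astype(np.float64)
    return {
        "n_cells": int(values.size),
        "n_nonzero": int(np.count_nonzero(values)),
        "max": float(values.max()),
        "all_finite": bool(np.isfinite(values).all()),
    }

# Toy data for illustration only: 3 cells x 2 genes.
toy = np.array([[0.0, 1.2], [0.0, 0.0], [0.0, 3.4]])
names = ["GRIK1", "SLC5A11"]
silent = summarize_gene(toy, names, "GRIK1")
expressed = summarize_gene(toy, names, "SLC5A11")
```

For a sparse adata.X, the column of interest would need to be converted to a dense array first. Comparing the summary for layers['counts'] with the one for the normalized matrix shows whether the values are lost during normalization or only at plotting time.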
Minimal code sample
Error output
No response
Versions