theislab / scvelo

RNA Velocity generalized through dynamical modeling
https://scvelo.org
BSD 3-Clause "New" or "Revised" License
400 stars 103 forks

Significant cell count reduction after merging loom files with scvelo.merge #1249

mayurdoke6 closed 1 month ago

mayurdoke6 commented 1 month ago

I'm working on analyzing scRNA-seq data using scvelo. I'm trying to merge two loom files (New_Alloxan_possorted_genome_bam_CLF5P.loom and THR_possorted_genome_bam_SI7SF.loom) containing gene expression data for different cell populations.

Here's the relevant part of my code:

    import scanpy as sc
    import scvelo as scv
    import loompy

    # Load loom data
    ldata1 = scv.read('New_Alloxan_possorted_genome_bam_CLF5P.loom', cache=True)
    ldata2 = scv.read('THR_possorted_genome_bam_SI7SF.loom', cache=True)

    # Rename barcodes to ensure uniqueness
    barcodes1 = [bc.split(':')[1] for bc in ldata1.obs.index.tolist()]
    barcodes1 = [bc[0:len(bc)-1] + '_01' for bc in barcodes1]
    ldata1.obs.index = barcodes1

    barcodes2 = [bc.split(':')[1] for bc in ldata2.obs.index.tolist()]
    barcodes2 = [bc[0:len(bc)-1] + '_02' for bc in barcodes2]
    ldata2.obs.index = barcodes2

    # Make variable names unique
    ldata1.var_names_make_unique()
    ldata2.var_names_make_unique()

    # Concatenate ldata1 and ldata2
    ldata = ldata1.concatenate(ldata2)

    # Align variables (features) between adata and ldata_combined
    common_genes = adata.var_names.intersection(ldata.var_names)
    adata = adata[:, common_genes]
    ldata = ldata[:, common_genes]

    # Merge matrices
    adata = scv.utils.merge(adata, ldata)

    # Print shapes to verify
    print(adata.shape)  # Output: (15515, 32247)
    print(ldata.shape)  # Output: (19310, 32247)

Problem:

I expected the merged adata object to have around 19,000 cells (approximately the sum of cells in ldata1 and ldata2). However, after merging using scv.utils.merge, the number of cells in adata is significantly reduced to only 900.

Questions:

Is there a potential issue with how I'm aligning the barcodes between adata and ldata before merging?

Could there be another reason for the unexpected cell count reduction after merging?

How can I troubleshoot this issue to ensure all the cells from ldata2 are correctly included in the merged adata object?

Additional Information:

I've included the output showing the first few barcodes from adata, ldata1, and ldata2 after processing.

First few adata barcodes:

    Index(['AAACGAACACGT', 'AAACGCTGTCCG', 'AAACGCTTCGTC', 'AAAGAACAGAGC', 'AAAGGGCCACGG'], dtype='object')

First few ldata1 barcodes:

    Index(['AAAGAACAGAGCCATG_01', 'AATCACGGTTAACAGA_01', 'AACGAAAGTCTGCATA_01', 'AACAGGGCAGGATGAC_01', 'AATCGTGCAGCACAGA_01'], dtype='object')

First few ldata2 barcodes:

    Index(['AAAGAACCAAGCTCTA_02', 'AAACCCACAGTGTACT_02', 'AAACCCACAACCGCCA_02', 'AAACGCTAGTGGTTCT_02', 'AAACGCTTCTGGCCGA_02'], dtype='object')

Any insights or suggestions on how to resolve this issue would be greatly appreciated!
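One way to troubleshoot this is to compare the cell names directly before merging. The sketch below is a minimal pure-Python check. It assumes (this is not confirmed in the thread) that the merge keeps only cells whose names occur in both objects. `overlap_report` is a hypothetical helper, and the lists are just the barcodes printed in this post; in practice you would pass `adata.obs_names` and `ldata.obs_names`.

```python
from collections import Counter


def overlap_report(adata_names, ldata_names):
    """Summarize how two collections of cell names relate to each other.

    Hypothetical helper for illustration; not part of scvelo or anndata.
    """
    a, l = set(adata_names), set(ldata_names)
    return {
        "n_adata": len(a),
        "n_ldata": len(l),
        # Names that match exactly in both collections
        "n_shared": len(a & l),
        # Distribution of name lengths, to spot format differences
        "adata_lengths": dict(Counter(len(n) for n in a)),
        "ldata_lengths": dict(Counter(len(n) for n in l)),
    }


# Sample barcodes copied from the output in this post
adata_names = [
    "AAACGAACACGT", "AAACGCTGTCCG", "AAACGCTTCGTC",
    "AAAGAACAGAGC", "AAAGGGCCACGG",
]
ldata_names = [
    "AAAGAACAGAGCCATG_01", "AATCACGGTTAACAGA_01", "AACGAAAGTCTGCATA_01",
    "AACAGGGCAGGATGAC_01", "AATCGTGCAGCACAGA_01",
    "AAAGAACCAAGCTCTA_02", "AAACCCACAGTGTACT_02", "AAACCCACAACCGCCA_02",
    "AAACGCTAGTGGTTCT_02", "AAACGCTTCTGGCCGA_02",
]

report = overlap_report(adata_names, ldata_names)
print(report)
```

On these sample barcodes there are no exact matches, and the names differ in length (12 characters in adata versus 16 plus a 3-character suffix in the loom data). It may therefore be worth checking how the adata barcodes were produced before attempting the merge.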

WeilerP commented 1 month ago

Please use anndata's merge function and check the already existing issues and discussion on.