theislab / scvelo

RNA Velocity generalized through dynamical modeling
https://scvelo.org
BSD 3-Clause "New" or "Revised" License

Significant cell count reduction after merging loom files with scvelo.merge #1248

Closed: mayurdoke6 closed this issue 5 months ago

mayurdoke6 commented 5 months ago

I'm analyzing scRNA-seq data with scvelo. I'm trying to merge two loom files (New_Alloxan_possorted_genome_bam_CLF5P.loom and THR_possorted_genome_bam_SI7SF.loom), which contain gene expression data for different cell populations, and then merge the combined loom data into my existing AnnData object (adata).

Here's the relevant part of my code:

```python
import scanpy as sc
import scvelo as scv
import loompy

# Load loom data
ldata1 = scv.read('New_Alloxan_possorted_genome_bam_CLF5P.loom', cache=True)
ldata2 = scv.read('THR_possorted_genome_bam_SI7SF.loom', cache=True)

# Rename barcodes to ensure uniqueness
barcodes1 = [bc.split(':')[1] for bc in ldata1.obs.index.tolist()]
barcodes1 = [bc[0:len(bc)-1] + '_01' for bc in barcodes1]
ldata1.obs.index = barcodes1

barcodes2 = [bc.split(':')[1] for bc in ldata2.obs.index.tolist()]
barcodes2 = [bc[0:len(bc)-1] + '_02' for bc in barcodes2]
ldata2.obs.index = barcodes2

# Make variable names unique
ldata1.var_names_make_unique()
ldata2.var_names_make_unique()

# Concatenate ldata1 and ldata2
ldata = ldata1.concatenate(ldata2)

# Align variables (features) between adata and ldata
# (adata is my previously processed AnnData object; loading it is not shown)
common_genes = adata.var_names.intersection(ldata.var_names)
adata = adata[:, common_genes]
ldata = ldata[:, common_genes]

# Merge matrices
adata = scv.utils.merge(adata, ldata)

# Print shapes to verify
print(adata.shape)  # Output: (15515, 32247)
print(ldata.shape)  # Output: (19310, 32247)
```

Problem:

I expected the merged adata object to have around 19,000 cells (approximately the combined number of cells in ldata1 and ldata2). However, after merging with scv.utils.merge, adata contains only 900 cells.
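As far as I understand it, scv.utils.merge keeps only the cells whose obs_names occur in both objects, so the merged cell count is limited by the barcode overlap rather than by the size of either object. The sketch below works on plain strings: it traces a barcode through the renaming loop and the concatenation, then counts the overlap with the adata barcodes. It assumes velocyto-style loom obs names of the form `sample:BARCODEx` (the sample prefixes used here are made up; the barcodes come from the output in this issue) and assumes that `AnnData.concatenate`, with its default `index_unique='-'`, appends the batch category ('-0', '-1', ...) to each obs name. All helper names are hypothetical.

```python
def rename_loom_barcode(obs_name, suffix):
    """Apply this issue's renaming to one velocyto-style obs name.

    Assumes names look like 'sample:BARCODEx': keep the part after ':',
    drop its last character, then append the suffix (e.g. '_01').
    """
    barcode = obs_name.split(':')[1]
    return barcode[:-1] + suffix


def emulate_concatenate_names(batches, index_unique='-'):
    """Mimic the obs-name suffixing assumed for AnnData.concatenate.

    With index_unique=None the names are kept as they are; otherwise every
    name in batch i gets index_unique + str(i) appended.
    """
    if index_unique is None:
        return [name for batch in batches for name in batch]
    names = []
    for batch_index, batch in enumerate(batches):
        names.extend(f"{name}{index_unique}{batch_index}" for name in batch)
    return names


def barcode_overlap(names_a, names_b):
    """Return the names present in both collections, in the order of names_a."""
    shared = set(names_b)
    return [name for name in names_a if name in shared]


# Hypothetical sample prefixes; barcodes taken from the output in this issue.
batch1 = [rename_loom_barcode('sampleA:AAAGAACAGAGCCATGx', '_01')]
batch2 = [rename_loom_barcode('sampleB:AAAGAACCAAGCTCTAx', '_02')]
ldata_names = emulate_concatenate_names([batch1, batch2])

adata_names = ['AAACGAACACGT', 'AAACGCTGTCCG', 'AAACGCTTCGTC',
               'AAAGAACAGAGC', 'AAAGGGCCACGG']

print(ldata_names)
common = barcode_overlap(adata_names, ldata_names)
print(f"{len(common)} of {len(adata_names)} adata barcodes also appear in ldata")
```

If the real `ldata.obs_names` show the same '-0'/'-1' endings, passing `index_unique=None` to `concatenate` should keep the renamed barcodes unchanged, although the 12- versus 16-character difference would still prevent exact matches.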

Questions:

- Is there a potential issue with how I'm aligning the barcodes between adata and ldata before merging?
- Could there be another reason for the unexpected cell count reduction after merging?
- How can I troubleshoot this issue to ensure all the cells from ldata2 are correctly included in the merged adata object?

Additional Information:

I've included the output showing the first few barcodes from adata, ldata1, and ldata2 after processing.

First few adata barcodes:
Index(['AAACGAACACGT', 'AAACGCTGTCCG', 'AAACGCTTCGTC', 'AAAGAACAGAGC', 'AAAGGGCCACGG'], dtype='object')

First few ldata1 barcodes:
Index(['AAAGAACAGAGCCATG_01', 'AATCACGGTTAACAGA_01', 'AACGAAAGTCTGCATA_01', 'AACAGGGCAGGATGAC_01', 'AATCGTGCAGCACAGA_01'], dtype='object')

First few ldata2 barcodes:
Index(['AAAGAACCAAGCTCTA_02', 'AAACCCACAGTGTACT_02', 'AAACCCACAACCGCCA_02', 'AAACGCTAGTGGTTCT_02', 'AAACGCTTCTGGCCGA_02'], dtype='object')

Any insights or suggestions on how to resolve this issue would be greatly appreciated!
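In the output above, the adata barcodes are 12 characters long, while the loom barcodes are 16 characters plus a `_01`/`_02` suffix. At least one adata barcode ('AAAGAACAGAGC') is the first 12 characters of an ldata1 barcode ('AAAGAACAGAGCCATG_01'). That hints, without confirming, that adata holds shortened barcodes. The diagnostic sketch below (hypothetical helper names, standard library only) summarizes barcode lengths and counts how many short barcodes are a prefix of exactly one long barcode versus several. Because adata carries no sample suffix, one 12-character prefix could match cells from both samples, which would make any prefix-based matching unsafe.

```python
from collections import Counter, defaultdict


def length_summary(barcodes, suffix_sep='_'):
    """Count barcode lengths after removing a trailing '_NN'-style suffix."""
    return Counter(len(bc.split(suffix_sep)[0]) for bc in barcodes)


def prefix_matches(short_barcodes, long_barcodes, suffix_sep='_'):
    """Map each short barcode to the long barcodes whose core starts with it.

    The 'core' is the part before the suffix separator. An index keyed by
    prefix avoids comparing every short barcode with every long one.
    """
    prefix_lengths = {len(bc) for bc in short_barcodes}
    index = defaultdict(list)
    for long_bc in long_barcodes:
        core = long_bc.split(suffix_sep)[0]
        for k in prefix_lengths:
            index[core[:k]].append(long_bc)
    return {bc: index.get(bc, []) for bc in short_barcodes}


# Barcodes copied from the output in this issue.
adata_barcodes = ['AAACGAACACGT', 'AAACGCTGTCCG', 'AAACGCTTCGTC',
                  'AAAGAACAGAGC', 'AAAGGGCCACGG']
loom_barcodes = [
    'AAAGAACAGAGCCATG_01', 'AATCACGGTTAACAGA_01', 'AACGAAAGTCTGCATA_01',
    'AACAGGGCAGGATGAC_01', 'AATCGTGCAGCACAGA_01',
    'AAAGAACCAAGCTCTA_02', 'AAACCCACAGTGTACT_02', 'AAACCCACAACCGCCA_02',
    'AAACGCTAGTGGTTCT_02', 'AAACGCTTCTGGCCGA_02',
]

print("adata lengths:", length_summary(adata_barcodes))
print("loom lengths:", length_summary(loom_barcodes))

matches = prefix_matches(adata_barcodes, loom_barcodes)
unique = sum(1 for hits in matches.values() if len(hits) == 1)
ambiguous = sum(1 for hits in matches.values() if len(hits) > 1)
print(f"unique prefix matches: {unique}, ambiguous: {ambiguous}")
```

On the full objects, running the same summary on `adata.obs_names` and `ldata.obs_names` would show how many cells could be paired at all before deciding how to harmonize the names.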

WeilerP commented 5 months ago

See #1249.