I'm working on analyzing scRNA-seq data using scvelo. I'm trying to merge two loom files (New_Alloxan_possorted_genome_bam_CLF5P.loom and THR_possorted_genome_bam_SI7SF.loom) containing gene expression data for different cell populations.
Here's the relevant part of my code:
```python
import scanpy as sc
import scvelo as scv
import loompy

# `adata` (the processed expression object) is created earlier and not shown here

# Load loom data
ldata1 = scv.read('New_Alloxan_possorted_genome_bam_CLF5P.loom', cache=True)
ldata2 = scv.read('THR_possorted_genome_bam_SI7SF.loom', cache=True)

# Rename barcodes to ensure uniqueness.
# velocyto cell IDs have the form '<sample>:<BARCODE>x': keep the part after ':',
# drop the trailing character and append a per-sample suffix.
barcodes1 = [bc.split(':')[1] for bc in ldata1.obs.index.tolist()]
barcodes1 = [bc[0:len(bc)-1] + '_01' for bc in barcodes1]
ldata1.obs.index = barcodes1

barcodes2 = [bc.split(':')[1] for bc in ldata2.obs.index.tolist()]
barcodes2 = [bc[0:len(bc)-1] + '_02' for bc in barcodes2]
ldata2.obs.index = barcodes2
```
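The renaming rule can be checked in isolation on plain strings. This is a minimal sketch: `rename_velocyto_id` and `rename_all` are hypothetical helpers (not part of scvelo) that apply the same rule as the list comprehensions above, assuming velocyto-style IDs of the form '<sample>:<BARCODE>x'. The example ID is built for illustration from a barcode shown in the question.

```python
def rename_velocyto_id(cell_id: str, suffix: str) -> str:
    """Hypothetical helper mirroring the renaming list comprehensions.

    Assumes velocyto-style IDs of the form '<sample>:<BARCODE>x'.
    """
    barcode = cell_id.split(':')[1]   # text after the sample prefix
    barcode = barcode[:-1]            # drop velocyto's trailing character
    return barcode + suffix


def rename_all(cell_ids, suffix):
    """Rename every ID and count duplicates that the renaming would create."""
    renamed = [rename_velocyto_id(c, suffix) for c in cell_ids]
    duplicates = len(renamed) - len(set(renamed))
    return renamed, duplicates


# Illustrative ID built from a barcode shown in the question
ids = ['New_Alloxan_possorted_genome_bam_CLF5P:AAAGAACAGAGCCATGx']
print(rename_all(ids, '_01'))  # → (['AAAGAACAGAGCCATG_01'], 0)
```

Running `rename_all` on the original `ldata1.obs_names` (before the index is reassigned) would confirm that the renaming does not introduce duplicate cell names.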
```python
# Make variable names unique
ldata1.var_names_make_unique()
ldata2.var_names_make_unique()

# Concatenate ldata1 and ldata2
ldata = ldata1.concatenate(ldata2)
```
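One detail worth checking at the concatenation step: as far as I know, `AnnData.concatenate` defaults to `index_unique='-'` with batch categories '0', '1', ..., so each cell name receives a batch suffix (for example 'AAAGAACAGAGCCATG_01-0'). The pure-Python sketch below only simulates that naming rule; `concat_obs_names` is a hypothetical helper, and the joining format is an assumption to verify against your anndata version. Passing `index_unique=None` to `concatenate` should keep the original names.

```python
def concat_obs_names(name_lists, index_unique='-', batch_categories=None):
    """Simulate the obs_names produced by concatenating several objects.

    Hypothetical helper: assumes default batch categories '0', '1', ...
    and names joined as '<name><index_unique><batch>'.
    """
    if batch_categories is None:
        batch_categories = [str(i) for i in range(len(name_lists))]
    result = []
    for names, batch in zip(name_lists, batch_categories):
        if index_unique is None:
            result.extend(names)  # names kept unchanged
        else:
            result.extend(f"{n}{index_unique}{batch}" for n in names)
    return result


names1 = ['AAAGAACAGAGCCATG_01']
names2 = ['AAAGAACCAAGCTCTA_02']
print(concat_obs_names([names1, names2]))
# → ['AAAGAACAGAGCCATG_01-0', 'AAAGAACCAAGCTCTA_02-1']
print(concat_obs_names([names1, names2], index_unique=None))
# → ['AAAGAACAGAGCCATG_01', 'AAAGAACCAAGCTCTA_02']
```

Printing `ldata.obs_names[:5]` right after concatenation is a quick way to see which naming the real object ended up with.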
```python
# Align variables (features) between adata and ldata
common_genes = adata.var_names.intersection(ldata.var_names)
adata = adata[:, common_genes]
ldata = ldata[:, common_genes]

# Merge matrices
adata = scv.utils.merge(adata, ldata)

# Print shapes to verify
print(adata.shape)  # Output: (15515, 32247)
print(ldata.shape)  # Output: (19310, 32247)
```
Problem:
I expected the merged adata object to have around 19,000 cells (approximately the sum of cells in ldata1 and ldata2). However, after merging with scv.utils.merge, adata contains only 900 cells.
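A quick way to see where cells are lost is to count exact name matches before merging. My reading of the scvelo source is that `scv.utils.merge` keeps only cells whose `obs_names` appear in both objects (treat this as an assumption), so the size of that overlap predicts the merged cell count. The sketch below is plain Python with a hypothetical helper `overlap_report`; with real objects you would pass `adata.obs_names` and `ldata.obs_names`. The example ldata names are illustrative, built from barcodes in the question with an assumed concatenation suffix.

```python
def overlap_report(a_names, b_names, show=3):
    """Count exact matches between two collections of cell names."""
    a, b = set(a_names), set(b_names)
    common = a & b
    return {
        'n_a': len(a),
        'n_b': len(b),
        'n_common': len(common),
        'only_a_examples': sorted(a - b)[:show],  # names missing from b
        'only_b_examples': sorted(b - a)[:show],  # names missing from a
    }


# Illustrative names: 12-character adata barcodes vs suffixed loom barcodes
adata_names = ['AAACGAACACGT', 'AAAGAACAGAGC']
ldata_names = ['AAAGAACAGAGCCATG_01-0', 'AAAGAACCAAGCTCTA_02-1']
report = overlap_report(adata_names, ldata_names)
print(report['n_common'])  # → 0
```

If `n_common` is far below the expected cell count, the names need to be brought into the same format before calling `scv.utils.merge`.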
Questions:
Is there a potential issue with how I'm aligning the barcodes between adata and ldata before merging?
Could there be another reason for the unexpected cell count reduction after merging?
How can I troubleshoot this issue to ensure all the cells from ldata2 are correctly included in the merged adata object?
Additional Information:
I've included the output showing the first few barcodes from adata, ldata1, and ldata2 after processing.
First few adata barcodes:
Index(['AAACGAACACGT', 'AAACGCTGTCCG', 'AAACGCTTCGTC', 'AAAGAACAGAGC',
'AAAGGGCCACGG'], dtype='object')
First few ldata1 barcodes:
Index(['AAAGAACAGAGCCATG_01', 'AATCACGGTTAACAGA_01', 'AACGAAAGTCTGCATA_01',
'AACAGGGCAGGATGAC_01', 'AATCGTGCAGCACAGA_01'], dtype='object')
First few ldata2 barcodes:
Index(['AAAGAACCAAGCTCTA_02', 'AAACCCACAGTGTACT_02', 'AAACCCACAACCGCCA_02',
'AAACGCTAGTGGTTCT_02', 'AAACGCTTCTGGCCGA_02'], dtype='object')
Any insights or suggestions on how to resolve this issue would be greatly appreciated!
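The barcodes listed above differ in length: the adata barcodes have 12 characters, while the loom barcodes have 16 characters plus a sample suffix, and 'AAAGAACAGAGC' from adata is exactly the first 12 characters of 'AAAGAACAGAGCCATG_01' from ldata1. That pattern suggests the adata names were truncated somewhere (an assumption; how adata was built is not shown). If I read the scvelo source correctly, when no names overlap exactly, `scv.utils.merge` calls `clean_obs_names` on both objects, which keeps a 12-base ID by default; please verify this against your installed version. The sketch below uses a hypothetical helper, `prefix_matches`, to count for each short barcode how many loom barcodes start with it, separating unique, ambiguous and missing matches.

```python
from collections import defaultdict


def prefix_matches(short_barcodes, long_names, k=12, sep='_'):
    """Match k-character barcodes to longer '<BARCODE><sep><suffix>' names by prefix.

    Hypothetical diagnostic: assumes every short barcode has length k and the
    full barcode precedes the first `sep` (e.g. 'AAAGAACAGAGCCATG_01').
    Returns (unique, ambiguous, missing).
    """
    by_prefix = defaultdict(list)
    for name in long_names:
        barcode = name.split(sep)[0]      # drop the sample suffix
        by_prefix[barcode[:k]].append(name)

    unique, ambiguous, missing = {}, {}, []
    for bc in short_barcodes:
        hits = by_prefix.get(bc, [])
        if len(hits) == 1:
            unique[bc] = hits[0]          # exactly one candidate cell
        elif hits:
            ambiguous[bc] = hits          # prefix shared by several cells
        else:
            missing.append(bc)            # no loom barcode starts with it
    return unique, ambiguous, missing


# Barcodes copied from the output above (only a handful, so illustrative only)
short = ['AAACGAACACGT', 'AAACGCTGTCCG', 'AAACGCTTCGTC', 'AAAGAACAGAGC', 'AAAGGGCCACGG']
long = ['AAAGAACAGAGCCATG_01', 'AATCACGGTTAACAGA_01', 'AACGAAAGTCTGCATA_01',
        'AACAGGGCAGGATGAC_01', 'AATCGTGCAGCACAGA_01', 'AAAGAACCAAGCTCTA_02',
        'AAACCCACAGTGTACT_02', 'AAACCCACAACCGCCA_02', 'AAACGCTAGTGGTTCT_02',
        'AAACGCTTCTGGCCGA_02']
unique, ambiguous, missing = prefix_matches(short, long)
print(len(unique), len(ambiguous), len(missing))  # → 1 0 4
```

Run on the full `adata.obs_names` and `ldata.obs_names`, a large `ambiguous` count would mean 12-character names cannot identify cells across both samples (the `_01`/`_02` suffix is lost), which is one way matching on truncated barcodes could drop most cells.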