pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License

Kb-python workflow differences (nucleus vs lamanno) resulting in different count numbers #178

Closed by x1han 1 year ago

x1han commented 1 year ago

Hi, kb is an amazing tool, but I ran into a problem while analyzing my single-nucleus data for count and velocity information. I followed the suggestions posted in #171 and obtained two AnnData objects: adata (workflow=nucleus) and adata_v (workflow=lamanno). When I compared the count matrices of the two objects, I found that they differ.

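from collections import Counter

# The cell barcodes are identical in both objects (8885 cells).
Counter(adata_v.to_df().index == adata.to_df().index)
# Counter({True: 8885})

# The PDGFRA counts, however, differ in 1104 of those cells.
Counter(adata_v.to_df().loc[:, 'PDGFRA'] == adata.to_df().loc[:, 'PDGFRA'])
# Counter({True: 7781, False: 1104})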
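To see how large the disagreement is, and not only how many cells disagree, the same comparison can be run for every shared gene at once. The following is only a minimal sketch: `compare_counts` is a hypothetical helper, the two toy tables stand in for `adata.to_df()` and `adata_v.to_df()`, and the values are made up for illustration. It assumes both tables hold comparable count matrices; for a matrix stored in a layer, `to_df(layer=...)` may be needed instead.

```python
import pandas as pd


def compare_counts(df_a: pd.DataFrame, df_b: pd.DataFrame) -> pd.DataFrame:
    """Summarize per-gene disagreement between two cell-by-gene count tables."""
    # Restrict to cells (rows) and genes (columns) present in both tables.
    cells = df_a.index.intersection(df_b.index)
    genes = df_a.columns.intersection(df_b.columns)
    a = df_a.loc[cells, genes]
    b = df_b.loc[cells, genes]

    # Signed difference: second table minus first table.
    diff = b - a
    summary = pd.DataFrame({
        "n_cells_differ": (diff != 0).sum(axis=0),
        "mean_diff": diff.mean(axis=0),
        "max_abs_diff": diff.abs().max(axis=0),
    })
    return summary.sort_values("n_cells_differ", ascending=False)


# Toy stand-ins for adata.to_df() and adata_v.to_df() (hypothetical values).
toy_nucleus = pd.DataFrame(
    [[3, 0], [1, 2], [0, 5]],
    index=["cell1", "cell2", "cell3"],
    columns=["PDGFRA", "GAPDH"],
)
toy_lamanno = pd.DataFrame(
    [[2, 0], [0, 2], [0, 4]],
    index=["cell1", "cell2", "cell3"],
    columns=["PDGFRA", "GAPDH"],
)

summary = compare_counts(toy_nucleus, toy_lamanno)
print(summary)
```

On the real objects the call would be `compare_counts(adata.to_df(), adata_v.to_df())`. Converting a large sparse matrix with `to_df()` produces a dense table, so subsetting to a handful of genes first may be preferable.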

I am wondering whether this difference is related to what Yenaled said:

Keep in mind that there are additional challenges when doing RNA velocity for single nucleus RNAseq (e.g. factoring in nuclear export) that haven't really all been figured out.

Also, can anyone advise which adata I should use if I want to run both a conventional analysis (e.g. PCA, dimensionality reduction, clustering, differential expression) and an RNA velocity analysis on my data?

Thanks.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days