pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License
UMI counts of spliced and unspliced don't match with counts in Cell Ranger output. #158

Closed: akhst7 closed this issue 2 years ago

akhst7 commented 2 years ago

Describe the issue

I ran kb-python to get spliced and unspliced counts on our mouse data. kb ran to completion and generated adata.h5ad. I imported the filtered adata.h5ad into R using the anndata package and checked its dimensions:

> dim(FTAP.ann)
[1] 358649  55414

The dimensions of the original data (the Cell Ranger output) are:

> dim(FTAPnew$counts)
[1] 31053 14674

For some reason, the UMI count grew from 14674 to 358649 after the kb step. I double-checked that the correct FASTQ files were used.
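One thing worth checking when comparing these two shapes is axis orientation. The sketch below is only a labeling aid, not part of kb-python: it assumes the AnnData object is laid out as observations x variables (barcodes x genes) and that `FTAPnew$counts` is laid out as genes x cells. The helper `summarize` and the `MatrixSummary` type are hypothetical names introduced for illustration. If those assumptions hold, 14674 and 358649 would both be barcode counts rather than UMI counts, and the gene axes (31053 and 55414) would differ as well.

```python
from typing import NamedTuple, Tuple


class MatrixSummary(NamedTuple):
    """Axis sizes of a count matrix, labeled by what they represent."""
    n_barcodes: int
    n_genes: int


def summarize(shape: Tuple[int, int], barcodes_first: bool) -> MatrixSummary:
    """Label the two axes of a (rows, cols) shape.

    barcodes_first=True  -> rows are barcodes, columns are genes
    barcodes_first=False -> rows are genes, columns are barcodes
    """
    rows, cols = shape
    if barcodes_first:
        return MatrixSummary(n_barcodes=rows, n_genes=cols)
    return MatrixSummary(n_barcodes=cols, n_genes=rows)


# Shapes copied from the dim() output in the issue.
# Assumption: the AnnData object is barcodes x genes.
kb = summarize((358649, 55414), barcodes_first=True)
# Assumption: the Cell Ranger derived matrix is genes x cells.
cr = summarize((31053, 14674), barcodes_first=False)

print(f"kb:          {kb.n_barcodes} barcodes, {kb.n_genes} genes")
print(f"Cell Ranger: {cr.n_barcodes} barcodes, {cr.n_genes} genes")
```

Labeling the axes this way makes it explicit which pair of numbers is being compared before drawing conclusions about the counts themselves.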

Could you suggest what I am missing here?

Thanks.

What is the exact command that was run?

kb ref -i index.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno Mus_musculus.GRCm39.dna.primary_assembly.fa Mus_musculus.GRCm39.105.gtf

kb count --h5ad -i index.idx -g t2g.txt -x 10xv3 -o FTAP_velocity -c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno --filter bustools -t 40 -m 80 11-19-19_1_S5_L001_R1_001.fastq.gz 11-19-19_1_S5_L001_R2_001.fastq.gz 11-19-19_1_S5_L002_R1_001.fastq.gz 11-19-19_1_S5_L002_R2_001.fastq.gz

Command output (with --verbose flag)

See above.