pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License

Why is there a big difference between kallisto and cellranger count result? #192

Closed aizhimin closed 1 year ago

aizhimin commented 1 year ago

Why is there a big difference between kallisto and cellranger count result?

kb count -i /mnt/data/ref/transcriptome.idx -g /mnt/data/ref/transcripts_to_genes.txt -t 8 -m 16G --mm --h5ad --filter bustools --cellranger --gene-names --verbose --overwrite -x 10XV2 -o output/SRR8315735 SRR8315735_1.fastq.gz SRR8315735_2.fastq.gz

kallisto: image

cellranger: image

Yenaled commented 1 year ago

Have you looked at the unfiltered results from kallisto?

aizhimin commented 1 year ago

> Have you looked at the unfiltered results from kallisto?

@Yenaled The filtering criteria for both are the same.

aizhimin commented 1 year ago

@Yenaled This result is unfiltered.

image

aizhimin commented 1 year ago

@Yenaled The results come from the same sample: https://www.ncbi.nlm.nih.gov/sra/SRR8315735

Yenaled commented 1 year ago

Is that the result from the unfiltered_results folder (i.e., not the folder that is created by --filter=bustools)?

Can you provide more info?
How was the index created (the command used and where you got the files)?
What's in the run_info.json file?
What is the pseudoalignment rate?
What is the path to the matrix/h5ad file you are loading in?
What do the first few lines of the matrix files look like?
What does the knee plot look like?

With 25 million reads, it is hard to imagine that the average cell gets a count of only 1000, even if only two-thirds of those reads map, unless some post-processing was done.
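For anyone following along, the knee plot asked about above is built from per-barcode total counts ranked from highest to lowest. Below is a minimal sketch of computing those ranked totals from a MatrixMarket coordinate file using only the standard library. It assumes rows are barcodes and columns are genes; the `barcode_totals` helper and the toy matrix are made up for illustration and are not taken from this issue.

```python
from collections import defaultdict


def barcode_totals(mtx_lines):
    """Return per-row (barcode) total counts, sorted from highest to lowest.

    Assumes MatrixMarket coordinate format with rows = barcodes.
    """
    totals = defaultdict(float)
    size_line_seen = False
    for line in mtx_lines:
        line = line.strip()
        # Skip blank lines and the '%' header/comment lines.
        if not line or line.startswith("%"):
            continue
        # The first non-comment line holds "n_rows n_cols n_entries".
        if not size_line_seen:
            size_line_seen = True
            continue
        row, _col, value = line.split()
        totals[int(row)] += float(value)
    return sorted(totals.values(), reverse=True)


# Toy matrix (hypothetical data): 3 barcodes x 4 genes, 5 non-zero entries.
example = """%%MatrixMarket matrix coordinate real general
3 4 5
1 1 2
1 3 5
2 2 1
3 1 10
3 4 1
""".splitlines()

ranked = barcode_totals(example)
print(ranked)
```

Plotting these ranked totals against their rank on log-log axes gives the knee plot. If most barcodes sit near the bottom of that curve, very few reads are being assigned to cells.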

aizhimin commented 1 year ago

@Yenaled run_info.json:

{
  "n_targets": 0,
  "n_bootstraps": 0,
  "n_processed": 25005657,
  "n_pseudoaligned": 1955219,
  "n_unique": 1095764,
  "p_pseudoaligned": 7.8,
  "p_unique": 4.4,
  "kallisto_version": "0.48.0",
  "index_version": 0,
  "start_time": "Fri Feb 17 09:45:50 2023",
  "call": "/root/miniconda3/envs/jupyter39/lib/python3.9/site-packages/kb_python/bins/compiled/kallisto/kallisto bus -i /mnt/data/ref/transcriptome.idx -o output/v2 -x 10X V2 -t 8 SRR8315735_S1_L001_R1_001.fastq.gz SRR8315735_S1_L001_R2_001.fastq.gz"
}
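As a sanity check on the figures in this run_info.json, the percentages can be recomputed from the raw read counts. The sketch below copies only the three count fields from the file above; the `percent` helper is a hypothetical name used for illustration.

```python
import json

# Count fields copied from the run_info.json posted above.
run_info = json.loads("""
{
  "n_processed": 25005657,
  "n_pseudoaligned": 1955219,
  "n_unique": 1095764
}
""")


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


p_pseudoaligned = percent(run_info["n_pseudoaligned"], run_info["n_processed"])
p_unique = percent(run_info["n_unique"], run_info["n_processed"])

print(f"pseudoaligned: {p_pseudoaligned:.1f}%")
print(f"unique:        {p_unique:.1f}%")
```

The recomputed values agree with the reported `p_pseudoaligned` and `p_unique`: fewer than 2 million of the roughly 25 million processed reads pseudoaligned, which is far below the two-thirds mapping rate mentioned earlier in the thread.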

The path I am loading from: counts_unfiltered.

The matrix file content:

image

Do you mean that I should not add --filter=bustools?

aizhimin commented 1 year ago

@Yenaled Could the problem be that kallisto is missing the correct 5' barcode whitelist as input?

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.