pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License

Whitelist for SMARTSEQ2 technology #254

Closed: dawe closed this issue 6 months ago

dawe commented 6 months ago

Hello, I am processing single cells obtained with SMARTSEQ. I have created a three-column TSV file (samples) listing the cell ID and the two paired read files for each cell. I launched kb count like this:

kb count -t 8 -o $OUTDIR --workflow standard -i ${D}/index.idx  -g ${D}/t2g.txt   --parity paired -x SMARTSEQ2 --h5ad --gene-names  samples

I get the h5ad file, and it contains fewer cells than expected because filtering has been applied (next time I will set the threshold to 0). However, my issue is that the cell names in the AnnData are oligonucleotides, whereas the first column of the samples file contains sample names. I assume the cell order in the AnnData is the same as in the samples file, but it is not clear how to match cells to samples or how to tell which cells have been filtered out.

Yenaled commented 6 months ago

Just look at the unfiltered results to match oligonucleotides to sample names; that's the easiest way to go. kb count also outputs unfiltered results (e.g. there will be a cells_x_genes.barcodes.txt file in your unfiltered results containing the unfiltered oligonucleotides, which you can match to your sample names).

dawe commented 6 months ago

Excellent, thanks

dawe commented 6 months ago

Hello @Yenaled , sorry for reopening the issue. I have checked and

$ wc -l samples 
28 samples
$ wc -l $OUTDIR/counts_unfiltered/cells_x_genes.barcodes.txt  
21 RUN799_783/counts_unfiltered/cells_x_genes.barcodes.txt

so 7 cells are missing. I ran kb count with the threshold set to 0 and found a whitelist.txt file, which also doesn't match:

$ wc -l $OUTDIR/whitelist.txt
29 RUN799_783/whitelist.txt

It has one extra cell. The matrix.cells file contains 28 cells (with the sample names from the original samples file):

$ wc -l $OUTDIR/matrix.cells
28 RUN799_783/matrix.cells

dawe commented 6 months ago

The file matrix.sample.barcodes should be the appropriate one. Nevertheless, how can I disable bustools filtering?
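A minimal sketch of the lookup this suggests, assuming (the thread does not confirm it) that matrix.sample.barcodes and matrix.cells each hold one entry per line and that line i of one file corresponds to line i of the other. The helper names read_lines and pair_barcodes_with_samples are hypothetical and not part of kb_python.

```python
from pathlib import Path


def read_lines(path):
    """Return the non-empty lines of a text file, stripped of whitespace."""
    text = Path(path).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


def pair_barcodes_with_samples(barcodes, samples):
    """Map each barcode to the sample name at the same position.

    ASSUMPTION: the two lists are line-aligned, i.e. the i-th barcode was
    assigned to the i-th sample. Check this against your own output.
    """
    if len(barcodes) != len(samples):
        raise ValueError(
            f"{len(barcodes)} barcodes but {len(samples)} samples; "
            "the files are not line-aligned"
        )
    return dict(zip(barcodes, samples))


# Example usage (paths are placeholders for your own output directory):
# mapping = pair_barcodes_with_samples(
#     read_lines("OUTDIR/matrix.sample.barcodes"),
#     read_lines("OUTDIR/matrix.cells"),
# )
```

The length check is deliberate: if the two files disagree in size, a silent zip would pair the wrong names, so the sketch fails loudly instead.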

Yenaled commented 6 months ago

^ah yes, that is correct.

bustools doesn't do any filtering. What might be happening is that some of your "cells" have 0 mapped reads, in which case they can't appear in the final matrix.
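To see which samples dropped out for this reason, one option is to compare the barcodes in the output matrix against a barcode-to-sample mapping. The sketch below assumes such a mapping is already available as a dict (for instance built from matrix.sample.barcodes and matrix.cells, under the assumption that they are line-aligned); split_by_presence is a hypothetical helper, not a kb_python function.

```python
def split_by_presence(barcode_to_sample, observed_barcodes):
    """Split sample names into those present in the matrix and those missing.

    barcode_to_sample: dict mapping barcode -> sample name (assumed input).
    observed_barcodes: barcodes listed in cells_x_genes.barcodes.txt.
    Returns (present, missing), both in the order of barcode_to_sample.
    """
    observed = set(observed_barcodes)
    present = [s for bc, s in barcode_to_sample.items() if bc in observed]
    missing = [s for bc, s in barcode_to_sample.items() if bc not in observed]
    return present, missing


# Example usage with toy values (not real data from this issue):
# present, missing = split_by_presence(
#     {"AAAAAAAAAAAAAAAA": "cell_1", "AAAAAAAAAAAAAAAC": "cell_2"},
#     ["AAAAAAAAAAAAAAAA"],
# )
# Here "cell_2" would be reported as missing.
```

Samples reported as missing would be the candidates for having no mapped reads; checking their FASTQ files directly is the way to confirm that.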

dawe commented 6 months ago

I've solved this by specifying a fixed barcode whitelist like

AAAAAAAAAAAAAAAT
AAAAAAAAAAAAACCT
AAAAAAAAAAAAAAAA
AAAAAAAAAAAAACGA
AAAAAAAAAAAAACGG
AAAAAAAAAAAAAAAC
AAAAAAAAAAAAAAAT
AAAAAAAAAAAAAAGA
AAAAAAAAAAAAAAGA
AAAAAAAAAAAAACAT
…

This whitelist is generated by kb itself.