pachterlab / GPCTP_2019

Notebooks for reproducing figures and results from the paper Gehring et al., Highly multiplexed single-cell RNA-seq by DNA oligonucleotide tagging of cellular proteins, Nature Biotechnology (2019) doi:10.1038/s41587-019-0372-z

Request for Assistance Reproducing 4-Sample Demultiplexing Example #3

Closed cwarden45 closed 2 years ago

cwarden45 commented 2 years ago

Hi,

Thank you very much for posting all of the materials associated with this paper and the methods.

Before helping with some newly generated ClickTag data, I wanted to make sure that I could run similar code on the example data.

However, I am getting a warning message that there are no pseudoaligned reads with the example data that you provide. Should I be concerned about this?

Here is the code that I am using to create the index:

NAME=ClickTagWhitelist

python3 featuremap.py $NAME.csv --t2g $NAME.t2g --fa $NAME.fa --header --quiet
kallisto/kallisto index -i $NAME.idx -k 15 ./$NAME.fa

And here is the code that I am using to start the sample barcode ClickTag assignments:

FQNAME=NSC_Fixed_S2
#FQNAME=NSC_Live_S1

WLNAME=ClickTagWhitelist
OUTPREFIX=DEMUX_FQ

WL=../../$WLNAME.idx
R1=$FQNAME\_L001_R1_001.fastq.gz
R2=$FQNAME\_L001_R2_001.fastq.gz
OUTFOLDER=$OUTPREFIX\__$FQNAME\__$WLNAME

mkdir $OUTFOLDER
../../kallisto/kallisto bus -i $WL -o $OUTFOLDER -x 10xv2 -t 2 $R1 $R2

As far as I can tell, this is similar to the code used for this study, as well as to the general kite code example.

Just to make sure I wasn't overlooking anything, I also tried running the following code:

FQNAME=NSC_Fixed_S2
#FQNAME=NSC_Live_S1

WLNAME=ClickTagWhitelist
OUTPREFIX=DEMUX_FQ
WL10X=10xv2_whitelist.txt

WL=../../$WLNAME.idx
R1=$FQNAME\_L001_R1_001.fastq.gz
R2=$FQNAME\_L001_R2_001.fastq.gz
OUTFOLDER=$OUTPREFIX\__$FQNAME\__$WLNAME

#mkdir $OUTFOLDER
#../../kallisto/kallisto bus -i $WL -o $OUTFOLDER -x 10xv2 -t 2 $R1 $R2
/opt/bustools/build/src/bustools correct -w $WL10X $OUTFOLDER/output.bus -o $OUTFOLDER/output_corrected.bus
/opt/bustools/build/src/bustools sort -t 2 -o $OUTFOLDER/output_sorted.bus $OUTFOLDER/output_corrected.bus
/opt/bustools/build/src/bustools count -o $OUTFOLDER/featurecounts --genecounts -g ../../$WLNAME.t2g -e $OUTFOLDER/matrix.ec -t $OUTFOLDER/transcripts.txt $OUTFOLDER/output_sorted.bus

However, because the starting .bus file was essentially empty, I think the downstream steps also failed (with a segmentation fault).
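One way to avoid running bustools on an empty .bus file is to check the pseudoaligned read count first. The following is only a minimal sketch: it assumes that kallisto bus writes a run_info.json file containing an "n_pseudoaligned" field into the output folder, and the helper names pseudoaligned_count and has_pseudoaligned_reads are hypothetical (they are not part of kallisto or bustools).

```shell
#!/bin/bash

# Hypothetical helper: print the pseudoaligned read count recorded in
# <output folder>/run_info.json, or 0 if the file or field is missing.
# Assumes kallisto writes an integer field named "n_pseudoaligned".
pseudoaligned_count() {
    local run_info="$1/run_info.json"
    local n=""
    if [ -f "$run_info" ]; then
        # Pull out the integer that follows "n_pseudoaligned":
        n=$(sed -n 's/.*"n_pseudoaligned"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$run_info" | head -n 1)
    fi
    echo "${n:-0}"
}

# Hypothetical helper: succeed only when at least one read pseudoaligned.
has_pseudoaligned_reads() {
    [ "$(pseudoaligned_count "$1")" -gt 0 ]
}
```

With these defined, the bustools correct, sort, and count steps could be wrapped in `if has_pseudoaligned_reads "$OUTFOLDER"; then ... fi`, so that a run with zero pseudoaligned reads stops with a message instead of reaching the segmentation fault.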

Thank you very much.

Sincerely, Charles

cwarden45 commented 2 years ago

I apologize, but I see the problem:

I needed to change the k-mer length used to build the index relative to the kite example, meaning that I needed to use -k 11 instead of -k 15.

So, the correct code to index the ClickTags was as follows:

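#!/bin/bash

NAME=ClickTagWhitelist
#NAME=BC41-60whitelist

# Build the feature FASTA and the transcript-to-gene map from the whitelist CSV
python3 featuremap.py "$NAME.csv" --t2g "$NAME.t2g" --fa "$NAME.fa" --header --quiet
# Build the index with k=11 instead of the k=15 used in the first attempt
kallisto/kallisto index -i "$NAME.idx" -k 11 "./$NAME.fa"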
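The thread does not say why -k 15 fails here. One plausible explanation, assuming the sequences in the generated FASTA are shorter than 15 bases (not confirmed above), is that no 15-mer can be drawn from them, so nothing can pseudoalign. The sketch below checks the shortest sequence in a FASTA file and suggests the largest odd k that fits, since kallisto expects an odd k of at most 31. The helper names min_fasta_length and largest_odd_k are hypothetical, and the FASTA is assumed to hold one sequence per line.

```shell
#!/bin/bash

# Hypothetical helper: print the length of the shortest sequence in a FASTA
# file. Assumes each sequence sits on a single line (no wrapped sequences).
min_fasta_length() {
    awk '!/^>/ && length($0) > 0 {
             n = length($0)
             if (min == "" || n < min) min = n
         }
         END { print min + 0 }' "$1"
}

# Hypothetical helper: print the largest odd k that does not exceed the
# given sequence length, capped at 31.
largest_odd_k() {
    local len="$1"
    if [ "$len" -gt 31 ]; then
        len=31
    fi
    if [ $((len % 2)) -eq 0 ]; then
        len=$((len - 1))
    fi
    echo "$len"
}
```

For example, `largest_odd_k "$(min_fasta_length "$NAME.fa")"` prints a candidate value for -k that can be compared with the one passed to kallisto index.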

The pseudoalignment rate is now much better. For example, kallisto now reports: processed 15,928,888 reads, 15,357,069 reads pseudoaligned.
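To put a number on the rate, the two read counts can be divided directly. This is a small sketch; pseudoalignment_rate is a hypothetical helper name, and the counts are the ones quoted in this comment.

```shell
#!/bin/bash

# Hypothetical helper: print the percentage of processed reads that
# pseudoaligned, to two decimal places, or NA if no reads were processed.
pseudoalignment_rate() {
    awk -v processed="$1" -v aligned="$2" 'BEGIN {
        if (processed > 0) printf "%.2f\n", 100 * aligned / processed
        else print "NA"
    }'
}

# Counts from the run with -k 11:
pseudoalignment_rate 15928888 15357069   # prints 96.41
```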

I hope that this can help others, and I am closing this ticket.