pachterlab / splitcode

Flexible and efficient parsing, interpreting and editing of sequencing reads
https://pachterlab.github.io/splitcode/
BSD 2-Clause "Simplified" License

Question on an example on the docs #7

Closed: JohnMMa closed this issue 7 months ago

JohnMMa commented 7 months ago

In the "Put technical sequences into separate files" section of the docs, the first example is a command line for extracting 10X v3 sequences:

splitcode -x "0:0<barcode>0:16,0:16<umi>0:28,1:0<cdna>1:-1" --x-only --nFastqs=2 --gzip R1.fastq.gz R2.fastq.gz

However, was it tested to work? I tried running it through both the Helper and the splitcode program I compiled today, and no sequences were extracted in either.

For the Helper, I used the first read pair from the gex files of the 10X 5k_pbmc_protein_v3_nextgem dataset. Specifically, R1 is as follows:

@A00519:265:HKCKGDMXX:1:1101:1723:1000 1:N:0:TAACAAGG
CCAATTTTCATTTCCAGACCCGTATCGC
+
FFFFFFFFFFF,FFFFFFFFFFFFFF:F

And R2 is as follows:

@A00519:265:HKCKGDMXX:1:1101:1723:1000 2:N:0:TAACAAGG
AAGGACAGAGAAGTCTTGACACACATTGTAGTAGTGACAATTATGATTTCCTACAGACAAGATAATCTCCAACACATACAAACACACACAG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:,FFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFF:FFFFFFF

Yenaled commented 7 months ago

Seems to work on my end:

https://colab.research.google.com/drive/1U2T4pZvyTBNAQeJtU5caPaQXH7sX_QXu?usp=sharing

JohnMMa commented 7 months ago

Your command line (splitcode -x "0:0<barcode>0:16,0:16<umi>0:28,1:0<cdna>1:-1" --x-only --nFastqs=2 --gzip R1.fastq.gz R2.fastq.gz) works on our end.
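For readers following along, here is a rough Python sketch of what that extraction spec selects from the example read pair above. It assumes the coordinates are 0-based and half-open (consistent with the 16 bp barcode and 12 bp UMI of 10X v3) and that an end of -1 means "to the end of the read". The `extract` helper and the `SPEC` list are hypothetical names for illustration only, not part of splitcode.

```python
# Illustrative model of the spec
#   0:0<barcode>0:16,0:16<umi>0:28,1:0<cdna>1:-1
# Assumption: positions are 0-based, end-exclusive; -1 means end of read.
# Each entry: (name, file index, start, end). Not splitcode's own code.
SPEC = [
    ("barcode", 0, 0, 16),
    ("umi", 0, 16, 28),
    ("cdna", 1, 0, -1),
]

def extract(reads, spec=SPEC):
    """Slice each named region out of the matching read (one read per file)."""
    out = {}
    for name, file_idx, start, end in spec:
        seq = reads[file_idx]
        stop = len(seq) if end == -1 else end
        out[name] = seq[start:stop]
    return out

# The read pair quoted earlier in this thread.
r1 = "CCAATTTTCATTTCCAGACCCGTATCGC"
r2 = ("AAGGACAGAGAAGTCTTGACACACATTGTAGTAGTGACAATTATGATTTCC"
      "TACAGACAAGATAATCTCCAACACATACAAACACACACAG")

regions = extract([r1, r2])
print(regions["barcode"])  # first 16 bases of R1
print(regions["umi"])      # next 12 bases of R1
print(regions["cdna"] == r2)
```

Under these assumptions the barcode and UMI together cover all 28 bases of R1, and the cDNA output is R2 unchanged.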

In that case, I think the issue has something to do with --assign, since the Helper command line (which I just copied to the terminal) was splitcode -c config.txt -N 2 --assign --mapping=mapping.txt --outb=final_barcodes.fastq -o out_file_0.fastq,out_file_1.fastq file_0.fastq file_1.fastq... I suppose assign comes before extract in the processing order?

Yenaled commented 7 months ago

Oh, yeah, don't use --assign here: --assign assumes you have "tags" you want to extract. If no "tags" are present, no read is considered "assigned", and therefore no reads are output.

Side note: The purpose of --assign is to select certain reads and give each of those selected reads some sort of unique identifier based on the barcodes that are present.
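As a purely conceptual sketch of that behavior (not splitcode's actual implementation), the toy Python below keeps only reads in which at least one tag was found and gives each distinct tag combination its own ID. The function name `assign_reads`, the input layout, and the sequential numbering are all assumptions made for illustration.

```python
# Toy model of "assign" semantics as described above.
# Assumption: each read arrives with the list of tag names found in it.
# The sequential ID scheme is illustrative, not splitcode's real numbering.
def assign_reads(reads_with_tags):
    ids = {}       # tag combination -> unique identifier
    assigned = []  # (read name, identifier) for reads that are kept
    for read_name, tags in reads_with_tags:
        if not tags:
            # No tags found: the read is not "assigned" and is dropped.
            continue
        key = tuple(tags)
        if key not in ids:
            ids[key] = len(ids)
        assigned.append((read_name, ids[key]))
    return assigned, ids

# A config with only extraction regions defines no tags, so every read
# has an empty tag list and nothing is kept.
no_tags = [("read1", []), ("read2", [])]
kept_none, _ = assign_reads(no_tags)
print(kept_none)

# With tags present, reads sharing a combination share an identifier.
tagged = [("read1", ["A", "B"]), ("read2", []), ("read3", ["A", "B"])]
kept, mapping = assign_reads(tagged)
print(kept)
```

In this model, an extraction-only run combined with --assign yields an empty output, which matches the symptom reported at the top of this issue.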

Apologies for the confusion.

JohnMMa commented 7 months ago

OK, thanks a lot! Closing this; I will open a separate issue for a feature request.