marcelm / cutadapt

Cutadapt removes adapter sequences from sequencing reads
https://cutadapt.readthedocs.io
MIT License

Demultiplexing paired-end reads in mixed orientation Sequences with Zero output #656

Closed ColdySnow closed 1 year ago

ColdySnow commented 1 year ago

Hello everybody,

First, some general information: Cutadapt version: 2.8, Python version: 2.7.18

I have no information about the installation process (it was done by my university long before I joined the lab).

I demultiplexed a huge FASTQ file. Each sequence has two barcodes, which make it possible to identify which sequences belong to which sample. In total we have 120 samples, so I ran the following commands for each sample:

For the first barcode:

cutadapt -g ^file:barcodes.fasta \
    -o round1-{name}.R1.fastq.gz \
    -p round1-{name}.R2.fastq.gz \
    R1.fastq.gz R2.fastq.gz

For the second barcode:

cutadapt -g ^file:barcodes.fasta \
    -o round2-{name}.R2.fastq.gz \
    -p round2-{name}.R1.fastq.gz \
    round1-unknown.R2.fastq.gz round1-unknown.R1.fastq.gz

I am now running the DADA2 pipeline on my data, but it turned out that I cannot plot the quality profile of the R2 sequences. I found out that this is because cutadapt allows "zero" sequences in its output. I also found out that you can add "--minimum-length 1" when running cutadapt, but unfortunately I have already deleted the FASTQ file from the server.

So is there any way to remove the "zero" sequences from my data after running cutadapt?

I appreciate any kind of help!!

Best regards!
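
To confirm which files are affected, the "zero" sequences described above can be counted directly. This is only a minimal sketch: the function name count_empty_reads is made up for illustration, and it assumes gzip-compressed FASTQ files with exactly four lines per record (no wrapped sequences) and that gzip and awk are available.

```shell
# Count FASTQ records whose sequence line is empty.
# Assumption: four lines per record, so the sequence is line 2 of each record.
count_empty_reads() {
    gzip -dc "$1" |
        awk 'NR % 4 == 2 && length($0) == 0 { n++ } END { print n + 0 }'
}
```

Calling it as count_empty_reads on one of the demultiplexed R2 files prints the number of empty records in that file; a non-zero count would be consistent with the plotting problem described above.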

marcelm commented 1 year ago

Hi,

Cutadapt version: 2.8 Python version: 2.7.18

It doesn’t matter in this case, but I don’t think this is correct because Cutadapt 2.8 requires Python 3 to run.

I call the "zero" sequences "empty reads". If you no longer have the original FASTQ files, you can remove them afterwards, also with --minimum-length 1 (I will use the short version -m 1 below), but you have to run that command once on each of the demultiplexed files.

Since you have so many files, you don’t want to do this manually. Here is a way to do this using a for loop:

mkdir noempty
for r1 in *.R1.fastq.gz; do r2="${r1/.R1./.R2.}"; cutadapt -m 1 -o "noempty/${r1}" -p "noempty/${r2}" "${r1}" "${r2}"; done

The output files are then in the noempty/ directory.
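
Before running the loop on 120 samples, it may help to preview which R2 file the substitution ${r1/.R1./.R2.} pairs with each R1 file. The following is a rough dry-run sketch, not part of Cutadapt: the function name preview_pairs is hypothetical, it assumes bash (the substitution syntax is a bash feature), and it only prints names without running cutadapt or modifying anything.

```shell
#!/usr/bin/env bash
# Dry run: for every *.R1.fastq.gz in a directory, derive the R2 name the
# same way the loop does and report whether that mate file exists.
preview_pairs() {
    local dir=${1:-.} r1 base r2
    for r1 in "$dir"/*.R1.fastq.gz; do
        [ -e "$r1" ] || continue            # the glob matched nothing
        base=${r1##*/}                      # file name without the directory
        r2=$dir/${base/.R1./.R2.}           # replace the first ".R1." with ".R2."
        if [ -e "$r2" ]; then
            printf 'ok\t%s\t%s\n' "$r1" "$r2"
        else
            printf 'missing\t%s\t%s\n' "$r1" "$r2"
        fi
    done
}
```

Running preview_pairs in the directory with the demultiplexed files prints one line per R1 file; any line starting with "missing" points to an R1 file whose derived R2 name does not exist, which would make the cutadapt call for that pair fail.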

ColdySnow commented 1 year ago

Perfect, thank you very much for the fast answer. I'll try it as soon as I can!

marcelm commented 1 year ago

Hi, getting back to this after the holidays. Can this issue be closed?