mazzalab / fastqwiper

An ensemble method to recover corrupted FASTQ files, drop or fix pesky lines, remove unpaired reads, and settle reads interleaving.
GNU General Public License v3.0

Fastqwiper generates problematic names for paired reads #13

Closed: search42 closed this issue 10 months ago

search42 commented 11 months ago

I encountered an issue with paired-end FASTQ data. I attempted to fix my data using the image mazzalab/fastqwiper:2023.2.82 and the fix_wipe_pairs_reads_parallel.smk workflow within it. fastp can process the output properly, but when I aligned the fastp output to a genome using BWA, this error message appeared:

[mem_sam_pe] paired reads have different names: "=/>;780?3=:GD<IH+985I23H5D>II521+<54HA8=G", "="

I examined the fastq data produced by the fix_wipe_pairs_reads_parallel.smk workflow and indeed found such problems:

==> test_fastp_1.out.fq.gz <==
112468777-@XXXX_R08108324978/1
112468778-ATTGAAGAGATATTGAAGAGATAATTGAAGAGATAATTGAAGAGATAATTGAAGAGATAATTGAAGA
112468779-+
112468780-FB@GDDEEEDBFFGDGGEFFGGEEFFEFEDEEFFFFCADEEEDBDEFD:GDDDAE@FHDEFG@ECDD
112468781:@=/>;780?3=:GD<IH+985I23H5D>II521+<54HA8=G
112468782-ATGGAAGGACATGACCCTGAAAGCAGACATCCCATCTTCCTTTCTCCCTCACCCACACACTGGGCGT
112468783-+
112468784-HFGFGIFFFGIHHGGHHIIGHEIGIIGGHHGHHGGIIHHIHHGHHGGHIGGGHHIHIIIHHIEFIIG

==> test_fastp_2.out.fq.gz <==
112468777-@XXXX_R08108324978/2
112468778-ATTGTCTAGGTCTAGGTCTAGGTCTAGGTCTAGGTCTAGGTCTAGGTCTAGGTCTAGGTCTAGGTAA
112468779-+
112468780-GEEFDFFEEFEHFFGEFBGFFHFDEFDCEFG6FCEFFEEFDDFHEEGGFEBEFEFCDDFDGDDFFGE
112468781:@=
112468782-GGTACTGGTTTTCTATTCCAAGGCTGTTTTCTATACAAACATGCTTGAAAACAATCATTTGGAACAA
112468783-+
112468784-GIFHIIHGDGFFIHGGFHHGEIGGGIFHHHGGGIGGFGEGFGHGGHIFFGGHHEHHGFHHHGFGIFF

The same issue also appears in the paired-end reads output directly by fix_wipe_pairs_reads_parallel.smk, before fastp is run.
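For anyone who wants to locate such records without running an aligner, the following is a minimal sketch of a name check over two paired FASTQ files. It is not part of fastqwiper; the function names `read_names` and `first_mismatch` are hypothetical, and it assumes strict four-line records and that mates differ only by a trailing `/1` or `/2`.

```python
import gzip
from itertools import zip_longest


def read_names(path):
    """Yield the read name of every record in a 4-line-per-record FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for index, line in enumerate(handle):
            if index % 4 == 0:  # first line of each record is the header
                # Drop the leading '@' and keep the text up to the first space.
                fields = line.rstrip("\n")[1:].split()
                name = fields[0] if fields else ""
                # Assumption: mates are marked only by a trailing /1 or /2.
                if name.endswith(("/1", "/2")):
                    name = name[:-2]
                yield name


def first_mismatch(path_r1, path_r2):
    """Return (record_number, name_r1, name_r2) for the first pair whose
    names differ, or None if every pair matches. A missing mate shows up
    as None in place of its name."""
    pairs = zip_longest(read_names(path_r1), read_names(path_r2))
    for number, (name1, name2) in enumerate(pairs, start=1):
        if name1 != name2:
            return number, name1, name2
    return None
```

Calling `first_mismatch("test_fastp_1.out.fq.gz", "test_fastp_2.out.fq.gz")` on files like the ones above should point at the record whose header lines are `@=/>;780?3=:GD<IH+985I23H5D>II521+<54HA8=G` and `@=`, which are the two names BWA reports.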

mazzalab commented 11 months ago

Can you report the command line used to fix your files?

search42 commented 11 months ago

> Can you report the command line used to fix your files?

Thank you for such a quick reply. After mounting the data, I ran the following commands in the image:

set -eu -o pipefail
cd ${out-dir}
ln -s /fastqwiper/bbmap /fastqwiper/pipeline /fastqwiper/run_wiping.sh ${out-dir}
mkdir -p ${out-dir}/data
ln -s ${read1-input} ${out-dir}/data/${sample-id}_R1.fastq.gz
ln -s ${read2-input} ${out-dir}/data/${sample-id}_R2.fastq.gz
perl -i -pwe 's#cd /fastqwiper##' run_wiping.sh
bash run_wiping.sh paired 8 ${sample-id} 50000000

mazzalab commented 11 months ago

OK, I'm afraid I need to use your fastq files to debug fastqwiper. Can you send them to me?

mazzalab commented 11 months ago

To debug the software we really need to use your files. Can you share them?

search42 commented 11 months ago

> To debug the software we really need to use your files. Can you share them?

I'm sorry, but I can't share them with you because of our data privacy policy. If I run into the same problem with public data, I'd be happy to share that with you.

mazzalab commented 11 months ago

Please share the files or a link to them.

mazzalab commented 10 months ago

If you can share any file that causes this error, please reopen this issue.