Trimmomatic Query - Githubissues

SlowSD commented 7 months ago

Hello there,

Thank you for this amazing resource on variant calling.

I am currently following this analysis and I have a query.

While executing Trimmomatic according to the code below, one has to provide two input files (e.g. file1_1.fastq and file1_2.fastq), and the code will output four files (e.g. file1_1.trim.fastq, file1_1un.trim.fastq, file1_2.trim.fastq, and file1_2un.trim.fastq).

$ java -jar /usr/local/share/Trimmomatic-main/dist/jar/trimmomatic-0.40-rc1.jar PE SRR1972917_1.fastq.gz SRR1972917_2.fastq.gz SRR1972917_1.trim.fastq.gz SRR1972917_1un.trim.fastq.gz SRR1972917_2.trim.fastq.gz SRR1972917_2un.trim.fastq.gz SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:NexteraPE-PE.fa:2:40:15

May I know the difference between file1_1.fastq (the original input file) and file1_1un.trim.fastq? Additionally, how does the latter file originate?

Thank you for your time.

Best SD

taylorpaisie commented 7 months ago

I am so happy you like and found the tutorial useful!!

The file1_1un.trim.fastq and file1_2un.trim.fastq files are the outputs containing the unpaired forward and reverse reads, respectively. You would want to use the file1_1.trim.fastq and file1_2.trim.fastq output files from Trimmomatic when performing the reference-based mapping step, but the unpaired reads could, for example, give some insight into potential contamination.

Please feel free to ask any more questions you might have!

SlowSD commented 7 months ago

Hey,

Thank you for answering.

If I download paired reads, is it wrong to assume that each forward read has a mating reverse read and vice versa? Or might there be lonely reads?

Best SD

taylorpaisie commented 7 months ago

If I am understanding your question correctly, paired-end sequencing means you will have 2 files, a forward and a reverse fastq file. Note, there is also single-end sequencing, and in that case there is only 1 fastq file from a sequencing run. Here's a little more info on both:

In single-end reading, the sequencer reads a fragment from only one end to the other, generating the sequence of base pairs. In paired-end reading, it starts at one end, finishes this direction at the specified read length, and then starts another round of reading from the opposite end of the fragment. Paired-end reading improves the ability to identify the relative positions of various reads in the genome, making it much more effective than single-end reading in resolving structural rearrangements such as gene insertions, deletions, or inversions. It can also improve the assembly of repetitive regions.
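As a rough illustration of the description above, here is a toy Python sketch of how one fragment yields a forward and a reverse read. It assumes the reverse read is reported as the reverse complement of the fragment's far end, which is the usual convention for Illumina paired-end data. The function names and the example fragment are made up for illustration and do not come from any sequencing software.

```python
# Toy model of paired-end reading; names and data are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment, read_length):
    """Read `read_length` bases from each end of the fragment."""
    forward = fragment[:read_length]
    # The second read starts at the opposite end, on the other strand.
    reverse = reverse_complement(fragment)[:read_length]
    return forward, reverse


fragment = "ATGCGTACGTTAGC"  # made-up 14-base fragment
fwd, rev = paired_end_reads(fragment, 5)
print(fwd, rev)
```

Both reads come from the same fragment, which is why every read in the forward file normally has exactly one mate in the reverse file straight off the sequencer.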

SlowSD commented 7 months ago

Thanks a lot for your time and explanation.

I do understand single-end and paired-end sequencing. I am afraid that I might have caused some confusion somewhere. Let me rephrase my query:

Suppose I input two fastq files that are mates of each other, file1_1.fastq and file1_2.fastq, to Trimmomatic. I would simply expect two output files: file1_1.trim.fastq and file1_2.trim.fastq.

If I process two fastq files with Trimmomatic, file1_1.fastq and file1_2.fastq (where each read has a mate in the other fastq file), how do the unpaired reads in file1_1un.trim.fastq and file1_2un.trim.fastq come into the picture, given that I am providing the PE flag?

How were these two additional files, file1_1un.trim.fastq and file1_2un.trim.fastq, generated, and why are they necessary?

Thank you for your patience.
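A toy sketch may help show where the two extra files come from. In PE mode, Trimmomatic processes each read of a pair separately. If both mates survive trimming, they go to the two paired outputs. If only one survives (for example, its mate falls below MINLEN), the survivor goes to the matching unpaired output. If neither survives, the pair is dropped. The Python below is only a simplified model of that sorting and is not Trimmomatic's actual code: "trimming" is reduced to a length check, and the function name, the bucket labels, and the example reads are all made up.

```python
# Toy model of paired-end output sorting; NOT Trimmomatic's implementation.
def split_pairs(forward, reverse, min_len=25):
    """Sort read pairs into paired (P) and unpaired (U) buckets.

    `forward` and `reverse` are lists of (name, sequence) tuples in the
    same order, so that position i in each list is one read pair.
    """
    out = {"1P": [], "1U": [], "2P": [], "2U": []}
    for (name_f, seq_f), (name_r, seq_r) in zip(forward, reverse):
        keep_f = len(seq_f) >= min_len  # stand-in for "survived trimming"
        keep_r = len(seq_r) >= min_len
        if keep_f and keep_r:
            out["1P"].append(name_f)    # e.g. file1_1.trim.fastq
            out["2P"].append(name_r)    # e.g. file1_2.trim.fastq
        elif keep_f:
            out["1U"].append(name_f)    # e.g. file1_1un.trim.fastq
        elif keep_r:
            out["2U"].append(name_r)    # e.g. file1_2un.trim.fastq
        # If neither mate survives, the pair is discarded.
    return out


# Made-up reads: sequence lengths stand in for post-trimming lengths.
forward = [("read1/1", "A" * 50), ("read2/1", "A" * 50),
           ("read3/1", "A" * 10), ("read4/1", "A" * 5)]
reverse = [("read1/2", "C" * 50), ("read2/2", "C" * 10),
           ("read3/2", "C" * 50), ("read4/2", "C" * 5)]

result = split_pairs(forward, reverse)
print(result)
```

So even when every input read starts with a mate, the unpaired files can be non-empty after trimming. Keeping them separate means the two paired files stay in matching order, which tools that take paired-end input generally expect.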

taylorpaisie / VEME_2023_NGS_Variant_Calling

Trimmomatic Query #2