laurahspencer / DuMOAR


Quality trim MBD-BS data #8

Closed sr320 closed 10 months ago

laurahspencer commented 1 year ago

Trimmed with Trim Galore using the slurm script trim-mbdbs.sh. Followed recommended settings for the Zymo Pico-Methyl kit, as per the Bismark developers. The script was written based on the MethCompare project's code.

Here's the MultiQC report for the trimmed data:

kubu4 commented 1 year ago

@laurahspencer - Do you have the FastQC data from before trimming that we could glance at? Might be useful for reference in the discussion in Issue #15.

sr320 commented 1 year ago

Do you have the FastQC data from before trimming that we could glance at?

Straight from seq facility before anything was done to it....

laurahspencer commented 1 year ago

Here's the MultiQC report on the data that has been concatenated by sample - multiqc_report_raw

I'll run FastQC/MultiQC on the un-concatenated files now, and will get back to you.

sr320 commented 1 year ago

Thanks - and what type of sequencing was this? paired-end 150bp?

laurahspencer commented 1 year ago

As per @kristamnichols and the sequence length distribution (see multiqc report), we believe one lane/run was 100bp, and the other was 150bp.
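For anyone who wants to check the length split per file rather than from the MultiQC plot, here is a minimal sketch. It assumes standard 4-line FASTQ records (no wrapped sequence lines); the function name `read_length_counts` and the example path are hypothetical, not part of the project's scripts.

```python
import gzip
from collections import Counter


def read_length_counts(fastq_path):
    """Count reads by sequence length in a FASTQ file (plain or .gz).

    Assumes each record is exactly 4 lines: header, sequence, '+', quality.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                counts[len(line.rstrip("\n"))] += 1
    return counts


# Hypothetical usage:
# print(read_length_counts("sample_L001_R1.fastq.gz").most_common(5))
```

If a file mixes two runs, the counts should cluster at two maximum lengths (here, presumably 100 and 150).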

sr320 commented 1 year ago

Would want to concatenate after trimming. Let's have a look at the raw FastQC first.

laurahspencer commented 1 year ago

Would want to concatenate after trimming. Let's have a look at the raw FastQC first.

Why does it matter? My understanding is that trimming/filtering works on each read separately, so it doesn't make a difference whether trimming occurs before or after concatenating.

kubu4 commented 1 year ago

Why does it matter?

Had the same thought. Concatenating is just adding lines of text to the end of an existing file, so it shouldn't have an impact on anything downstream.
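A toy illustration of that reasoning, as a sketch only: the trimmer below is a simplified 3' quality trim written for this example, not Trim Galore's or Cutadapt's actual algorithm, and the reads are made up. It shows that when the same settings are applied to every read, trimming then concatenating gives the same result as concatenating then trimming.

```python
def trim_read(seq, qual, min_q=20, offset=33):
    """Toy 3' quality trim: drop trailing bases whose Phred score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def trim_all(reads, **kwargs):
    """Apply the per-read trim independently to every (sequence, quality) pair."""
    return [trim_read(seq, qual, **kwargs) for seq, qual in reads]


# Made-up reads from two lanes; 'I' encodes Q40 and '#' encodes Q2 (Phred+33).
lane1 = [("ACGTACGT", "IIIIII##"), ("TTTTGGGG", "IIIIIIII")]
lane2 = [("ACGT", "I#I#")]

# Same settings everywhere: the order of trimming and concatenating does not matter.
assert trim_all(lane1 + lane2) == trim_all(lane1) + trim_all(lane2)
```

The equivalence only holds while every file gets identical parameters. If the two runs need different settings, they have to be trimmed separately.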

sr320 commented 1 year ago

Might not. But two different read lengths? That could mean two separate "runs", and thus batch effects. I would check that they look similar in an MDS plot.

And how do we know the trimming needs are not different if we have not seen FastQC on the raw data?

sr320 commented 1 year ago

What is the first rule of FISH546? :)

kubu4 commented 1 year ago

But two different read lengths?

I was assuming the only concatenation taking place was of samples from multiple lanes with the same sequencing parameters.

Certainly would refrain from concatenating a mix of runs with different sequencing params - not sure how downstream software handles FastQs with inconsistent read lengths.

laurahspencer commented 1 year ago

Yeah, I assumed the same, until I saw the FastQC results, chatted with Krista, and realized that we have a mix of 100bp and 150bp reads. I'll plan to re-run the pipeline with trimming occurring prior to concatenating.
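A rough sketch of the concatenation step once the per-lane files have been trimmed separately. The helper name `concatenate_files` and the file names in the usage comment are hypothetical. It relies on the fact that gzip files appended byte for byte form a valid multi-member gzip stream, so trimmed `.fq.gz` files do not need to be decompressed first.

```python
import shutil
from pathlib import Path


def concatenate_files(parts, destination):
    """Append each file in `parts` to `destination`, byte for byte, in order."""
    with open(destination, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(destination)


# Hypothetical usage, after trimming each lane on its own:
# concatenate_files(
#     ["sampleA_L001_R1_trimmed.fq.gz", "sampleA_L002_R1_trimmed.fq.gz"],
#     "sampleA_R1_trimmed.fq.gz",
# )
```

For paired-end data, the R1 and R2 parts would need to be listed in the same lane order so that read pairs stay in sync.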

kubu4 commented 1 year ago

Don't forget to post FastQC/MultiQC of raw, non-concatenated reads. 😃

laurahspencer commented 1 year ago

Yup- here you go -

sr320 commented 1 year ago

I am going to suggest only about 20bp are good in one batch and 40bp in the other....

sr320 commented 1 year ago

I'll plan to re-run the pipeline with trimming occurring prior to concatenating.

but run some PCAs / MDS before you concatenate.

kubu4 commented 1 year ago

I am going to suggest only about 20bp are good in one batch and 40bp in the other....

I don't think I follow. Can you elaborate on what you mean by this?

sr320 commented 1 year ago

I am basing that on the fact that these lines should essentially be horizontal - https://d.pr/i/mevTec

This is indicative of artificial sequences / adaptors.

kubu4 commented 1 year ago

Previous quality/trimming results show ~120bp are good, though.