phoward42 / APP-16S-mice-study

Documentation for 16S data analysis of apple pomace fed mice study

Final project submission #2

Open phoward42 opened 1 month ago

phoward42 commented 1 month ago

@jelmerp - here is my work for the final project submission!

I am pretty happy to have pivoted to using an nf-core pipeline due to its ease of use. One problem I ran into, unrelated to the code (pretty sure lol), was with my denoising. Each time it ran (either independently or through ampliseq), I was getting really high numbers of chimeric sequences (>90%!). My thinking is that a different set of primers was used than the one I am assuming in the pipeline, but I am still waiting to hear back from my collaborators to see if that is the case. Any ideas what else I could look into? It feels unrealistic that nearly all my sequences are getting flagged as non-biological even after trimming, so I'm willing to cover any and all bases to be sure.

Regardless, thanks for teaching this class! It has been very informative and empowering :)

-Peter

jelmerp commented 1 month ago

Presentation

| Aspect | Max. points | Your points |
|---|---|---|
| Technical content | 3 | 3 |
| Contextualization | 3 | 3 |
| Delivery | 3 | 3 |
| Clarity | 3 | 2 |
| Questions for others | 3 | 1 |
| Total | 15 | 12 |

Great presentation! A couple more questions for others would have been nice.

Final submission

| Aspect | Max. points | Your points | Comment |
|---|---|---|---|
| Project organization | 4 | 3 | See below. |
| Project background and documentation | 4 | 3 | Comments on next steps after the Ampliseq pipeline would have been good. |
| Good practices in scripts | 4 | 4 | |
| Workflow reproducibility | 4 | 3 | You might have included, e.g. in the README, the command to install Nextflow with Conda. |
| Slurm jobs at OSC | 3 | 3 | |
| Project/coding quality | 3 | 3 | |
| Version control | 3 | 3 | Nicely done with fine-grained commits. |
| Total | 25 | 22 | |

Nicely done, Peter, and I'm glad to hear this course has been helpful for you! A few comments:

As for your issue with chimeric sequences: according to the Ampliseq results overview from your results folder (shown below), there don't appear to be any chimeras at all. The nonchim number is as high as the number to the left of it, i.e. no ASVs were deemed to be chimeras. BUT you are losing nearly all your reads much earlier, at the DADA2 filtering step. This may have to do with read quality: take a look at the FastQC output and see whether your quality scores look poor. Adjusting the DADA2 filtering parameters, especially "maxEE" (the maximum number of expected errors per read, which the pipeline should have an option for), may help! Let me know if you need any more help with that moving forward.

image
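
To make the "maxEE" idea concrete, here is a minimal Python sketch of how expected errors relate to quality scores. It is not DADA2's own code. It assumes standard Phred+33 FASTQ quality encoding, and the function names and the threshold of 2.0 are illustrative choices rather than anything taken from the pipeline. The expected error count of a read is the sum of the per-base error probabilities, where a base with quality Q has error probability 10^(-Q/10). A read is kept when that sum does not exceed maxEE.

```python
def expected_errors(quality: str, phred_offset: int = 33) -> float:
    """Sum the per-base error probabilities encoded in a FASTQ quality string.

    Assumes Phred+33 encoding by default: Q = ord(char) - 33, p_error = 10^(-Q/10).
    """
    return sum(10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality)


def passes_max_ee(quality: str, max_ee: float = 2.0) -> bool:
    """Return True if the read would survive a maxEE-style filter.

    The default of 2.0 is only an illustrative threshold (an assumption).
    """
    return expected_errors(quality) <= max_ee


if __name__ == "__main__":
    # Two hypothetical 250-bp reads with uniform quality.
    high_quality = "I" * 250  # 'I' encodes Q40 in Phred+33
    low_quality = "5" * 250   # '5' encodes Q20 in Phred+33

    for name, qual in [("Q40 read", high_quality), ("Q20 read", low_quality)]:
        ee = expected_errors(qual)
        print(f"{name}: expected errors = {ee:.3f}, kept = {passes_max_ee(qual)}")
```

The takeaway is that even a uniform Q20 read of 250 bp carries about 2.5 expected errors, so it would be dropped at a threshold of 2. If FastQC shows quality tailing off toward the 3' ends, truncating the reads earlier or raising the threshold are the two obvious things to try. Check the pipeline's parameter documentation for the exact option names.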