mycobactopia-org / MTBseq-nf

MTBSeq made simple and easy using Nextflow and nf-core standard.
https://doi.org/10.5281/zenodo.5498063
MIT License
8 stars 1 forks source link

[BONUS-BRANCH] Adding trimmomatic as optional tool #47

Closed Mxrcon closed 2 years ago

Mxrcon commented 2 years ago

Hey @abhi18av as we discussed on our meeting, I decided to add a new module that uses trimmomatic to trim the reads before submitting to MTBseq, here goes my work on it

This pull request closes #46

PR content:

Feel free to discuss the changes with me, I'm available to make any necessary change, propose new changes and implement anything that you feel necessary! Please contact me in the comments.

Hope we have a merge! Kindly, Davi

Mxrcon commented 2 years ago

Just for note:

i'd like to redistribute the adapters as a data/ file but as mentioned on Trimmomatic manual

Illumina adapter and other technical sequences are copyrighted by Illumina,but we have been granted permission to distribute them with Trimmomatic.

so we should rely on hard code stuff ( using it from docker/conda), If i could choose, I'll implement adapters into a channel like the references.

abhi18av commented 2 years ago

so we should rely on hard code stuff ( using it from docker/conda) ...

You've raised a good point Davi, I think that perhaps it makes sense to simply use the https URL as file path for these adapter from the trimmomatic repo.

https://github.com/usadellab/Trimmomatic/tree/v0.39/adapters

And then we don't get into distributing them ourselves.

abhi18av commented 2 years ago

@Mxrcon , on second thoughts - I think use_trimmomatic indicates that we might have other trimming options in future such as fastp.

I'd like to keep this option tool-neutral, so let's change that to trim_raw_reads please.

Mxrcon commented 2 years ago

@Mxrcon , on second thoughts - I think use_trimmomatic indicates that we might have other trimming options in future such as fastp.

hmm makes sense, i'll revert the param name to what we discussed

abhi18av commented 2 years ago

Just adding this as a reference here

Adapter trimming is enabled by default, but you can disable it by -A or --disable_adapter_trimming. Adapter sequences can be automatically detected for both PE/SE data.

From https://github.com/OpenGene/fastp#adapters

I'm curious to read deeper about trimmomatic/fastp now :)

But anyhow, for the time being, let's continue with trimmomatic here.

Mxrcon commented 2 years ago

Hey, this makes sense. why you dont add a issue with hold label? so we can discuss this later

abhi18av commented 2 years ago

Hey, this makes sense. why you dont add a issue with hold label? so we can discuss this later

Na, not needed. We're good to go ahead with the work on this PR. The fastp is something which will stay on my mind for sure. I'll add a discussion for that.

abhi18av commented 2 years ago

I've marked this PR as draft and when the requested changes have been accommodated, please let me know :)

abhi18av commented 2 years ago

@Mxrcon , after much thought I think it might be better to leave/delegate the addition of this module to the workflow to simplify the deps and keep the overall code only focused on MTBseq. This way when we publish we can only focus upon direct improvement in usability of the workflow.

I do think however, that the work done on this PR, could be used as a demo on how to add a new module to the project and I'm planning to keep this this code on a branch and link to it in the usage docs for mtbseq-nf.

What are your thoughts?

Mxrcon commented 2 years ago

hmm makes complete sense, I agree with you. We could mention this on contribution guidelines and other stuff.

Lets keep this repository focused on mtbseq wrapping!

abhi18av commented 2 years ago

@Mxrcon , as we have discussed earlier in the thread about limiting the scope and keeping https://github.com/mtb-bioinformatics/mtbseq-nf/issues/30#issuecomment-948731636 in mind, I think I'll close this PR for the time being.

I'll update the title so that it's referenceable in future for anyone.

Thank you once again Davi for your time and effort on this - in future, we'll build upon this PR as an example on how to extend a workflow. Please refer this https://github.com/mtb-bioinformatics/mtbseq-nf/issues/35#issue-991396840