nf-core / viralrecon

Assembly and intrahost/low-frequency variant calling for viral samples
https://nf-co.re/viralrecon
MIT License
122 stars 110 forks source link

Modules required for DSL 2 implementation of pipeline #149

Closed JoseEspinosa closed 3 years ago

JoseEspinosa commented 3 years ago
drpatelh commented 3 years ago

Goodness me, that's a bigger list than I was expecting!! 😅

heuermh commented 3 years ago

Once the pattern is in place, I should be able to handle the seqwish/vg ones

cjfields commented 3 years ago

Following on what @heuermh said: is there a general 'nf-core'-y template (a standard way) that DSL2 modules are being set up? This would be of general use to other Nextflow and nf-core projects

heuermh commented 3 years ago

@cjfields Yes, there is plenty of discussion in the modules channel on Slack (https://nfcore.slack.com/archives/CJRH30T6V) and I believe the https://github.com/nf-core/rnaseq workflow is leading the way

drpatelh commented 3 years ago

What @heuermh said and https://github.com/nf-core/modules for good measure.

drpatelh commented 3 years ago

Modules ready to push to nf-core/modules

Modules with customisation that we may not be able to push to nf-core/modules

andersgs commented 3 years ago

@drpatelh you can assign ivar to me, I have then written and just adding some tests.

I fixed the ivar bioconda recipe to depend on Samtools, so that the container and conda environment will contain samtools. This address the issue where, in some case, ivar requires samtools (e.g., samtools mpileup | ivar consensus)

Update the tag to _1: ivar:1.3--h089eab3_0 -> ivar:1.3--h089eab3_1 to use the updated version of the bioconda recipe.

andersgs commented 3 years ago

@drpatelh can you send me the issue you are having with spades on bioconda? i can work with it.

JoseEspinosa commented 3 years ago

Hi @drpatelh I have a couple of questions regarding the implementation of the bcftools_isec module, first I am not sure whether it will be used since it is now commented. Then, I saw that each vcf file (and corresponding tbi index) is provided as a separate item of the declaration of the input channel. Wouldn't it be more convenient to declare a single path and collect all the files before inputting them in the module. This way the declaration of the module input would be more generic.

drpatelh commented 3 years ago

Amazing! Thanks @andersgs 😎 I will wait for you to push the modules here before I install and use them in viralrecon. It would be great if ivar variants and ivar consensus are able to support both BAM and mpileup input. We have had questions posed about variants being called where we have had to ask users to go back to the mpileup files. That's one of the reasons I have an explicit step for mpileup generation, and the other one is that the mpileup file can be generated with exactly the same parameters and passed to VarScan2 too. We are using BAM input and pipes for BCFTools via bcftools mpileup. Let's discuss on your PRs 👍🏽

drpatelh commented 3 years ago

@JoseEspinosa yes, my bad, I still need to do a little work on that process but I have moved it to the "not fit for nf-core/modules" section at the bottom. It is commented out in the main workflow because I added some fresh sub-workflows for each variant caller at the end of last week, and I need to sort out all of the channels to get it working again 🙂

Currently, the reason we can't use a simple collect there is because I have used exactly the same naming convention for each of the callers for simplicity. This means if we stage them all together then Nextflow will barf. This is why they are being staged in their own directories at the moment. This one may have to stay as a local module but you could maybe add a generic one to nf-core/modules that just takes a list of tuple val(meta), path(vcfs). It would be a good start anyway and we almost certainly expect these modules to be updated. Adding the tests and docs is always the bit that takes the longest 😅

andersgs commented 3 years ago

Rather than add samtools as a separate step (I am thinking of minimising having to spin up containers), would it better off to add tee to the step and capture the output --- perhaps optionally:

samtools mpileup | tee pileup.txt | ivar consensus

I agree that we should be moving the bcftools mpileup (since samtools mpileup is being deprecated) but the docs for ivar are clear on depending on samtools mpileup, I haven't chased up with the ivar team on this.

drpatelh commented 3 years ago

Rather than add samtools as a separate step (I am thinking of minimising having to spin up containers), would it better off to add tee to the step and capture the output --- perhaps optionally:

Great idea! Sounds like a good compromise 👍🏽 I will just pass the same mpileup generating arguments I am using for VarScan here too.

drpatelh commented 3 years ago

I am assuming the Andersen lab have been ridiculously over-run like all of us 😓. We added a MultiQC module for iVar last year and a script to convert the tabular output to VCF. Still waiting for the latter to be added officially too.

https://github.com/andersen-lab/ivar/issues/17

heuermh commented 3 years ago

@drpatelh bandage.nf looks reasonable to me, what do you need to push to nf-core/modules? I can pick that up if you'd like. Assign minimap2 to me as well.

drpatelh commented 3 years ago

That would be great @heuermh. I assigned it to you in the issue above 👍🏽 I haven't written a module for minia (or any of the other variant graph modules) yet but feel free to assign yourself / add any of the others above too 🙂 I have been quite pre-occupied with overhauling the CI tests we are using on nf-core/modules but hoping to come back to a sprint here soon.

drpatelh commented 3 years ago

Awesome work guys! Especially @JoseEspinosa 😎 Going to close this issue since we are pretty much done. Thank you!