nf-core / modules

Repository to host tool-specific module files for the Nextflow DSL2 community!
https://nf-co.re/modules
MIT License

new module: gatk4/genotypegvcf #200

Closed: maxulysse closed this issue 2 years ago

maxulysse commented 3 years ago

I think it would be good to have a module for gatk4 GenotypeGVCFs.

GCJMackenzie commented 3 years ago

Hi @santiagorevale, I am currently working on some GATK modules as part of my PhD project, and GenotypeGVCFs is one of the tools I want to include in my final workflow. I know it has been a while since you worked on this, but I was wondering how far you got with the module and whether there is anything I could do to help get it finished? I'd be happy to help out, so please let me know if there is anything!

santiagorevale commented 3 years ago

Hi GC, you are right, I'd completely forgotten about this. When I stopped, I was waiting for the test data to be released. By the time the test data was ready, something in the templates had changed and I needed to redo some stuff. Give me a few days to look into this and I'll get back to you. Thanks for reminding me about it and for your help. Cheers!


GCJMackenzie commented 3 years ago

Hi Santiago, thanks very much for looking into this module again!

GCJMackenzie commented 3 years ago

Hi @santiagorevale, just wanted to check how things are going with the module. I also wanted to check that the module will be able to run GenotypeGVCFs in joint (multi-sample) genotyping mode, as this is an important step in the joint germline variant-calling workflow. Details here: https://gatk.broadinstitute.org/hc/en-us/articles/360035890411-Calling-variants-on-cohorts-of-samples-using-the-HaplotypeCaller-in-GVCF-mode and here: https://gatk.broadinstitute.org/hc/en-us/articles/360035889971--How-to-Consolidate-GVCFs-for-joint-calling-with-GenotypeGVCFs

It would be great to get this module up and running soon so please let me know if there is anything I can do to help, as I am happy to do so in any way I can. Thanks again for your time working on this module!

santiagorevale commented 2 years ago

Hi GC, technically the command handles both multi- and single-sample input automatically, because it all depends on the input gVCF file you provide. I'll get back to you this Saturday. Cheers!
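To check which case a given input falls into, one option is to count the sample columns in the GVCF header. The sketch below is a hypothetical helper (vcf_sample_count is not part of GATK or nf-core); it assumes a standard VCF #CHROM header line, where every column after FORMAT is a sample, and it reads the file through gzip -cdf, which also passes uncompressed files through unchanged.

```shell
# Hypothetical helper (not part of GATK or nf-core): print the number of
# sample columns in a (g)VCF, taken from the #CHROM header line.
# Columns 1-9 are fixed (CHROM..FORMAT); every column after that is a sample.
# gzip -cdf decompresses .gz input and copies plain-text input unchanged.
vcf_sample_count() {
    gzip -cdf -- "$1" | awk -F '\t' '/^#CHROM/ { print NF - 9; exit }'
}

# Example: vcf_sample_count input.g.vcf.gz   (1 = single-sample, >1 = cohort)
```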


GCJMackenzie commented 2 years ago

Hi Santiago, just a heads-up: reading the documentation, it seems that multi-sample mode requires the input to be provided as a GenomicsDB workspace. A single file is provided as -V example_file.g.vcf, whereas a multi-sample workspace needs to be provided as -V gendb://example_genomicsdb

So the module needs to add the "gendb://" prefix to the -V argument when running in multi-sample mode.

The GATK documentation also recommends specifying a temporary directory with --tmp-dir for multi-sample runs, though I am not sure whether this is necessary.
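The input handling described above can be sketched as a small argument builder. This is a hypothetical helper, not the nf-core module's actual code: it assumes that a directory input is a GenomicsDB workspace (as created by GenomicsDBImport) and gets the gendb:// prefix, that anything else is a single- or multi-sample GVCF passed as a plain path, and that --tmp-dir is added only when the caller supplies a directory. It only prints the arguments, one per line, and assumes paths contain no newlines.

```shell
# Hypothetical helper (not part of GATK or nf-core): print GenotypeGVCFs
# arguments one per line.
# Usage: genotypegvcfs_args REF INPUT OUTPUT [TMP_DIR]
# Assumption: a directory INPUT is a GenomicsDB workspace; any other INPUT is
# treated as a (single- or multi-sample) GVCF file.
genotypegvcfs_args() {
    ref=$1 input=$2 output=$3 tmp_dir=${4:-}
    if [ -d "$input" ]; then
        # GenomicsDB workspaces are passed to -V with the gendb:// prefix
        input="gendb://$input"
    fi
    printf '%s\n' GenotypeGVCFs -R "$ref" -V "$input" -O "$output"
    # Add --tmp-dir only when a temporary directory was supplied
    if [ -n "$tmp_dir" ]; then
        printf '%s\n' --tmp-dir "$tmp_dir"
    fi
}

# Example (bash):
#   mapfile -t args < <(genotypegvcfs_args Homo_sapiens_assembly38.fasta example_genomicsdb output.vcf.gz /path/to/tmp)
#   gatk --java-options "-Xmx4g" "${args[@]}"
```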

https://gatk.broadinstitute.org/hc/en-us/articles/360056970432-GenotypeGVCFs

Thanks!

santiagorevale commented 2 years ago

Hi GC,

Multi-sample genotyping can also be done without GenomicsDB. Here's an extract from the page you linked:

Perform joint genotyping on a singular sample by providing a single-sample GVCF or on a cohort by providing a combined multi-sample GVCF

gatk --java-options "-Xmx4g" GenotypeGVCFs \
    -R Homo_sapiens_assembly38.fasta \
    -V input.g.vcf.gz \
    -O output.vcf.gz
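The quoted documentation refers to a combined multi-sample GVCF; one way to produce such a file from per-sample GVCFs is GATK's CombineGVCFs tool (GenomicsDBImport being the alternative). The sketch below is a hypothetical helper (combinegvcfs_args is not part of GATK or nf-core) that only prints the CombineGVCFs arguments, one per line, with one -V per input GVCF; it assumes paths contain no newlines. The resulting file can then be passed to GenotypeGVCFs with -V, as in the command above.

```shell
# Hypothetical helper (not part of GATK or nf-core): print CombineGVCFs
# arguments one per line, merging several single-sample GVCFs into one
# multi-sample GVCF.
# Usage: combinegvcfs_args REF OUTPUT GVCF [GVCF ...]
combinegvcfs_args() {
    ref=$1 output=$2
    shift 2
    printf '%s\n' CombineGVCFs -R "$ref"
    for gvcf in "$@"; do
        # CombineGVCFs takes one -V per input GVCF
        printf '%s\n' -V "$gvcf"
    done
    printf '%s\n' -O "$output"
}

# Example (bash):
#   mapfile -t args < <(combinegvcfs_args Homo_sapiens_assembly38.fasta input.g.vcf.gz s1.g.vcf.gz s2.g.vcf.gz)
#   gatk --java-options "-Xmx4g" "${args[@]}"
```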

I have just opened the pull request; it should be merged soon.

Good luck with your project.

Cheers, Santiago
