nf-core / sarek

Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
https://nf-co.re/sarek
MIT License

DeepVariant params addition request: #1583

Open poddarharsh15 opened 4 months ago

poddarharsh15 commented 4 months ago

Description of feature

Description: I am writing to request parameters for specifying haploid contigs and regions when calling SNPs and indels with DeepVariant in the nf-core/sarek pipeline. These parameters are essential for our benchmarking and analysis with GIAB data.

Proposed Parameters:

--haploid_contigs "${HAPLOID_CONTIGS}"
--regions "${REGION}"

Use Case: These parameters would let users restrict the analysis to specific regions and declare which contigs are haploid, improving the flexibility and accuracy of SNP and indel detection.

Example Usage:

{
  "haploid_contigs": "chrX,chrY",
  "regions": "chr20:10,000,000-10,500,000"
}

This example shows how users could specify the haploid contigs and regions in a params.json file.

Benefits: Finer control over which genomic regions are analyzed, and improved accuracy of SNP and indel calls on haploid contigs such as chrX and chrY in male samples.

Thank you for considering this request. Your help in improving the nf-core/sarek pipeline is greatly appreciated. @maxulysse


Best regards, Harsh Poddar

FriederikeHanssen commented 4 months ago

Hey! The regions option is already supported: provide a BED file via --intervals, and it is used in all relevant steps of the pipeline, including DeepVariant: https://github.com/nf-core/sarek/blob/b5b766d3b4ac89864f2fa07441cdc8844e70a79e/modules/nf-core/deepvariant/main.nf#L31

We can add the haploid contigs option. In the meantime, you can pass it through a custom config (see the docs).

poddarharsh15 commented 4 months ago

Hi @FriederikeHanssen, does something like this make sense?


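process {
    // Quote the selector (it contains ':'); it must match the DEEPVARIANT
    // process itself (assumed fully qualified name), not just the subworkflow
    withName: 'NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_DEEPVARIANT:DEEPVARIANT' {
        // Single-quoted Groovy string so the inner double quotes survive
        ext.args = '--haploid_contigs="chrX,chrY"'
    }
}

process {
    // Alternative: match the process by its simple name
    withName: 'DEEPVARIANT' {
        ext.args = '--haploid_contigs="chrX,chrY"'
    }
}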

Inspired by: https://github.com/google/deepvariant/blob/r1.6.1/docs/deepvariant-haploid-support.md

FriederikeHanssen commented 4 months ago

Yes, sorry, I missed your answer:

process {
    withName: 'NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_DEEPVARIANT:DEEPVARIANT' {
        ext.args = '--haploid_contigs="chrX,chrY"'
    }
}

Looks right.

Just a note: ext.args is not additive; your value replaces the one set by the pipeline. So if you want to keep any other arguments from conf/deepvariant.config, you will need to add those to your ext.args as well.
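A minimal sketch of what "adding those in" could look like: the closure rebuilds the full argument string so the pipeline's own options are not lost when the haploid flag is added. The WES/WGS switch on params.wes is an assumption about what conf/deepvariant.config sets; copy the real value from your Sarek release rather than relying on this.

```groovy
// custom.config (hypothetical file name)
process {
    withName: 'DEEPVARIANT' {
        // ext.args replaces the pipeline's value, so restate everything needed.
        // ASSUMPTION: the pipeline default picks the model type from params.wes;
        // check conf/deepvariant.config in your release for the actual arguments.
        ext.args = { [
            params.wes ? '--model_type WES' : '--model_type WGS',
            '--haploid_contigs="chrX,chrY"'
        ].join(' ') }
    }
}
```

Pass it with `-c custom.config`. The closure form is evaluated per task, which is why it can read params.wes.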

poddarharsh15 commented 4 months ago

Hi @FriederikeHanssen

Thank you for your response. Unfortunately, this doesn't work: the DeepVariant version used by Sarek does not recognize the --haploid_contigs parameter. I updated DeepVariant to version 1.6.0 and ran the module locally on my cluster; I was able to detect some variants on chrX, but none on chrY. For your reference, I will add the link to the Slack discussion where another developer was helping me. Here
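For anyone who wants to try the same upgrade without editing the module files, a custom config can swap the container. This is a sketch under assumptions: it uses the upstream google/deepvariant 1.6.0 image from Docker Hub and assumes the module's run_deepvariant command line still works with that version. It only changes the DeepVariant version and does not address the missing chrY calls.

```groovy
// custom.config (hypothetical file name); use with: -c custom.config
process {
    withName: 'DEEPVARIANT' {
        // ASSUMPTION: upstream Docker Hub image. Fully qualified with docker.io
        // so that a default registry setting (e.g. quay.io) is not prepended.
        container = 'docker.io/google/deepvariant:1.6.0'
        // Per this thread, older DeepVariant versions reject this flag.
        // ext.args is not additive: add any other arguments you still need.
        ext.args  = '--haploid_contigs="chrX,chrY"'
    }
}
```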

asp8200 commented 4 months ago

I guess @poddarharsh15 meant to link to this discussion on Slack.

poddarharsh15 commented 4 months ago

Hi @asp8200, thanks for adding the link. I missed it because I was on my phone.