Closed sdhutchins closed 1 year ago
The rules below come from a prior Snakemake workflow. We want to use them as the first two steps in the Nextflow pipeline:
normalize_vcf
remove_homref_sites
Below is the conda config that was used:
    channels:
      - conda-forge
      - bioconda
    dependencies:
      - bcftools =1.12
Here are the two Snakemake rules:
    rule normalize_vcf:
        input:
            vcf = INTERIM_DIR / "single_sample_vcf" / "{train_test}" / "split" / "{sample}.vcf.gz",
            ref=REF_FASTA,
        output:
            INTERIM_DIR / "single_sample_vcf" / "{train_test}" / "normalized" / "{sample}.vcf.gz"
        message:
            "Normalizing sample: {wildcards.sample} ({wildcards.train_test})"
        conda:
            str(WORKFLOW_PATH / "configs" / "envs" / "bcftools.yaml")
        threads: 2
        shell:
            r"""
            # first split multi-allelic sites and then normalize
            bcftools norm \
                -m-any \
                {input.vcf} \
                | bcftools norm \
                --threads {threads} \
                --check-ref we \
                --fasta-ref {input.ref} \
                -Oz -o {output}
            """

    rule remove_homref_sites:
        input:
            INTERIM_DIR / "single_sample_vcf" / "{train_test}" / "normalized" / "{sample}.vcf.gz"
        output:
            INTERIM_DIR / "single_sample_vcf" / "{train_test}" / "homref_removed" / "{sample}.vcf.gz"
        message:
            "Remove homozygous ref sites. Sample: {wildcards.sample} ({wildcards.train_test})"
        conda:
            str(WORKFLOW_PATH / "configs" / "envs" / "bcftools.yaml")
        shell:
            r"""
            bcftools view \
                --include 'GT[*]="alt"' \
                -Oz -o "{output}" \
                "{input}"
            """
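As a starting point for the port, the two rules could be sketched as Nextflow DSL2 processes roughly like this. This is only a sketch under assumptions, not the pipeline's actual implementation: the parameter names `params.vcf_glob` and `params.ref_fasta`, their default paths, the output file names, and the process names are all hypothetical, and the `{train_test}` wildcard from the Snakemake paths is not modelled here. The `bcftools` commands themselves are carried over unchanged from the rules above.

```nextflow
nextflow.enable.dsl = 2

// Hypothetical parameters; the issue does not define input locations.
params.vcf_glob  = 'data/split/*.vcf.gz'
params.ref_fasta = 'data/ref/genome.fa'

process NORMALIZE_VCF {
    tag "${sample}"
    conda 'bioconda::bcftools=1.12'   // same tool version as the original conda config
    cpus 2                            // mirrors `threads: 2` in the Snakemake rule

    input:
    tuple val(sample), path(vcf)
    path ref_fasta

    output:
    tuple val(sample), path("${sample}.normalized.vcf.gz")

    script:
    """
    # first split multi-allelic sites and then normalize
    bcftools norm -m-any ${vcf} \\
        | bcftools norm \\
            --threads ${task.cpus} \\
            --check-ref we \\
            --fasta-ref ${ref_fasta} \\
            -Oz -o ${sample}.normalized.vcf.gz
    """
}

process REMOVE_HOMREF_SITES {
    tag "${sample}"
    conda 'bioconda::bcftools=1.12'

    input:
    tuple val(sample), path(vcf)

    output:
    tuple val(sample), path("${sample}.homref_removed.vcf.gz")

    script:
    """
    bcftools view \\
        --include 'GT[*]="alt"' \\
        -Oz -o ${sample}.homref_removed.vcf.gz \\
        ${vcf}
    """
}

workflow {
    // Pair each VCF with a sample ID derived from its file name (assumed naming: <sample>.vcf.gz)
    vcfs = Channel
        .fromPath(params.vcf_glob, checkIfExists: true)
        .map { vcf -> tuple(vcf.name - '.vcf.gz', vcf) }

    ref = file(params.ref_fasta, checkIfExists: true)

    NORMALIZE_VCF(vcfs, ref)
    REMOVE_HOMREF_SITES(NORMALIZE_VCF.out)
}
```

The `conda` directive also accepts a path to an environment YAML file, so the existing `bcftools.yaml` could likely be reused as is. Depending on the Nextflow version, conda support may need to be switched on explicitly (for example with `-with-conda` or `conda.enabled = true` in the config).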