sanjaynagi / AmpSeeker

A snakemake workflow for amplicon sequencing
https://sanjaynagi.github.io/AmpSeeker/

Conditional logic 27-10-22 #7

Closed sanjaynagi closed 1 year ago

sanjaynagi commented 1 year ago

Split out from PR #2 by @ChabbyTMD. Please see the description there.

sanjaynagi commented 1 year ago

Hey Trevor,

I moved the contents of the BgzipTabixLite.smk file into the original rule file and deleted it, just so we don't have multiple rule files doing similar things.

This meant adding something to your conditional statement which basically says: if we have over 1000 samples, the special multiple-merging rules must be used:

# with more than 1000 samples, split the sample list in half so the
# per-sample VCFs can be merged in two batches and then combined
if len(metadata) > 1000:
    n_samples = len(metadata)
    half = int(n_samples / 2)
    samples1 = metadata['sampleID'][:half]
    samples2 = metadata['sampleID'][half:]
    large_sample_size = True
else:
    large_sample_size = False
    samples1 = []
    samples2 = []

rule all:
    input:
        # request the merge sentinel only for a large sample size;
        # assumption: the dataset name comes from the workflow config
        merge_vcfs = expand(
            "results/vcfs/.complete.{dataset}.merge_vcfs", dataset=config["dataset"]
        ) if large_sample_size else [],

rule bcftools_merge3:
    input:
        vcf = expand("results/vcfs/{{dataset}}.{n}.vcf.gz", n=[1, 2]),
        tbi = expand("results/vcfs/{{dataset}}.{n}.vcf.gz.tbi", n=[1, 2]),
    output:
        vcf = "results/vcfs/{dataset}_merged.vcf",
        complete = touch("results/vcfs/.complete.{dataset}.merge_vcfs"),   # touch() creates an empty file with this name once the rule finishes
    shell:
        "bcftools merge -o {output.vcf} {input.vcf}"   # sketch of the merge command; the real rule may differ

So if large_sample_size is True, the workflow requires "results/vcfs/.complete.{dataset}.merge_vcfs" to be made, which in turn requires the multiple merging steps to be run.
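
For context, the two intermediate chunk VCFs that bcftools_merge3 consumes would come from per-half merge rules roughly along these lines. This is a minimal sketch only: the rule name, the per-sample VCF paths and the in-rule indexing step are assumptions and may not match the actual rule file.

rule bcftools_merge1:
    input:
        # assumption: one bgzipped, indexed VCF per sample in the first half
        vcfs = expand("results/vcfs/{sample}.vcf.gz", sample=samples1),
        tbi = expand("results/vcfs/{sample}.vcf.gz.tbi", sample=samples1),
    output:
        vcf = "results/vcfs/{dataset}.1.vcf.gz",
        tbi = "results/vcfs/{dataset}.1.vcf.gz.tbi",
    shell:
        "bcftools merge -Oz -o {output.vcf} {input.vcfs} && tabix -p vcf {output.vcf}"

A mirror-image bcftools_merge2 would do the same for samples2, writing {dataset}.2.vcf.gz, so that bcftools_merge3 can then combine the two halves.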

ChabbyTMD commented 1 year ago

Hey Sanjay,

This is a more elegant solution. Indeed, it streamlines the workflow by removing redundant rules.