This pull request implements a number of code cleanups, performance tweaks, and other usability and quality-of-life improvements, in addition to the rules that calculate the callable sites BED file:
Adds a rule to calculate a BED file of callable sites for a particular genome/species, using mappability and coverage filters to exclude regions that likely contain problematic SNPs. Parameters are specified in the config file.
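For reviewers, here is a rough sketch of the kind of filtering this rule performs. It is not the code in this PR: the function name, the record layout, and the threshold names (min_cov, max_cov, min_map) are all hypothetical stand-ins for whatever the config file actually exposes, and the real rules presumably operate on bedgraph files rather than in-memory tuples.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: chrom, start, end, mean coverage, mappability score.
Record = Tuple[str, int, int, float, float]
Interval = Tuple[str, int, int]


def callable_sites(
    records: Iterable[Record],
    min_cov: float,
    max_cov: float,
    min_map: float,
) -> List[Interval]:
    """Keep windows that pass both filters and merge windows that touch."""
    kept: List[Interval] = []
    for chrom, start, end, cov, mappability in records:
        # Drop windows with too little or too much coverage, or low mappability.
        if not (min_cov <= cov <= max_cov and mappability >= min_map):
            continue
        # Merge with the previous interval when it ends where this one starts.
        if kept and kept[-1][0] == chrom and kept[-1][2] == start:
            kept[-1] = (chrom, kept[-1][1], end)
        else:
            kept.append((chrom, start, end))
    return kept


if __name__ == "__main__":
    windows = [
        ("chr1", 0, 100, 30.0, 1.0),
        ("chr1", 100, 200, 28.0, 0.9),
        ("chr1", 200, 300, 95.0, 1.0),  # coverage too high
        ("chr1", 300, 400, 31.0, 0.4),  # mappability too low
        ("chr2", 0, 50, 25.0, 1.0),
    ]
    for chrom, start, end in callable_sites(windows, 10, 60, 0.5):
        print(f"{chrom}\t{start}\t{end}")
```

The sketch assumes the input windows are sorted by chromosome and start position, which is what allows adjacent passing windows to be merged in a single pass.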
Reorganizes the environments and rules to remove redundancy, move all the callable sites rules (mappability, bedgraph generation, and filtering) to one .smk file, and remove outdated FreeBayes files.
Closes #51 by adding -Djava.io.tmpdir=/path/to/tmpdir to the sortVcf command. This is done in a somewhat hacky way: the option is appended to the end of the gather_vcfs_CLI params function.
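To illustrate the approach, here is a minimal sketch of appending the temp-dir property to an existing argument string. The helper name with_java_tmpdir and the example argument string are hypothetical; the actual gather_vcfs_CLI function in this PR may build its output differently.

```python
import shlex


def with_java_tmpdir(cli_args: str, tmpdir: str) -> str:
    """Append a JVM temp-dir system property to an existing argument string."""
    # Quote the path so a directory containing spaces survives the shell.
    return f"{cli_args} -Djava.io.tmpdir={shlex.quote(tmpdir)}"


if __name__ == "__main__":
    # Placeholder arguments, standing in for whatever the params function returns.
    print(with_java_tmpdir("-I a.vcf -I b.vcf", "/path/to/tmpdir"))
```

Appending to the returned string keeps the change small, at the cost of mixing a JVM option in with the tool's own arguments, which is why it is flagged as hacky.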
Adds a few potential performance enhancements to the GATK steps.
Cleans up the config file and resources, which should improve performance and readability.
Adds auto-scaling of the interval parameters to prevent the pipeline from failing on better genomes where the defaults are too small.