harvardinformatics / snpArcher

Snakemake workflow for highly parallel variant calling designed for ease-of-use in non-model organisms.
MIT License

Organization refactor #8

Closed cademirch closed 2 years ago

cademirch commented 2 years ago

Hey everyone,

I've made some pretty big changes to the overall organization of the repo, mainly trying to follow the Snakemake best practices.

Still todo:

I have also been testing this workflow on different datasets on our own local servers and have run into some issues that would be helpful for future users to know about:

I think this covers everything, let me know if you have any questions or concerns. Would appreciate testing and/or feedback.

tsackton commented 2 years ago

Hi Cade,

This is excellent, thanks for the hard work! I will review in more detail over the next couple of days, but a few quick comments right now:

1) The freebayes workflow is more or less deprecated, and I don't think it should be a high priority. While ultimately it would be sensible to support alternate SNP callers, the workhorse for all the analysis we intend will be the GATK pipeline, so we should focus on that. This will also make it easier to run fastq -> vcf, as we don't need to worry about handling differences between GATK and FreeBayes outputs and setup in one rule.

2) @sjswuitchik is also working on testing so you two should collaborate so as to avoid duplicating effort. It might make sense to work on this in a branch of this repo instead of a fork, as that will make collaborating a bit more straightforward.

3) For the fastqs / gzip rule, part of the logic here is that the fastqs won't always be deleted. For example, if someone is running on local data (not from SRA), then they don't want to delete the fastqs. Ideally the same setup should support local and SRA runs without needing anything other than making sure that your local files are arranged in the correct directory structure for Snakemake to realize everything is present. But this means we need to be really careful about deleting files in the pipeline itself.
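
The constraint in point 3 could be sketched roughly as follows. This is only an illustration in plain Python, not code from the workflow: the function names, the `_1.fastq.gz` / `_2.fastq.gz` naming, and the directory layout are all assumptions. The idea is that fastqs already present on disk are treated as user-owned local data and never scheduled for deletion, while missing ones would be fetched from SRA and could be cleaned up afterwards.

```python
from pathlib import Path


def fastq_paths(fastq_dir, sample):
    """Return the expected paired-end fastq paths for a sample.

    The naming scheme here is hypothetical; a real workflow would use
    whatever layout its rules expect.
    """
    d = Path(fastq_dir)
    return d / f"{sample}_1.fastq.gz", d / f"{sample}_2.fastq.gz"


def plan_fastq_handling(fastq_dir, sample):
    """Decide, before anything is downloaded, how to treat a sample's fastqs.

    If both files already exist they are assumed to be local data, so they
    are neither downloaded nor deleted. Otherwise they are assumed to come
    from SRA and may be removed once downstream steps have consumed them.
    """
    r1, r2 = fastq_paths(fastq_dir, sample)
    is_local = r1.exists() and r2.exists()
    return {"download": not is_local, "delete_after_use": not is_local}
```

Note that this decision has to be made once, at planning time: after a download the files exist on disk too, so re-evaluating the check later would misclassify SRA data as local.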

More soon

tsackton commented 2 years ago

That said, I am going to merge this into the restruct_dirs branch of the main repo so we can do some local testing too, and fix any merge conflicts with @sjswuitchik's latest bugfixes. I think @cademirch you have push access to this repo, so if you find additional bugs you should be able to push directly to this branch.