harvardinformatics / snpArcher

Snakemake workflow for highly parallel variant calling designed for ease-of-use in non-model organisms.
MIT License

Using local reference genomes #57

Closed tsackton closed 2 years ago

tsackton commented 2 years ago

Sometimes it may be useful to specify a local reference genome instead of one hosted on NCBI. Currently it is fairly straightforward to use local fastq files, but very hacky to use a local reference genome.

Ideally, we should refactor so that the reference genome download works similarly to the fastq download, where if a path is specified for the reference genome, that is used instead of downloading an accession.
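For illustration, the behavior proposed above could be sketched roughly as follows. This is a minimal sketch, not the workflow's actual code: the function name `resolve_reference`, the `download_dir` default, and the `.fna.gz` target name are all hypothetical.

```python
from pathlib import Path


def resolve_reference(ref_field, download_dir="data/genome"):
    """Decide whether a reference entry is a local file or an accession.

    Hypothetical helper: if `ref_field` points at an existing file, use it
    directly; otherwise treat the value as an accession to be downloaded.
    """
    candidate = Path(ref_field).expanduser()
    if candidate.is_file():
        # A path was specified and exists, so no download is needed.
        return {"source": "local", "path": str(candidate)}
    # Assumption: anything that is not an existing file is an accession,
    # and the download would be written to <download_dir>/<accession>.fna.gz.
    target = Path(download_dir) / f"{ref_field}.fna.gz"
    return {"source": "download", "accession": ref_field, "path": str(target)}
```

The point of the design is that downstream rules only ever see the returned `path`, so they do not need to know where the genome came from.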

cademirch commented 2 years ago

Good idea. Should be easy, I'll do this today.

cademirch commented 2 years ago

This is actually more complicated than I thought it would be because: 1) Snakemake doesn't allow you to use conda envs with the run keyword, and 2) BWA index doesn't have an output directory option.

Will publish a branch with my solution soon, though it's a bit hacky I think.
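One possible way around the second point, sketched here under stated assumptions and not necessarily what the branch does: by default `bwa index` writes its index files next to the FASTA path it is given, so a local reference can be symlinked into the workflow's output directory and indexed there. The helper names `stage_reference` and `bwa_index_command` are hypothetical.

```python
from pathlib import Path


def stage_reference(local_ref, outdir):
    """Symlink a local reference into `outdir` and return the link path.

    Hypothetical helper: indexing the symlink makes the index files land
    in `outdir` while the user's original file is left untouched.
    """
    outdir_path = Path(outdir)
    outdir_path.mkdir(parents=True, exist_ok=True)
    staged = outdir_path / Path(local_ref).name
    # Replace a stale link or file from a previous run.
    if staged.is_symlink() or staged.exists():
        staged.unlink()
    staged.symlink_to(Path(local_ref).resolve())
    return staged


def bwa_index_command(staged):
    """Build the argument list for indexing the staged reference."""
    # The command is only constructed here, not executed; in a workflow it
    # would run inside a shell-based rule so a conda env can be attached.
    return ["bwa", "index", str(staged)]
```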

tsackton commented 2 years ago

Okay I will take a look, thanks for working on this.

cademirch commented 2 years ago

Take a look at my solution in the branch local_refs and let me know what you think.

tsackton commented 2 years ago

Looks reasonable to me.

It doesn't appear that the output.outdir actually needs to be in output, as opposed to params, right? I think then you could avoid making an extraneous directory in the local ref part of the shell command.

Also, in lines 17-18 of common.smk you are just checking if the file is gzipped, right? So that should be a more informative workflow error.

Otherwise though this looks great and I see the tests pass, so let's merge it.
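As a rough illustration of the gzip point above, a check with a more descriptive message might look like the sketch below. It assumes the workflow expects a gzip-compressed reference, which the thread does not state explicitly. `ReferenceFormatError` is a hypothetical stand-in for whatever exception the workflow actually raises.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


class ReferenceFormatError(Exception):
    """Hypothetical stand-in for the workflow's own error type."""


def is_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def require_gzipped(path):
    """Raise an error that names the file and explains the fix."""
    # Assumption: the workflow needs a gzipped reference.
    if not is_gzipped(path):
        raise ReferenceFormatError(
            f"Local reference genome '{path}' is not gzip-compressed. "
            "Compress it (for example with gzip) and update the path."
        )
```

Checking the magic bytes rather than the file extension catches files that are named `.gz` but are not actually compressed, and the reverse.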

tsackton commented 2 years ago

Closed by #58