ghost closed this 9 years ago
@ckennedy-nmdp @janders3-nmdp @mthorsen-nmdp @ybolon-nmdp what do you think about this suggestion?
Okay by me.
In theory I like it, but we're gonna hit the reference with a lot of I/O. This might present a network problem.
To clarify, I'm suggesting that we pull the sequences and their indices from the FTP link above, instead of using whatever local process we currently have to index the reference data and then push it to the AWS instances.
Ah, got it. Agreed, then. I'll set up a time for @mthorsen-nmdp and me to get this done.
I approve of this approach.
@janders3-nmdp, @mthorsen-nmdp downloaded the reference and index files yesterday (I believe). Can we put those in data/reference? Hopefully this will be the first smoke test of the pipeline, assuming I can get to it today.
Can we close this issue now?
Yes
Fixed by pull requests #102 and #103.
Consider using reference sequences directly from GRCh38. The FTP directory
ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh38/seqs_for_alignment_pipelines/
provides a gzipped FASTA file for full analysis, a gzipped FASTA file for no_alt analysis, and tar-gzipped BWA indices.
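A rough sketch of what "pull the sequences and their indices" could look like, written with the Python standard library (`ftplib`, `tarfile`). It assumes the NCBI directory layout above is still current, that the BWA tarball unpacks to the five files `bwa index` produces (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`), and that `data/reference` is the destination. The helper names `fetch` and `extract_bwa_index` are hypothetical and are not part of this repository.

```python
import ftplib
import tarfile
from pathlib import Path

# Assumption: host and directory taken from the FTP link in this issue.
FTP_HOST = "ftp.ncbi.nlm.nih.gov"
FTP_DIR = "/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh38/seqs_for_alignment_pipelines/"

# Files written by `bwa index`; a usable index needs all five.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def fetch(filename: str, dest_dir: Path) -> Path:
    """Hypothetical helper: download one file from the NCBI directory via anonymous FTP."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / filename
    with ftplib.FTP(FTP_HOST) as ftp:
        ftp.login()  # no arguments -> anonymous login
        ftp.cwd(FTP_DIR)
        with open(target, "wb") as fh:
            ftp.retrbinary(f"RETR {filename}", fh.write)
    return target


def extract_bwa_index(archive: Path, dest_dir: Path) -> list:
    """Hypothetical helper: unpack a .tar.gz of BWA index files and check that none is missing."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter where this Python version supports it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive, "r:gz") as tar:
        names = [m.name for m in tar.getmembers() if m.isfile()]
        missing = [s for s in BWA_INDEX_SUFFIXES
                   if not any(n.endswith(s) for n in names)]
        if missing:
            raise ValueError(f"archive is missing BWA index files: {missing}")
        tar.extractall(dest_dir, **kwargs)
    return [dest_dir / n for n in names]
```

Usage would be `fetch(...)` with the actual tarball name taken from the directory listing (not guessed here), followed by `extract_bwa_index(archive, Path("data/reference"))`. The archive is checked before anything is extracted, so an incomplete download fails loudly and does not leave a partial index behind.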
See also issue #60