Open aaronmcdaid opened 6 years ago
@sinarueeger, any thoughts on this (see the issue description above)?
@aaronmcdaid sorry for the delayed answer, was offline during the weekend.
Thanks for looking into this.
I like proposals 1. and 2.
But I think 4. should come before 3. In fact, I don't think any automatic downloading should be done for the users (as in the status quo). Instead, as you describe, there should be a helpful error message with a description of the `ssimp --download ??` command.
I can start on (1) and (2) then. While I/we work on this, we can also consider how to proceed with Issue 59.
(1) and (2) are now done then, as part of version 0.5.2
(3) and (4) aren't done yet
This is open for discussion ... Any thoughts?
This is about the build database and the 1kg data. When do we download them? Is it automatic? Do we offer an option to force the downloads? How do we document this?
Status quo:
If the user specifies `1KG/EUR` (or `1KG/AFR`, ..) as their target reference panel, then the 1000 Genomes data must be in `~/reference_panels`. And `ssimp` offers to download it automatically if it's not already downloaded. As part of this download, it also downloads the build database.

Every imputation requires the build database at `~/reference_panels/database.of.builds.1kg.uk10k.hrc.2018.01.18.bin`. And `ssimp` will offer to download it if it's not present.

Proposals (in a suggested order of implementation):

1) add a `--download build-db` option to perform the download of the build database.
2) add a `--download 1KG` option to download 1000 genomes data. (Perhaps this should also imply `--download build-db`?)
3) Update the documentation, in particular the `usage.txt`, to encourage users to do this first before attempting any imputation
4) (optional) disable the current system which does downloading 'on demand'. Instead, if any files are missing, exit with a helpful error message that directs them to run the `--download ???` options

This procedure will require the user to do a little more work, but perhaps it's easy to understand as it puts them in control of the process
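To make proposal 4 concrete, here is a rough sketch of the "check, then exit with a helpful message" logic. It is written in Python purely for illustration and is not taken from the ssimp code base. The build database file name and the `--download build-db` / `--download 1KG` arguments come from the description above; the function names and the `1KG` subdirectory are assumptions, since nothing here says how the 1000 Genomes files are laid out inside `~/reference_panels`.

```python
from pathlib import Path

# File name taken from the issue description.
BUILD_DB_NAME = "database.of.builds.1kg.uk10k.hrc.2018.01.18.bin"
# Hypothetical: the layout of the 1000 Genomes files inside the reference
# panel directory is not stated, so a single subdirectory is assumed here.
KG_SUBDIR = "1KG"


def missing_downloads(ref_panel, ref_dir):
    """Return the proposed --download arguments needed before imputing."""
    ref_dir = Path(ref_dir)
    needed = []
    # Every imputation requires the build database.
    if not (ref_dir / BUILD_DB_NAME).is_file():
        needed.append("build-db")
    # The 1000 Genomes data is only needed for panels such as 1KG/EUR.
    if ref_panel.startswith("1KG/") and not (ref_dir / KG_SUBDIR).is_dir():
        needed.append("1KG")
    return needed


def missing_files_message(needed):
    """Build the error text, or return None when nothing is missing."""
    if not needed:
        return None
    lines = ["Required files are missing. Please run:"]
    lines += [f"    ssimp --download {name}" for name in needed]
    return "\n".join(lines)
```

A caller would run `missing_downloads("1KG/EUR", Path.home() / "reference_panels")` before starting an imputation and, if `missing_files_message` returns text, print it and exit instead of downloading anything on demand.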