Open arahuja opened 8 years ago
Thanks for making this an issue! Makes it easier to discuss. I made the repo public so that links would work better.
In hopes of fostering discussion, I wanted to note some design decisions I've been thinking about.
tcga-blca has the concept of a DATA_DIR -- a root directory in which files get loaded. This currently mirrors the directory structure imposed by gdc-client.
gdc-client could change this directory structure. What should the command look like for downloading files? Should this be get_vcfs, analogous to get_cases, with download behind the scenes? Or should the download event be handled more explicitly?
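To make the two options concrete, here is a rough sketch of what a get_vcfs could look like. Everything here is an assumption for discussion, not the current code: the signature, the local_vcf_paths helper, and the DATA_DIR/<file_id>/<file_name> layout (which is my understanding of what gdc-client produces today). The download step is passed in as a callable, so the same function covers both "download behind the scenes" (pass a downloader) and "explicit download" (pass nothing and fail loudly if files are missing).

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical downloader type: takes the missing file ids and the data dir.
Downloader = Callable[[List[str], Path], None]


def local_vcf_paths(file_id: str, data_dir: Path) -> List[Path]:
    """Return VCFs already present under data_dir/<file_id>/ (assumed layout)."""
    file_dir = data_dir / file_id
    if not file_dir.is_dir():
        return []
    return sorted(
        p for p in file_dir.iterdir() if p.name.endswith((".vcf", ".vcf.gz"))
    )


def get_vcfs(
    file_ids: Iterable[str],
    data_dir,
    download: Optional[Downloader] = None,
) -> Dict[str, List[Path]]:
    """Map each file id to its local VCF paths, downloading only if allowed."""
    data_dir = Path(data_dir)
    found: Dict[str, List[Path]] = {}
    missing: List[str] = []
    for file_id in file_ids:
        paths = local_vcf_paths(file_id, data_dir)
        if paths:
            found[file_id] = paths
        else:
            missing.append(file_id)

    if missing:
        if download is None:
            # Explicit mode: the caller is responsible for downloading first.
            raise FileNotFoundError(f"not in {data_dir}: {missing}")
        # Implicit mode: fetch what is missing, then look again.
        download(missing, data_dir)
        for file_id in missing:
            paths = local_vcf_paths(file_id, data_dir)
            if not paths:
                raise FileNotFoundError(f"download produced no VCF for {file_id}")
            found[file_id] = paths
    return found
```

Keeping the layout knowledge in one helper would also mean that if gdc-client changes its directory structure, only local_vcf_paths needs to change.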
We should probably also reconcile the current approaches to filtering in order to support this.
Should the download of VCFs be different from that of other sample-data files (e.g. Raw Sequencing Data)?
Should be able to use what @jburos has here: https://github.com/jburos/tcga-blca/blob/master/query_tcga/query_tcga.py#L246-L302