Provides algorithms with data that is regularised and up to date.
Mirrors the DCC (TCGA Data Coordination Center) nightly and scans for new SDRF (Sample and Data Relationship Format) files.
Eliminates two types of variation that the spec explicitly allows (e.g. naming and layout of files).
Collection of samples selected using criteria such as tumour type and exclusion lists from the Disease Working Groups and the Biospecimen Core Resource -> clustering group membership.
Collection of per-sample files merged together into a single file -> Firehose-hosted
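
The selection and merge steps above can be sketched roughly as follows. This is only an illustration, not Firehose's actual code or file format: the record fields ("id", "tumour_type"), the function names, and the "NA" marker for missing values are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Set


def select_samples(samples: Iterable[dict], tumour_type: str,
                   excluded: Set[str]) -> List[str]:
    """Return IDs of samples of the given tumour type that are not excluded.

    The "id" and "tumour_type" keys are hypothetical field names.
    """
    return [
        s["id"]
        for s in samples
        if s["tumour_type"] == tumour_type and s["id"] not in excluded
    ]


def merge_per_sample(per_sample: Dict[str, Dict[str, float]],
                     sample_ids: List[str]) -> List[List[str]]:
    """Merge per-sample {feature: value} tables into one feature-by-sample table.

    The first row is a header; features missing from a sample become "NA"
    (an assumed placeholder).
    """
    # Union of all features seen in the selected samples, in a stable order.
    features = sorted({f for sid in sample_ids for f in per_sample[sid]})
    rows = [["feature"] + list(sample_ids)]
    for f in features:
        rows.append([f] + [str(per_sample[sid].get(f, "NA")) for sid in sample_ids])
    return rows
```

Keeping selection separate from merging means the same exclusion list can be applied once and the resulting sample set reused for every data type that is merged.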
https://confluence.broadinstitute.org/display/GDAC/fbget#fbget-configuration