keaan95 / learning


GDAC Firehose #1

Open keaan95 opened 7 years ago

keaan95 commented 7 years ago

Provides algorithms with data that is regularised and up to date. Mirrors the DCC (TCGA Data Coordination Center) nightly and scans for new SDRF (Sample and Data Relationship Format) files.

Eliminates two types of variation that are explicitly allowed by the spec (e.g. naming and layout of files).

Collection of samples selected using criteria such as tumour type and exclusion lists from the Disease Working Groups and the Biospecimen Core Resource -> clustering group membership.
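
A minimal sketch of that selection step, to make the idea concrete. It assumes each sample carries a barcode and a tumour type, and that the exclusion lists reduce to a set of barcodes. `Sample`, `build_sample_set`, and the barcodes are hypothetical names for illustration, not part of Firehose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    barcode: str      # hypothetical sample identifier
    tumour_type: str  # e.g. a cohort abbreviation


def build_sample_set(samples, tumour_type, excluded_barcodes):
    """Keep samples of one tumour type that are not on an exclusion list."""
    excluded = set(excluded_barcodes)
    return [
        s for s in samples
        if s.tumour_type == tumour_type and s.barcode not in excluded
    ]


# Made-up example data, not real TCGA barcodes.
samples = [
    Sample("S-001", "BRCA"),
    Sample("S-002", "BRCA"),
    Sample("S-003", "GBM"),
]
brca_set = build_sample_set(samples, "BRCA", excluded_barcodes=["S-002"])
```

Here `brca_set` contains only `S-001`: `S-002` is dropped by the exclusion list and `S-003` by the tumour type.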

Collection of per-sample files merged together into a single file -> Firehose-hosted
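
A rough sketch of what merging per-sample files into one file could look like. The tab-separated layout with `feature` and `value` columns, the `NA` placeholder, and the function name `merge_per_sample` are all assumptions for illustration. The real Firehose file formats are not described in these notes.

```python
import csv
import io


def merge_per_sample(per_sample_tables):
    """Merge {sample_id: TSV text} into one TSV with a column per sample."""
    merged = {}  # feature -> {sample_id: value}
    for sample_id, text in per_sample_tables.items():
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for row in reader:
            merged.setdefault(row["feature"], {})[sample_id] = row["value"]

    sample_ids = sorted(per_sample_tables)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["feature"] + sample_ids)
    for feature in sorted(merged):
        # Features missing from a sample are filled with "NA" (an assumption).
        writer.writerow(
            [feature] + [merged[feature].get(s, "NA") for s in sample_ids]
        )
    return out.getvalue()


# Made-up per-sample tables standing in for files on disk.
tables = {
    "S-001": "feature\tvalue\nTP53\t1.5\nEGFR\t0.2\n",
    "S-002": "feature\tvalue\nTP53\t2.0\n",
}
merged_text = merge_per_sample(tables)
```

Rows and columns are sorted so the merged output does not depend on the order in which the per-sample files were read.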

https://confluence.broadinstitute.org/display/GDAC/fbget#fbget-configuration