mdozmorov / genome_runner

Academic Free License v3.0

additional ENCODE/chromStates #77

Closed mdozmorov closed 9 years ago

mdozmorov commented 9 years ago

Currently, we process http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeBroadHmm/ into the "ENCODE/chromStates/Tier1/Cell/" hierarchy. These data cover Tier 1 and Tier 2 cell types only.

There are other chromatin segmentation data at http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeAwgSegmentation/. They come in three types: "ChromHmm", "Combined", and "Segway". These data are Tier 1 only.

The "BroadHmm", "ChromHmm", "Combined", and "Segway" files should each be split by the 4th column, with that column's value used as the "factor". Note that the "Segway" files are much larger than the others, but the data are good.
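A minimal sketch of the split by 4th column, assuming tab-separated BED input. The function name `split_bed_by_factor` and the sample rows are made up for illustration and are not taken from the genome_runner code.

```python
from collections import defaultdict


def split_bed_by_factor(lines, column=3):
    """Group BED lines by the value of one column.

    column is 0-based, so the default 3 selects the 4th column,
    whose value is used as the "factor".
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and UCSC-style header/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        # Skip rows that are too short to have the requested column.
        if len(fields) <= column:
            continue
        groups[fields[column]].append(line)
    return dict(groups)


# Invented sample rows, for illustration only.
example = [
    "track name=example\n",
    "chr1\t10000\t10600\t1_Active_Promoter\n",
    "chr1\t10600\t11137\t7_Weak_Enhancer\n",
    "chr1\t11137\t11737\t1_Active_Promoter\n",
]

by_factor = split_bed_by_factor(example)
for factor, rows in sorted(by_factor.items()):
    print(factor, len(rows))
```

Each key of the returned dictionary would then become one output file per factor. Some state labels in these files may contain characters such as "/", so they may need sanitizing before being used in file names.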

I suggest combining them. These types may be used instead of tiers. E.g.:

ENCODE
_chromStates
__BroadHmm (use as "source" in the file name)
___Celltype (Tier 1 and 2, nine cells total)
__ChromHmm (use as "source" in the file name)
___Celltype (Tier 1 only)
__Combined (use as "source" in the file name)
___Celltype
__Segway (use as "source" in the file name)
___Celltype
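The proposed layout, with the source type replacing the tier level, could be sketched as follows. The helper `target_dir` and its `root` default are hypothetical. The issue does not specify the file name pattern, so only the directory part is built here.

```python
import posixpath

# The four segmentation sources named in this issue.
SOURCES = ("BroadHmm", "ChromHmm", "Combined", "Segway")


def target_dir(source, celltype, root="ENCODE/chromStates"):
    """Return the directory for one source/cell type pair.

    Hypothetical helper: the source replaces the former Tier level.
    """
    if source not in SOURCES:
        raise ValueError("unknown source: %s" % source)
    return posixpath.join(root, source, celltype)


# Gm12878 is used here only as an example cell type name.
print(target_dir("Segway", "Gm12878"))
```

Validating the source against a fixed tuple keeps misspelled names such as "Chromhmm" from silently creating a fifth branch in the hierarchy.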

mdozmorov commented 9 years ago

ROADMAP
_chromStates15 (use as "source" in the file name)
__Group
___EID (use as "celltype" in the file name)
____files