mdozmorov / genome_runner

Academic Free License v3.0

wgEncodeAwgSegmentationSegway #79

Closed mdozmorov closed 9 years ago

mdozmorov commented 9 years ago

The chromatin segmentation states downloaded from http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeAwgSegmentation/ include huge files from the "Segway" source, each ~140 MB. GZIP seems to have a problem with them; see http://stackoverflow.com/questions/1732709/unzipping-part-of-a-gz-file-using-python

They are only partially downloaded. Some have a ".tmp" extension, others have ".temp":

    -rw-rw-r-- 1 mdozmorov mdozmorov 135M Jul 15 17:54 wgEncodeAwgSegmentationSegwayGm12878.bed.gz.temp
    -rw-rw-r-- 1 mdozmorov mdozmorov 160M Jul 15 19:27 wgEncodeAwgSegmentationSegwayH1hesc.bed.gz.temp
    -rw-rw-r-- 1 mdozmorov mdozmorov 137M Jul 15 19:37 wgEncodeAwgSegmentationSegwayHelas3.bed.gz
    -rw-rw-r-- 1 mdozmorov mdozmorov 133M Jul 15 19:39 wgEncodeAwgSegmentationSegwayHepg2.bed.gz
    -rw-rw-r-- 1 mdozmorov mdozmorov  63M Jul 15 19:40 wgEncodeAwgSegmentationSegwayHepg2.bed.gz.tmp
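One way to avoid leftover partial files with inconsistent suffixes is to write every download to a single, fixed temporary name and rename it only once all data has arrived. The sketch below is an illustration under assumptions, not the code dbcreator uses: `write_atomically` and its `suffix` parameter are hypothetical names, and the data source is any iterable of byte chunks (for example, chunks read from an FTP or HTTP response). It is written for Python 3.

```python
import os


def write_atomically(chunks, dest_path, suffix=".tmp"):
    """Write an iterable of byte chunks to dest_path via a temporary file.

    The data goes to dest_path + suffix first and is renamed only after
    every chunk has been written, so dest_path never holds a partial file.
    (Hypothetical helper, not part of genome_runner.)
    """
    tmp_path = dest_path + suffix
    try:
        with open(tmp_path, "wb") as out:
            for chunk in chunks:
                out.write(chunk)
        # Rename is atomic when both paths are on the same filesystem.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Remove the partial file so a later run starts from scratch.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return dest_path
```

With this pattern, a file named `*.bed.gz` exists only if the transfer finished, so a later step can treat any `*.bed.gz` it finds as fully written (though not necessarily uncorrupted).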

dbcreator breaks with an error, but continues with the next file:

    Downloading wgEncodeAwgSegmentationCombinedHepg2.bed.gz from UCSC
    Converting into proper bed format: wgEncodeAwgSegmentationCombinedHepg2.bed.gz
    Unable to convert wgEncodeAwgSegmentationCombinedHepg2.bed.gz into bed
    Traceback (most recent call last):
      File "/home/mdozmorov/genome_runner/grsnp/dbcreator_encode.py", line 592, in create_feature_set
        [minmax_score, gf_paths] = gf_grp_sett[gf_group]"prep_method"
      File "/home/mdozmorov/genome_runner/grsnp/dbcreator_encode.py", line 288, in preparebed_splitby
        line = infile.readline().rstrip('\n')
      File "/usr/lib/python2.7/gzip.py", line 462, in readline
        c = self.read(readsize)
      File "/usr/lib/python2.7/gzip.py", line 268, in read
        self._read(readsize)
      File "/usr/lib/python2.7/gzip.py", line 315, in _read
        self._read_eof()
      File "/usr/lib/python2.7/gzip.py", line 354, in _read_eof
        hex(self.crc)))
    IOError: CRC check failed 0xa3212669 != 0xe87de433L

    Downloading wgEncodeAwgSegmentationCombinedHuvec.bed.gz from UCSC
    Downloading wgEncodeAwgSegmentationCombinedHuvec.bed.gz from UCSC
    Converting into proper bed format: wgEncodeAwgSegmentationCombinedHuvec.bed.gz
    Downloading wgEncodeAwgSegmentationCombinedK562.bed.gz from UCSC
    Downloading wgEncodeAwgSegmentationCombinedK562.bed.gz from UCSC
    Converting into proper bed format: wgEncodeAwgSegmentationCombinedK562.bed.gz
    Unable to convert wgEncodeAwgSegmentationCombinedK562.bed.gz into bed
    Traceback (most recent call last):
      File "/home/mdozmorov/genome_runner/grsnp/dbcreator_encode.py", line 592, in create_feature_set
        [minmax_score, gf_paths] = gf_grp_sett[gf_group]"prep_method"
      File "/home/mdozmorov/genome_runner/grsnp/dbcreator_encode.py", line 288, in preparebed_splitby
        line = infile.readline().rstrip('\n')
      File "/usr/lib/python2.7/gzip.py", line 462, in readline
        c = self.read(readsize)
      File "/usr/lib/python2.7/gzip.py", line 268, in read
        self._read(readsize)
      File "/usr/lib/python2.7/gzip.py", line 315, in _read
        self._read_eof()
      File "/usr/lib/python2.7/gzip.py", line 354, in _read_eof
        hex(self.crc)))
    IOError: CRC check failed 0x276f1616 != 0xa9403ceeL
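The CRC failure only surfaces when the reader reaches the end of the stream, i.e. after part of the file has already been converted. A possible workaround is to validate each archive in full before conversion and re-download it if the check fails. The following is a minimal sketch under assumptions: `is_complete_gzip` is a hypothetical helper name, and the code targets Python 3, where a CRC mismatch raises `OSError` (`gzip.BadGzipFile` in 3.8+) and a truncated stream raises `EOFError`; under the Python 2.7 shown in the traceback the CRC mismatch is reported as `IOError` instead.

```python
import gzip
import zlib


def is_complete_gzip(path, chunk_size=1024 * 1024):
    """Return True if the gzip file at path decompresses without errors.

    The whole stream is read, because the CRC32 and length in the gzip
    trailer are only verified once the end of the data is reached.
    (Hypothetical helper, not part of genome_runner.)
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard the data in fixed-size chunks to keep
            # memory use flat, even for ~140 MB archives.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # OSError: bad header or CRC mismatch; EOFError: truncated file;
        # zlib.error: corrupted deflate data.
        return False
    return True
```

A caller could run this check right after each download and retry or skip the file on `False`, rather than discovering the corruption in the middle of the bed conversion.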