moloney / dcmstack

DICOM to Nifti conversion with meta data preservation

test_get_elem_value UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 5136: character maps to <undefined> #56

Closed yarikoptic closed 6 years ago

yarikoptic commented 6 years ago
$> apt-cache policy python-chardet 
python-chardet:
  Installed: 3.0.4-1
  Candidate: 3.0.4-1
  Version table:
 *** 3.0.4-1 600
        600 http://http.debian.net/debian sid/main amd64 Packages
        600 http://http.debian.net/debian sid/main i386 Packages
        100 /var/lib/dpkg/status
     2.3.0-2 100
        100 http://http.debian.net/debian stretch/main amd64 Packages
        100 http://http.debian.net/debian stretch/main i386 Packages

$> nosetests -s -v test/test_extract.py:TestMetaExtractor.test_get_elem_value
test_extract.TestMetaExtractor.test_get_elem_value ... ERROR

======================================================================
ERROR: test_extract.TestMetaExtractor.test_get_elem_value
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/yoh/deb/gits/pkg-exppsy/dcmstack/test/test_extract.py", line 91, in test_get_elem_value
    value = extractor._get_elem_value(elem)
  File "/home/yoh/deb/gits/pkg-exppsy/dcmstack/src/dcmstack/extract.py", line 398, in _get_elem_value
    value = self.conversions[elem.VR](value)
  File "/home/yoh/deb/gits/pkg-exppsy/dcmstack/src/dcmstack/extract.py", line 295, in get_text
    return byte_str.decode(match['encoding'])
  File "/usr/lib/python2.7/encodings/cp1254.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 5136: character maps to <undefined>
-------------------- >> begin captured logging << --------------------
chardet.charsetprober: DEBUG: SHIFT_JIS Japanese prober hit error at byte 240
chardet.charsetprober: DEBUG: EUC-JP Japanese prober hit error at byte 137
chardet.charsetprober: DEBUG: GB2312 Chinese prober hit error at byte 137
chardet.charsetprober: DEBUG: EUC-KR Korean prober hit error at byte 137
chardet.charsetprober: DEBUG: CP949 Korean prober hit error at byte 137
chardet.charsetprober: DEBUG: Big5 Chinese prober hit error at byte 137
chardet.charsetprober: DEBUG: EUC-TW Taiwan prober hit error at byte 137
chardet.charsetprober: DEBUG: windows-1251 Russian confidence = 0.0
chardet.charsetprober: DEBUG: KOI8-R Russian confidence = 0.01
chardet.charsetprober: DEBUG: ISO-8859-5 Russian confidence = 0.01
chardet.charsetprober: DEBUG: MacCyrillic Russian confidence = 0.01
chardet.charsetprober: DEBUG: IBM866 Russian confidence = 0.01
chardet.charsetprober: DEBUG: IBM855 Russian confidence = 0.01
chardet.charsetprober: DEBUG: ISO-8859-7 Greek confidence = 0.01
chardet.charsetprober: DEBUG: windows-1253 Greek confidence = 0.01
chardet.charsetprober: DEBUG: ISO-8859-5 Bulgairan confidence = 0.01
chardet.charsetprober: DEBUG: windows-1251 Bulgarian confidence = 0.0
chardet.charsetprober: DEBUG: TIS-620 Thai confidence = 0.01
chardet.charsetprober: DEBUG: ISO-8859-9 Turkish confidence = 0.355306617892
chardet.charsetprober: DEBUG: windows-1255 Hebrew confidence = 0.0
chardet.charsetprober: DEBUG: windows-1255 Hebrew confidence = 0.01
chardet.charsetprober: DEBUG: windows-1255 Hebrew confidence = 0.01
chardet.charsetprober: DEBUG: windows-1251 Russian confidence = 0.0
chardet.charsetprober: DEBUG: KOI8-R Russian confidence = 0.01
chardet.charsetprober: DEBUG: ISO-8859-5 Russian confidence = 0.01
chardet.charsetprober: DEBUG: MacCyrillic Russian confidence = 0.01
chardet.charsetprober: DEBUG: IBM866 Russian confidence = 0.01
chardet.charsetprober: DEBUG: IBM855 Russian confidence = 0.01
chardet.charsetprober: DEBUG: ISO-8859-7 Greek confidence = 0.01
chardet.charsetprober: DEBUG: windows-1253 Greek confidence = 0.01
chardet.charsetprober: DEBUG: ISO-8859-5 Bulgairan confidence = 0.01
chardet.charsetprober: DEBUG: windows-1251 Bulgarian confidence = 0.0
chardet.charsetprober: DEBUG: TIS-620 Thai confidence = 0.01
chardet.charsetprober: DEBUG: ISO-8859-9 Turkish confidence = 0.355306617892
chardet.charsetprober: DEBUG: windows-1255 Hebrew confidence = 0.0
chardet.charsetprober: DEBUG: windows-1255 Hebrew confidence = 0.01
chardet.charsetprober: DEBUG: windows-1255 Hebrew confidence = 0.01
--------------------- >> end captured logging << ---------------------
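
The traceback above shows get_text handing the detected encoding straight to byte_str.decode(), which ran through the cp1254 codec, where byte 0x81 has no mapping. The captured log shows the best guess was Turkish at a confidence of only about 0.36. One possible defensive approach, sketched here, is to catch the decode failure and fall back to a codec that accepts every byte. This is not necessarily the fix applied in dcmstack: safe_decode and its fallback parameter are hypothetical names, and latin-1 is an assumed fallback choice.

```python
def safe_decode(byte_str, encoding, fallback="latin-1"):
    """Decode byte_str with a guessed encoding, tolerating a bad guess.

    Hypothetical helper for illustration; not part of dcmstack.
    `encoding` stands in for the detector's guess (match['encoding'] in
    the traceback) and may be None when nothing was detected.
    """
    if encoding is not None:
        try:
            return byte_str.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess (e.g. cp1254 cannot map 0x81) or unknown codec name.
            pass
    # latin-1 maps every byte value to a code point, so this cannot raise.
    return byte_str.decode(fallback, errors="replace")


if __name__ == "__main__":
    # 0x81 is undefined in cp1254, so the helper falls back to latin-1.
    print(repr(safe_decode(b"abc\x81", "cp1254")))
```

The trade-off is that a fallback can silently produce the wrong characters for text that really was in another encoding, so a caller may prefer to check the detector's confidence first and only then decide whether to decode at all.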