moloney / dcmstack

DICOM to Nifti conversion with meta data preservation

non utf-8 strings #8

Closed bpinsard closed 11 years ago

bpinsard commented 11 years ago

Tag values that are not UTF-8 encoded (this seems to be possible in data from a Siemens scanner) cause a crash in JSON serialization:

dcmmeta.pyc in _mangle(self, value)
    698     def _mangle(self, value):
    699         '''Go from runtime representation to extension data.'''
--> 700         return json.dumps(value, indent=4)
.
.
.
/usr/lib64/python2.6/json/encoder.pyc in _iterencode(self, o, markers)
    292                     and not (_encoding == 'utf-8')):
    293                 o = o.decode(_encoding)
--> 294             yield encoder(o)
    295         elif o is None:
    296             yield 'null'
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe2 in position 2: invalid continuation byte

Is there a way to force the conversion, or to remove the tags that cannot be converted to UTF-8?
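
For illustration, both options asked about above (forcing the conversion, or dropping the offending values) can be sketched in a few lines. This is a minimal sketch written for Python 3, not dcmstack code: `decode_tag_value` and `drop_undecodable` are hypothetical helper names, and the Latin-1 fallback is an assumption about how the scanner encoded the text.

```python
import json


def decode_tag_value(raw, fallback="latin-1"):
    """Return text for a raw byte string, trying UTF-8 first.

    Hypothetical helper. The Latin-1 fallback is only a guess at the
    real encoding; with Latin-1 the decode cannot fail, because every
    byte maps to a code point, but the result may be the wrong text.
    """
    if not isinstance(raw, bytes):
        return raw
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


def drop_undecodable(meta):
    """Return a copy of `meta` without byte values that are not valid UTF-8.

    Hypothetical helper. Valid byte values are decoded to text so that
    the result can be passed to json.dumps.
    """
    cleaned = {}
    for key, value in meta.items():
        if isinstance(value, bytes):
            try:
                value = value.decode("utf-8")
            except UnicodeDecodeError:
                continue  # skip the value that cannot be converted
        cleaned[key] = value
    return cleaned


# Made-up metadata with one Latin-1 encoded value.
meta = {"Description": b"IRM c\xe9r\xe9brale", "Modality": b"MR"}
forced = {key: decode_tag_value(value) for key, value in meta.items()}
dropped = drop_undecodable(meta)
serialized = json.dumps(forced, indent=4)
```

Forcing the conversion keeps every tag but can silently produce the wrong characters if the guess at the encoding is wrong. Dropping is lossy but never invents text.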

moloney commented 11 years ago

I currently have an "ignore_rule" to skip elements with a VR of 'OW', 'OB', or 'UN' if the value contains non-ASCII characters. However the ignore rules are not applied to the results from translators.

Can you confirm that the bad data is coming from the CsaImage or CsaSeries translators?
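
The ignore rule described above can be sketched as follows. This is an illustration of the stated rule only, not the actual dcmstack implementation: `BINARY_VRS`, `is_ascii`, and `should_ignore` are hypothetical names, and the code is written for Python 3.

```python
# VRs named in the rule above: binary or unknown data.
BINARY_VRS = frozenset(["OW", "OB", "UN"])


def is_ascii(value):
    """Return True if every byte (or character) of `value` is below 128."""
    if isinstance(value, bytes):
        return all(byte < 128 for byte in value)
    return all(ord(char) < 128 for char in value)


def should_ignore(vr, value):
    """Hypothetical rule: skip binary-VR elements holding non-ASCII data."""
    return vr in BINARY_VRS and not is_ascii(value)
```

Under a rule like this, an element with a text VR such as LO is never skipped, whatever bytes it contains.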

bpinsard commented 11 years ago

I see. The bad data is not in the CSA headers; its VR is LO, so the ignore rule does not handle it.

moloney commented 11 years ago

Interesting. I don't think it is valid to have non-UTF-8 characters in a value with a VR of LO. Any idea what the value is? Was the file possibly corrupted?

bpinsard commented 11 years ago

I am not a DICOM expert, and it may not be valid, but it seems that the scanner, the export step, or something else allowed non-ASCII characters (French accents) to be entered. The field is RequestedProcedureDescription.

moloney commented 11 years ago

I believe non-ASCII in an LO value is fine, but not non-UTF-8. A French accent should be valid UTF-8.

Is there some way you could send me the data?
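
One way an accented character can end up as invalid UTF-8 is if the text was stored in a single-byte encoding such as Latin-1. That is an assumption about this data, not something confirmed in the thread. The sketch below uses a made-up value, "Crâne", and a hypothetical helper `is_valid_utf8` to show the same character producing different bytes in the two encodings. It is written for Python 3.

```python
text = u"Cr\u00e2ne"  # "Crâne", a made-up example value

utf8_bytes = text.encode("utf-8")      # the accent becomes two bytes: 0xc3 0xa2
latin1_bytes = text.encode("latin-1")  # the accent becomes one byte: 0xe2


def is_valid_utf8(raw):
    """Return True if `raw` decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


utf8_ok = is_valid_utf8(utf8_bytes)      # True
latin1_ok = is_valid_utf8(latin1_bytes)  # False
```

In UTF-8 the byte 0xe2 announces a three-byte sequence, so a Latin-1 "â" followed by an ordinary ASCII letter cannot be decoded.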

moloney commented 11 years ago

This should address the issue.