suever / pydicom-experimental

pydicom test

Conversion of raw unicode values in Dataset initialization fails (IronPython) #115

Open suever opened 9 years ago

suever commented 9 years ago

From j...@computer.org on April 23, 2012 12:02:51

What steps will reproduce the problem?

1. Use the following Python code and the attached dataset to reproduce

    import dicom

    f1 = 'unicodeST.dcm'
    ds1 = dicom.read_file(f1, stop_before_pixels=True)
    string = str(ds1)

An exception occurs when mapping raw data to dict, leaving the Dataset in an incomplete and invalid state.

What version of the product are you using?

Pydicom 0.9.7, IronPython 2.7

Please provide any additional information below.

The issue results from raw unicode data being typed as 'str'. The conversion to str then fails with an unhandled exception.

I don't have a patch tool. Below is a modified valuerep.py:MultiString implementation that resolves the issue.

    def MultiString(val, valtype=str):
        """Split a string by delimiters if there are any

        val -- DICOM string to split up
        valtype -- default str, but can be e.g. UID to overwrite to a specific type
        """
        # Remove trailing blank used to pad to even length
        # 2005.05.25: also check for trailing 0, error made in PET files we are converting
        if val and (val.endswith(' ') or val.endswith('\x00')):
            val = val[:-1]

        # XXX --> simpler version python > 2.4   splitup = [valtype(x) if x else x for x in val.split("\\")]
        splitup = []
        for subval in val.split("\\"):
            if subval:
                if isinstance(val, unicode):
                    splitup.append(unicode(subval))
                else:
                    splitup.append(valtype(subval))
            else:
                splitup.append(subval)
        if len(splitup) == 1:
            return splitup[0]
        else:
            return MultiValue(valtype, splitup)

Attachment: unicodeST.dcm

Original issue: http://code.google.com/p/pydicom/issues/detail?id=115

suever commented 9 years ago

From darcymason@gmail.com on April 23, 2012 17:56:42

Thanks for the issue and sample file. However, on python 2.7, I don't see this error after running the steps to generate the problem. In regular python, val should never be unicode. A quick search shows that (at least some time ago) IronPython did everything in unicode, so the error is probably unique to IronPython.

Hmmm... so what to do... I'm not convinced this is the only place where unicode would cause a problem. Even in the python 3 version of pydicom, I expect val will come into MultiString as bytes (but probably will be converted inside).

I'll think about this. Meanwhile, the code fix above may not be quite right -- it doesn't check valtype before dealing with the unicode. As noted in the comments, valtype could be something like UID. Is it str(val) that is causing the problem when val is unicode? If so, then a solution that should work would be just to change valtype from str if necessary. This check after the trailing blank part should work:

    if isinstance(val, unicode) and valtype == str:
        valtype = unicode  # or even a null function that does nothing should work

That would also work with the new list comprehension line for python >2.4.
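For reference, here is a rough sketch of how that check might combine with the list comprehension version. It is not the actual pydicom code: `multi_string` and `text_type` are names made up for this illustration, a plain list stands in for `MultiValue`, and the `try`/`except` shim is only there so the snippet also runs on interpreters that have no `unicode` builtin.

```python
# Illustrative sketch only, not the pydicom implementation.
try:
    text_type = unicode  # Python 2 / IronPython
except NameError:
    text_type = str  # Python 3: str is already a unicode text type


def multi_string(val, valtype=str):
    """Split a backslash-delimited DICOM string, keeping unicode input as unicode."""
    # Remove a trailing blank or NUL used to pad to even length
    if val and (val.endswith(' ') or val.endswith('\x00')):
        val = val[:-1]

    # Suggested check: only replace the default str conversion, so an
    # explicit valtype such as UID is left untouched
    if isinstance(val, text_type) and valtype == str:
        valtype = text_type

    # Empty sub-values are passed through without conversion
    splitup = [valtype(x) if x else x for x in val.split("\\")]
    if len(splitup) == 1:
        return splitup[0]
    # Stand-in: the real function returns MultiValue(valtype, splitup)
    return splitup
```

With this shape, a caller that passes a specific `valtype` still gets that type back for each sub-value, while unicode input with the default `valtype` is no longer forced through `str`.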

Summary: Conversion of raw unicode values in Dataset initialization fails (IronPython)
Status: Accepted