If the user decodes the data elements in a dataset, string and Person Name
elements are converted to Unicode. When the user then attempts to write this
dataset to a file, the Unicode values are not encoded explicitly, so Python
attempts an implicit conversion to ASCII (or another encoding, depending on the
user's environment settings). The following code demonstrates the issue:
import dicom
from dicom.dataelem import DataElement
from dicom.charset import decode
from dicom.filewriter import write_string
data_element = DataElement((0x08,0x70),'SH','Suéver')
decode(data_element, None) # Decodes using default_encoding
write_string(open('/dev/null','wb'), data_element)
NOTE: The é character is not a valid ASCII character, so write_string
raises an ambiguous UnicodeEncodeError.
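The failure can be reproduced without pydicom. The sketch below is only an illustration: write_text is a hypothetical helper, not part of the library, and the explicit .encode("ascii") call stands in for the implicit conversion Python 2 performs when a unicode value is written to a binary file. It also assumes the dataset's character set is ISO_IR 100, which corresponds to Python's latin-1 codec.

```python
import io


def write_text(fp, value, encoding="ascii"):
    """Hypothetical helper: encode text explicitly before writing it to a binary stream."""
    fp.write(value.encode(encoding))


name = u"Su\u00e9ver"  # "Suéver" as a Unicode string

# Encoding to ASCII fails, which is what the implicit conversion runs into.
buf = io.BytesIO()
try:
    write_text(buf, name)
    failed = False
except UnicodeEncodeError:
    failed = True

# Encoding with the dataset's character set (assumed here to be Latin-1) succeeds.
buf = io.BytesIO()
write_text(buf, name, encoding="latin-1")
latin1_bytes = buf.getvalue()
```

Passing the encoding explicitly makes the outcome independent of the user's environment settings.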
This Unicode handling needs to be present not only in
filewriter.write_string but also in filewriter.write_PN. PersonName elements
need to be handled differently because PN values can use multiple
encodings.
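As a rough sketch of why PN values need separate treatment: a PN value can hold several component groups separated by "=", and each group may use a different character set. The function below is hypothetical and is not the patch mentioned in this report; the name encode_person_name, its signature, and the choice to reuse the last encoding for any extra groups are all assumptions made for illustration.

```python
def encode_person_name(value, encodings):
    """Hypothetical sketch: encode each '='-separated component group
    of a PN value with its own encoding.

    encodings is a list of Python codec names, one per component group.
    If there are fewer encodings than groups, the last one is reused
    (an assumption of this sketch).
    """
    groups = value.split(u"=")
    encoded = []
    for index, group in enumerate(groups):
        encoding = encodings[min(index, len(encodings) - 1)]
        encoded.append(group.encode(encoding))
    return b"=".join(encoded)


# Alphabetic group in ASCII, ideographic group in ISO 2022 Japanese.
pn = u"Yamada^Tarou=\u5c71\u7530^\u592a\u90ce"
raw = encode_person_name(pn, ["ascii", "iso2022_jp"])
```

A single call to value.encode() with one codec could not handle a value like this, which is why write_PN cannot simply share the write_string logic.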
I have already begun work on a patch for this, but just wanted to document the
issue here in case it crops up in the future.
Original issue reported on code.google.com by Suever@gmail.com on 24 Nov 2012 at 4:28