suever / pydicom-experimental

pydicom test
0 stars 1 forks source link

crash because of encoding issues in logger messages #136

Closed suever closed 9 years ago

suever commented 9 years ago

From Dimitri....@gmail.com on November 28, 2013 19:20:11

What steps will reproduce the problem? 1. Run the attached Python script which attempts to open a DICOM file with non ASCI characters in its name.

  1. See python crash. What is the expected output? What do you see instead? I would expect pydicom to write an error message: the DICOM file we are trying to open does not exist. Instead pydicom crashes while trying to write the error message to the logger. What version of the product are you using? 0.9.8 Please provide any additional information below. Reproduced on Fedora 19 with Python 2.7.5 and pydicom 0.9.8n but also on Ubuntu 12.04 LTS. Crash occurs with debug messages, not only error messages. I had initially seen the error in line 599 of filereader.py: logger.debug("Reading file '{0}'".format(fp)) The problem is that fp is a unicode string, while the error message is an 8-bit string. Changing the error message into an unicode string fixes the problem: logger.debug(u"Reading file '{0}'".format(fp)) While the bug is being fixed, any idea how to work around the problem?

Attachment: toto.py

Original issue: http://code.google.com/p/pydicom/issues/detail?id=136

suever commented 9 years ago

From Dimitri....@gmail.com on November 28, 2013 16:39:00

See Python issue 7300 'Unicode arguments in str.format()': http://bugs.python.org/issue7300 See also blog "Python 2.X's str.format is unsafe": http://blog.codekills.net/2011/09/22/python-2.x%27s-str.format-is-unsafe/ Either replace all occurences of str.format with %, or always use unicode format strings.

I'm still interested in possible workarounds while this is being fixed.

suever commented 9 years ago

From Dimitri....@gmail.com on November 29, 2013 02:23:31

For those interested, I got to work around the problem without patching pydicom using sys.setdefaultencoding('utf-8').

The setdefaultencoding() function is well hidden and its use is clearly discouraged: http://docs.python.org/2/library/sys.html#sys.setdefaultencoding I was able to call this function using this hack found on different forums: import sys reload(sys) sys.setdefaultencoding("utf-8") Applying that hack to my own scripts fixes the problem without patching pydicom.

By the way, I think was wrong about the plaforms I can reproduce this issue on: I can reproduce it on CentOS 6.4, not on Ubuntu 12.04 LTS and not so sure about Fedora 19 - I'd have to check again.

Anyway, I still believe this is a bug in pydicom, the above is ust a bad hack to work around the bug. Please fix the bug.

suever commented 9 years ago

From darcymason@gmail.com on December 01, 2013 16:49:44

Thanks for posting this issue and for digging into reasons and solutions. We'll take a good look at this and implement a fix. I'm thinking it likely won't be using the % string interpolation, as that was all changed over for python 3 (although so far they have kept the % anyway).

Status: Accepted
Labels: -Difficulty-Medium Difficulty-Easy

suever commented 9 years ago

From Dimitri....@gmail.com on December 02, 2013 05:50:45

If you want a common code base for Python 2.x and 3.x, sticking to str.format() looks like the right decision. In that case you will have to avoid implicit conversions to ascii. Using unicode strings instead of 8-bit strings should do the trick. More specifically constant strings should be defined as : u"Reading file '{0}'" instead of: "Reading file '{0}'"

I don't know if u"unicode string" is legal in Python 3.x.

suever commented 9 years ago

From darcymason@gmail.com on January 16, 2014 18:34:55

I looked into this, and I think using a unicode literal for the format specifier (where needed) seems the best solution. It is harmless for python 3 as the 2to3 utility we use simply strips the u off the front (all strings are unicode in python 3).

There are only a few places where a unicode string (such as a filename) could be converted, and I've changed those in revision f1baa95b7604 .

Status: Verified