ivoflipse / pydicom

Automatically exported from code.google.com/p/pydicom
0 stars 1 forks source link

crash because of encoding issues in logger messages #136

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run the attached Python script which attempts to open a DICOM file with non 
ASCI characters in its name.
2. See python crash.

What is the expected output? What do you see instead?
I would expect pydicom to write an error message: the DICOM file we are trying 
to open does not exist.
Instead pydicom crashes while trying to write the error message to the logger.

What version of the product are you using?
0.9.8

Please provide any additional information below.
Reproduced on Fedora 19 with Python 2.7.5 and pydicom 0.9.8n but also on Ubuntu 
12.04 LTS.
Crash occurs with debug messages, not only error messages. I had initially seen 
the error in line 599 of filereader.py:
        logger.debug("Reading file '{0}'".format(fp))
The problem is that fp is a unicode string, while the error message is an 8-bit 
string. Changing the error message into an unicode string fixes the problem:
        logger.debug(u"Reading file '{0}'".format(fp))
While the bug is being fixed, any idea how to work around the problem?

Original issue reported on code.google.com by Dimitri....@gmail.com on 29 Nov 2013 at 12:20

Attachments:

GoogleCodeExporter commented 9 years ago
See Python issue 7300 'Unicode arguments in str.format()':
http://bugs.python.org/issue7300

See also blog "Python 2.X's str.format is unsafe":
http://blog.codekills.net/2011/09/22/python-2.x%27s-str.format-is-unsafe/

Either replace all occurences of str.format with %, or always use unicode 
format strings.

I'm still interested in possible workarounds while this is being fixed.

Original comment by Dimitri....@gmail.com on 29 Nov 2013 at 12:39

GoogleCodeExporter commented 9 years ago
For those interested, I got to work around the problem without patching pydicom 
using sys.setdefaultencoding('utf-8').

The setdefaultencoding() function is well hidden and its use is clearly 
discouraged:
  http://docs.python.org/2/library/sys.html#sys.setdefaultencoding
I was able to call this function using this hack found on different forums:
  import sys
  reload(sys)
  sys.setdefaultencoding("utf-8")
Applying that hack to my own scripts fixes the problem without patching pydicom.

By the way, I think was wrong about the plaforms I can reproduce this issue on: 
I can reproduce it on CentOS 6.4, not on Ubuntu 12.04 LTS and not so sure about 
Fedora 19 - I'd have to check again.

Anyway, I still believe this is a bug in pydicom, the above is ust a bad hack 
to work around the bug. Please fix the bug.

Original comment by Dimitri....@gmail.com on 29 Nov 2013 at 10:23

GoogleCodeExporter commented 9 years ago
Thanks for posting this issue and for digging into reasons and solutions. We'll 
take a good look at this and implement a fix. I'm thinking it likely won't be 
using the % string interpolation, as that was all changed over for python 3 
(although so far they have kept the % anyway).

Original comment by darcymason@gmail.com on 2 Dec 2013 at 12:49

GoogleCodeExporter commented 9 years ago
If you want a common code base for Python 2.x and 3.x, sticking to str.format() 
looks like the right decision. In that case you will have to avoid implicit 
conversions to ascii. Using unicode strings instead of 8-bit strings should do 
the trick. More specifically constant strings should be defined as :
    u"Reading file '{0}'"
instead of:
    "Reading file '{0}'"

I don't know if u"unicode string" is legal in Python 3.x.

Original comment by Dimitri....@gmail.com on 2 Dec 2013 at 1:50

GoogleCodeExporter commented 9 years ago
I looked into this, and I think using a unicode literal for the format 
specifier (where needed) seems the best solution. It is harmless for python 3 
as the 2to3 utility we use simply strips the u off the front (all strings are 
unicode in python 3).

There are only a few places where a unicode string (such as a filename) could 
be converted, and I've changed those in revision f1baa95b7604 .

Original comment by darcymason@gmail.com on 17 Jan 2014 at 2:34