Closed GoogleCodeExporter closed 9 years ago
See Python issue 7300 'Unicode arguments in str.format()':
http://bugs.python.org/issue7300
See also blog "Python 2.X's str.format is unsafe":
http://blog.codekills.net/2011/09/22/python-2.x%27s-str.format-is-unsafe/
Either replace all occurences of str.format with %, or always use unicode
format strings.
I'm still interested in possible workarounds while this is being fixed.
Original comment by Dimitri....@gmail.com
on 29 Nov 2013 at 12:39
For those interested, I got to work around the problem without patching pydicom
using sys.setdefaultencoding('utf-8').
The setdefaultencoding() function is well hidden and its use is clearly
discouraged:
http://docs.python.org/2/library/sys.html#sys.setdefaultencoding
I was able to call this function using this hack found on different forums:
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
Applying that hack to my own scripts fixes the problem without patching pydicom.
By the way, I think was wrong about the plaforms I can reproduce this issue on:
I can reproduce it on CentOS 6.4, not on Ubuntu 12.04 LTS and not so sure about
Fedora 19 - I'd have to check again.
Anyway, I still believe this is a bug in pydicom, the above is ust a bad hack
to work around the bug. Please fix the bug.
Original comment by Dimitri....@gmail.com
on 29 Nov 2013 at 10:23
Thanks for posting this issue and for digging into reasons and solutions. We'll
take a good look at this and implement a fix. I'm thinking it likely won't be
using the % string interpolation, as that was all changed over for python 3
(although so far they have kept the % anyway).
Original comment by darcymason@gmail.com
on 2 Dec 2013 at 12:49
If you want a common code base for Python 2.x and 3.x, sticking to str.format()
looks like the right decision. In that case you will have to avoid implicit
conversions to ascii. Using unicode strings instead of 8-bit strings should do
the trick. More specifically constant strings should be defined as :
u"Reading file '{0}'"
instead of:
"Reading file '{0}'"
I don't know if u"unicode string" is legal in Python 3.x.
Original comment by Dimitri....@gmail.com
on 2 Dec 2013 at 1:50
I looked into this, and I think using a unicode literal for the format
specifier (where needed) seems the best solution. It is harmless for python 3
as the 2to3 utility we use simply strips the u off the front (all strings are
unicode in python 3).
There are only a few places where a unicode string (such as a filename) could
be converted, and I've changed those in revision f1baa95b7604 .
Original comment by darcymason@gmail.com
on 17 Jan 2014 at 2:34
Original issue reported on code.google.com by
Dimitri....@gmail.com
on 29 Nov 2013 at 12:20Attachments: