Unicode nodes in markup writer

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Add a node with a Unicode string as the node name (e.g. add_node(u"mañana")).
2. Try to write the graph via pygraph.readwrite.markup.write() 

You get a UnicodeEncodeError. For example:

Traceback (most recent call last):
  File "bin/translations_spanish_2.py", line 93, in <module>
    main(sys.argv)
  File "bin/translations_spanish_2.py", line 89, in main
    output.write(write(gr))
  File "/usr/local/lib/python2.6/dist-packages/python_graph_core-1.8.0-py2.6.egg/pygraph/readwrite/markup.py", line 66, in write
    node.setAttribute('id', str(each_node))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0301' in position 8: ordinal not in range(128)

When I change all calls to "str()" in markup.py to "unicode()", the error no 
longer appears. I suggest calling "unicode()" instead of "str()".

This only happens under Python 2.x, since in Python 3 "str()" returns Unicode 
text automatically.
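
To illustrate the failure: on Python 2, calling str() on a unicode object implicitly encodes it with the default ASCII codec. The sketch below performs that encoding step explicitly so that it runs on both Python 2 and Python 3. The helper name ascii_encode_error is made up for this illustration and is not part of python-graph.

```python
# -*- coding: utf-8 -*-
# Sketch only: reproduces the implicit ASCII encoding that Python 2's str()
# applies to unicode objects. ascii_encode_error is a hypothetical helper.


def ascii_encode_error(name):
    """Return the UnicodeEncodeError raised by ASCII-encoding name, or None."""
    try:
        name.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


# u"ma\u00f1ana" is the node name from the report, written with an escape so
# the example does not depend on the encoding of the source file.
err = ascii_encode_error(u"ma\u00f1ana")
print(err)
```

A purely ASCII name such as u"manana" passes through the same step without an error, which is why the problem only shows up for names with non-ASCII characters.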

Original issue reported on code.google.com by pebo...@gmail.com on 26 Jul 2011 at 10:51

GoogleCodeExporter commented 9 years ago

If you are using python 2.x you could encode your unicode strings before 
passing them to add_node method:

e.g. G.add_node(unicode('€€€','utf8').encode('utf8'))

Original comment by tomaz.ko...@gmail.com on 26 Jul 2011 at 4:07

GoogleCodeExporter commented 9 years ago

Yes, that works. But unfortunately not for the dot writer. And it is not 
possible to read in dot or markup files written like that.

Original comment by pebo...@gmail.com on 3 Aug 2011 at 1:31

GoogleCodeExporter commented 9 years ago

Hello. Have you tried specifying the file encoding?

The following works fine (but fails if the comment line is removed):

#! -*- coding: utf-8 -*-

from pygraph.classes.graph import graph
from pygraph.readwrite.markup import write, read

gr = graph()
gr.add_node("aló")
print gr
xml = write(gr)
print xml
print read(xml)

Original comment by pmatiello on 3 Aug 2011 at 10:41

GoogleCodeExporter commented 9 years ago

Yes, that works. I already specified the encoding in my scripts. As I wrote in 
my ticket, I have Unicode strings, i.e. gr.add_node(u"aló") does not work. 
What is the difference here? Do I really have to encode everything as UTF-8 
before passing it to the graph?

Original comment by pebo...@gmail.com on 4 Aug 2011 at 9:28

GoogleCodeExporter commented 9 years ago

If we use the unicode() call in markup.py, strings using characters outside the 
standard 7-bit ASCII range must be written as unicode literals. The following 
program would fail:

gr = graph()
gr.add_node("olá")
xml = write(gr)

To make it work, the second line would have to be:

gr.add_node(u"olá")

And this would be a regression.
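
One possible middle ground, sketched here only as an assumption and not as anything python-graph implements, is a conversion helper that accepts both kinds of input: byte strings are decoded explicitly and unicode objects are passed through unchanged. The name to_text and the choice of UTF-8 as the default encoding are hypothetical; byte strings in another encoding would need a different argument.

```python
# -*- coding: utf-8 -*-
# Sketch only: to_text is a hypothetical helper, not part of python-graph.
import sys

# The text type is `unicode` on Python 2 and `str` on Python 3.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def to_text(value, encoding="utf-8"):
    """Return value as text without relying on the implicit ASCII codec."""
    if isinstance(value, bytes):
        # On Python 2, a plain "..." literal is a byte string; decode it
        # explicitly (assumed to be UTF-8 unless told otherwise).
        return value.decode(encoding)
    if isinstance(value, text_type):
        # Already text (a u"..." literal on Python 2); nothing to do.
        return value
    # Other node identifiers (ints, tuples, ...) use the text constructor.
    return text_type(value)


print(to_text(b"ol\xc3\xa1") == to_text(u"ol\u00e1"))
```

With a helper like this, both add_node("olá") in a UTF-8 encoded source file and add_node(u"olá") would yield the same text value at the point where the writer currently calls str().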

Original comment by pmatiello on 6 Aug 2011 at 1:41

GoogleCodeExporter commented 9 years ago

Original comment by pmatiello on 6 Aug 2011 at 1:42

Changed state: WontFix

wting / python-graph

Unicode nodes in markup writer #95