stchris / untangle

Converts XML to Python objects
MIT License
613 stars, 83 forks

Python3 incompatibility / unicode #17

Open robnardo opened 9 years ago

robnardo commented 9 years ago

Hi, I am using your library and getting errors when running it under Python 3.4.0. I only recently started working with Python (so I'm not an expert), but I was able to fix it for my needs by editing lines 143 and 149 of untangle.py.

Specifically, I changed line 143 to:

parser.parse(StringIO(filename.decode('utf-8')))

and line 149 to:

return string.startswith(b'http://') or string.startswith(b'https://')
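Note that the line-149 change swaps one type mismatch for the other: with b'' prefixes, any str argument that is not an existing path (an XML string or a URL, say) would make is_url() raise TypeError instead, because str.startswith() rejects bytes prefixes. Below is a minimal sketch of a check that accepts both types. It assumes nothing about untangle beyond the is_url(string) function quoted in this thread, and it is not the library's actual code.

```python
def is_url(value):
    """Hypothetical is_url() that accepts both str and bytes.

    A sketch only, not untangle's real implementation.
    """
    if isinstance(value, bytes):
        prefixes = (b"http://", b"https://")
    else:
        prefixes = ("http://", "https://")
    # startswith() accepts a tuple of prefixes, as long as their type
    # matches the type of the object being tested
    return value.startswith(prefixes)


print(is_url("https://example.com/feed.xml"))  # True
print(is_url(b"http://example.com/feed.xml"))  # True
print(is_url(b"<root/>"))                      # False
```

Choosing the prefix tuple by the argument's type keeps the function working whether the caller passes text or raw bytes.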

stchris commented 9 years ago

Hello, and thanks for reporting this issue. I haven't been able to reproduce it yet, but I've enabled the automated tests to run on Python 3.4 as well. It would help a lot if you could tell me more about how you hit this issue. Could you post the filename, or parts of it, so I can try to write a failing test?

mplewis commented 8 years ago

I am having this issue with the following code:

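import xml.etree.ElementTree as ET

import untangle

# etree is of type <class 'xml.etree.ElementTree.Element'>
class Page:
    def __init__(self, etree):
        self.etree = etree
        # ET.tostring() returns bytes here, which is what untangle.parse() receives
        self.untangled = untangle.parse(ET.tostring(etree))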

Traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-26-05be7dcdfcdc> in <module>()
     21 
     22 for child in root:
---> 23     print(parse_to_obj(child))

<ipython-input-26-05be7dcdfcdc> in parse_to_obj(etree)
      9         return File(etree)
     10     else:
---> 11         return Page(etree)
     12 
     13 class Page:

<ipython-input-26-05be7dcdfcdc> in __init__(self, etree)
     14     def __init__(self, etree):
     15         self.etree = etree
---> 16         self.untangled = untangle.parse(ET.tostring(etree))
     17 
     18 class File:

/Users/mplewis/.pyenv/versions/3.5.0/lib/python3.5/site-packages/untangle.py in parse(filename)
    138     sax_handler = Handler()
    139     parser.setContentHandler(sax_handler)
--> 140     if os.path.exists(filename) or is_url(filename):
    141         parser.parse(filename)
    142     else:

/Users/mplewis/.pyenv/versions/3.5.0/lib/python3.5/site-packages/untangle.py in is_url(string)
    147 
    148 def is_url(string):
--> 149     return string.startswith('http://') or string.startswith('https://')
    150 
    151 # vim: set expandtab ts=4 sw=4:

TypeError: startswith first arg must be bytes or a tuple of bytes, not str
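The traceback comes down to a bytes/str mismatch: on Python 3, ET.tostring() returns bytes by default, and bytes.startswith() rejects a str prefix such as 'http://'. The stdlib-only snippet below reproduces just that mechanism, without untangle; the element name and attribute are made up for illustration.

```python
import xml.etree.ElementTree as ET

elem = ET.Element("page", title="Home")

data = ET.tostring(elem)  # default encoding: returns bytes on Python 3
print(type(data))  # <class 'bytes'>

try:
    # The same kind of check is_url() performs in the traceback above
    data.startswith("http://")
except TypeError as exc:
    print("TypeError:", exc)

text = ET.tostring(elem, encoding="unicode")  # asks ElementTree for str instead
print(type(text))  # <class 'str'>
```

On the caller's side, passing encoding="unicode" should sidestep the mismatch in is_url(), since untangle then receives a str; this is an assumption that has not been checked against the untangle version in the traceback.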
stchris commented 8 years ago

I'll try to have a look at this. @mplewis, could you also post the XML you're parsing?

rhaamo commented 8 years ago

I had to make the same change as @robnardo. In my case I do something like:

import requests
import untangle

a = requests.get("http://whatever_returns_an_xml/")
b = untangle.parse(a.text)

The returned XML sometimes contains non-ASCII characters such as "Francés", and without any edits untangle fails with a "cannot encode" Unicode error. If I call untangle.parse(a.text.encode('UTF-8')) instead, it fails like this:

  File "/usr/local/lib/python3.4/dist-packages/untangle.py", line 149, in is_url
    return string.startswith('http://') or string.startswith('https://')
TypeError: <flask_script.commands.Command object at 0x7f70316c34e0>: startswith first arg must be bytes or a tuple of bytes, not str

With robnardo's edit, it works as expected.

P.S. I use requests rather than untangle's own URL fetching because I need to set some headers before sending the request.
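As for the line-143 half of the fix, the data has to be wrapped in a stream because xml.sax's parse() treats a plain string argument as a filename or URL. The stdlib-only sketch below shows how str and bytes could each be given a matching stream; the helper parse_xml_source() and the TextCollector handler are hypothetical and not part of untangle.

```python
import io
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Throwaway SAX handler that collects all character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # SAX may deliver text in several pieces, so keep them all
        self.chunks.append(content)


def parse_xml_source(source):
    """Hypothetical helper: parse XML held in a str or bytes object.

    The data is always wrapped in a stream, so xml.sax never treats it
    as a filename or URL.
    """
    if isinstance(source, bytes):
        # Bytes go in as a byte stream; expat decodes them using the XML
        # declaration's encoding (UTF-8 when none is declared).
        stream = io.BytesIO(source)
    else:
        # Text is already decoded, so it goes in as a character stream.
        stream = io.StringIO(source)
    handler = TextCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(stream)
    return "".join(handler.chunks)


xml_text = "<lang>Francés</lang>"
print(parse_xml_source(xml_text))                  # Francés
print(parse_xml_source(xml_text.encode("utf-8")))  # Francés
```

On the design choice: decode('utf-8'), as in the patch above, assumes the payload is UTF-8. Passing bytes through BytesIO instead lets the parser honor whatever encoding the document declares (ISO-8859-1, for example), which may matter for feeds like the one described here.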

stchris commented 7 years ago

Can you test this again with the newly released version 1.1.1?

lolouk44 commented 6 years ago

Just wanted to say I had the same issue under Python 3.5 (Python 2.7 worked fine). Making the same changes as @robnardo fixed the issue for me too.

stchris commented 2 years ago

I added one more test in #89 but wasn't able to reproduce this. Would appreciate a concrete failing test.
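Going by the two reports in this thread (bytes from ET.tostring() and UTF-8-encoded text fetched with requests), a concrete failing test might look roughly like the sketch below. The class and method names are made up, it assumes untangle's attribute-style access as shown in its README (doc.page["title"] for attributes, .cdata for text), and whether it actually fails depends on the untangle version installed. It can be run with python -m unittest.

```python
import unittest
import xml.etree.ElementTree as ET

try:
    import untangle
except ImportError:  # lets the module import even where untangle is absent
    untangle = None


@unittest.skipIf(untangle is None, "untangle is not installed")
class BytesInputRegressionTest(unittest.TestCase):
    """Candidate regression tests for issue #17 (hypothetical names)."""

    def test_elementtree_bytes(self):
        # mplewis's case: ET.tostring() returns bytes on Python 3
        data = ET.tostring(ET.Element("page", title="Home"))
        doc = untangle.parse(data)
        self.assertEqual(doc.page["title"], "Home")

    def test_encoded_non_ascii(self):
        # rhaamo's case: text with non-ASCII characters, encoded to UTF-8
        data = "<lang>Francés</lang>".encode("utf-8")
        doc = untangle.parse(data)
        self.assertEqual(doc.lang.cdata, "Francés")
```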