sanand0 / xmljson

xmljson converts XML into Python dictionary structures (trees, like in JSON) and vice versa.
MIT License

typing issue #3

Closed: miverson closed this issue 8 years ago

miverson commented 8 years ago

I'm having an issue where strings that look numeric are being cast to integers.

For example:

<zip>01234</zip>

converts to:

"zip": 1234

This could be corrected with a simple round-trip test (shown below), or by passing an option to the constructor that bypasses the _convert() method.

str(int('01234')) == '01234'
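
A minimal sketch of that round-trip test in Python 3 (the helper name `is_lossless_int` is hypothetical, not part of xmljson): parse the string, format it back, and only treat it as an integer if nothing changed.

```python
def is_lossless_int(text: str) -> bool:
    """Return True only if int(text) formats back to exactly the same string."""
    try:
        return str(int(text)) == text
    except ValueError:
        # Not an integer at all, e.g. "abc" or "1.5"
        return False


# "01234" parses to 1234, which prints as "1234", so the leading zero would be lost
print(is_lossless_int("01234"))
print(is_lossless_int("1234"))
```

Values that fail the check would simply be left as strings.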

miverson commented 8 years ago

There's a second typing issue as well.

Starting XML:

<tag>true</tag>

Converted to Python:

 { 'tag': True }

Back to XML:

<tag>True</tag>

Since 'True' and 'False' are not valid values for type xs:boolean, this causes XML schema validation to fail.

I hacked together an ugly workaround by redefining the unicode function as a lambda:

unicode = lambda x: str(x) if type(x) is not bool else 'true' if x else 'false'
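
On Python 3, where `unicode` no longer exists, the same workaround could be written as a named function. This is only a sketch; `xml_text` is a hypothetical name, not an xmljson function.

```python
def xml_text(value) -> str:
    """Format a Python value as XML text, using xs:boolean's lowercase literals."""
    if isinstance(value, bool):
        # xs:boolean only accepts "true", "false", "1" and "0"
        return "true" if value else "false"
    return str(value)


print(xml_text(True))
print(xml_text(42))
```

The bool branch has to come before any generic formatting, since str(True) gives "True".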

miverson commented 8 years ago

For the initial typing issue, I added a type_conv parameter to the constructor and altered the _convert function to skip the int and float conversions if it is not set. I also added hacks to preserve leading and trailing zeros.

def _convert(self, value):
    'Convert string value to None, boolean, int or float'
    if not value:
        return None
    std_value = value.strip().lower()
    if std_value == 'true':
        return True
    elif std_value == 'false':
        return False
    if self.type_conv:
        try:
            if str(int(std_value)) == std_value:
                return int(std_value)
        except ValueError:
            pass
        try:
            if str(float(std_value)) == std_value:
                return float(std_value)
        except ValueError:
            pass
    return value
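
To try the method above on its own, it can be dropped into a minimal class. The `Converter` class here is hypothetical and only stands in for the xmljson class that would hold `type_conv`; the method body follows the same logic as above.

```python
class Converter:
    """Hypothetical minimal host for the _convert method above."""

    def __init__(self, type_conv=True):
        # When False, numeric conversion is skipped entirely
        self.type_conv = type_conv

    def _convert(self, value):
        """Convert a string value to None, bool, int or float."""
        if not value:
            return None
        std_value = value.strip().lower()
        if std_value == "true":
            return True
        if std_value == "false":
            return False
        if self.type_conv:
            # Only accept a number if it formats back to the same text,
            # so "01234" (leading zero) and "1.50" (trailing zero) stay strings
            try:
                if str(int(std_value)) == std_value:
                    return int(std_value)
            except ValueError:
                pass
            try:
                if str(float(std_value)) == std_value:
                    return float(std_value)
            except ValueError:
                pass
        return value


conv = Converter()
for text in ["01234", "1234", "1.50", "1.5", "true"]:
    print(repr(text), "->", repr(conv._convert(text)))
```

With type_conv left on, '01234' and '1.50' come back unchanged because they do not survive the int/float round trip, while '1234' and '1.5' are converted. Converter(type_conv=False) keeps every non-boolean value as a string.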

I can send a pull request if you want, but the changes are fairly simple, and probably too ugly for direct incorporation.

Thanks for making this library.

sanand0 commented 8 years ago

Thanks for spotting this @miverson. From v0.1.6, the classes accept an xml_fromstring= argument (4684b05). Setting it to False keeps values as strings. For example:

>>> from xmljson import Yahoo
>>> from xml.etree.ElementTree import fromstring
>>> from json import dumps
>>> yahoo = Yahoo(xml_fromstring=False)
>>> dumps(yahoo.data(fromstring('<zip>01234</zip>')))
'{"zip": "01234"}'

You can also pass a custom conversion function. For example:

>>> yahoo = Yahoo(xml_fromstring=repr)      # custom function repr
>>> dumps(yahoo.data(fromstring('<zip>01234</zip>')))
'{"zip": "\'01234\'"}'
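
If you want booleans converted but everything else left as a string, a custom function along these lines might work. It assumes, as the repr example suggests, that xmljson calls the xml_fromstring function with each string value and keeps whatever it returns; `bool_only` is a hypothetical name.

```python
from typing import Optional, Union


def bool_only(value: Optional[str]) -> Union[str, bool, None]:
    """Turn 'true'/'false' (any case) into bool; return everything else unchanged."""
    if value is None:
        # Defensive: pass missing text through untouched
        return None
    lowered = value.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    # Numeric-looking strings such as "01234" are kept as strings
    return value


# Assumed usage with xmljson (not executed here):
#   yahoo = Yahoo(xml_fromstring=bool_only)
#   dumps(yahoo.data(fromstring('<zip>01234</zip>')))

print(bool_only("01234"))
print(bool_only("TRUE"))
```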

Also, by default, True is written out as the lowercase string true:

>>> from xml.etree.ElementTree import Element, tostring
>>> tostring(yahoo.etree({'tag': True}, root=Element('root')))
'<root tag="true" />'

But you can pass a custom conversion function via xml_tostring=:

>>> yahoo = Yahoo(xml_tostring=repr)        # custom function repr
>>> tostring(yahoo.etree({'tag': True}, root=Element('root')))
'<root tag="True" />'

Hope this helps!

miverson commented 8 years ago

Thanks. I backed out my hacks and applied the fix today. It works perfectly.