sanand0 / xmljson

xmljson converts XML into Python dictionary structures (trees, like in JSON) and vice-versa.
MIT License
121 stars 33 forks

Allow the dropping of tags that contain illegal characters in the XML spec #33

Closed Zurga closed 5 years ago

Zurga commented 5 years ago

If the data being parsed contains characters that are illegal in tag names (as specified here: https://www.w3schools.com/XML/xml_elements.asp), lxml will raise a ValueError. This is fine and expected if you control the data being sent to xmljson. In other cases, you might want to suppress the error and silently drop the tags that are not allowed. I have implemented this by adding a "drop_invalid_tags" keyword argument to the "etree" methods.

sanand0 commented 5 years ago

@Zurga I've created an alternate implementation in the chars2 branch and on d68bdee

I've made 2 changes:

  1. Moved the setting to the constructor instead of the etree method
  2. Changed drop_invalid_tags=True to invalid_tags='drop' -- this allows alternate strategies in the future (such as invalid_tags=some_replacement_function)
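
To make the two changes concrete, here is a minimal standard-library sketch of the idea. It is not the code on the chars2 branch: the `DictToXML` class, the `_resolve` helper, and the ASCII-only name check are all hypothetical and only illustrate a constructor-level `invalid_tags` setting that can raise, drop, or (as a possible future strategy) call a replacement function.

```python
import re
from xml.etree.ElementTree import Element, tostring

# Assumption: a simplified, ASCII-only approximation of a legal XML tag name.
# The XML spec allows more characters (e.g. colons and many Unicode letters).
_NAME_RE = re.compile(r'^[A-Za-z_][A-Za-z0-9_.\-]*$')


class DictToXML:
    """Hypothetical converter used for illustration; not xmljson's implementation."""

    def __init__(self, invalid_tags=None):
        # None     -> raise ValueError on an invalid tag
        # 'drop'   -> silently skip the invalid tag and its contents
        # callable -> use its return value as the tag name (not re-validated here)
        self.invalid_tags = invalid_tags

    def _resolve(self, tag):
        if _NAME_RE.match(tag):
            return tag
        if self.invalid_tags == 'drop':
            return None
        if callable(self.invalid_tags):
            return self.invalid_tags(tag)
        raise ValueError('Invalid tag name %r' % tag)

    def etree(self, data, root):
        """Append the items of a (possibly nested) dict to root as child elements."""
        for key, value in data.items():
            tag = self._resolve(str(key))
            if tag is None:
                continue  # dropped
            child = Element(tag)
            if isinstance(value, dict):
                self.etree(value, child)
            else:
                child.text = str(value)
            root.append(child)
        return root


data = {'name': 'x', '0 bad tag': 'y', 'nested': {'ok': 1, 'a&b': 2}}
converter = DictToXML(invalid_tags='drop')
dropped = tostring(converter.etree(data, Element('root')), encoding='unicode')
print(dropped)  # <root><name>x</name><nested><ok>1</ok></nested></root>
```

Keeping the setting on the constructor means every `etree` call on the same converter behaves consistently, and accepting a string or a callable leaves room for strategies other than dropping.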

Does this work for you?

Zurga commented 5 years ago

@sanand0, yeah, that works for me, nice solution! Mine was a little bit clobbered.