The cool new feature: parsing by element, allow to provide a depth

tefra / xsdata

Naive XML & JSON Bindings for python

https://xsdata.readthedocs.io

MIT License

The cool new feature: parsing by element, allow to provide a depth #537

Closed skinkie closed 3 years ago

skinkie commented 3 years ago

I would love to be able to pass a depth to the new element-parsing function. That would make it possible to parse just a subtree, even at the top level, without parsing the entire document when you pass in the top Element.

tefra commented 3 years ago

Can you provide an example of how you imagine the API working?

skinkie commented 3 years ago

Depth 0: only deserialize the base element's attributes.
parser.parse(tree.find('.//{http://www.netex.org.uk/netex}versions'), VersionsRelStructure, depth=0)

Depth 1: only deserialize the base element together with its direct children, so the depth-first traversal is cut off after one level.
parser.parse(tree.find('.//{http://www.netex.org.uk/netex}versions'), VersionsRelStructure, depth=1)

...and so on.
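To make the requested semantics concrete, here is a minimal sketch using only the standard library's ElementTree. The helper name `copy_to_depth` and the sample document are assumptions made for illustration; this is not part of xsdata, it only shows which nodes would survive at each depth.

```python
import xml.etree.ElementTree as ET


def copy_to_depth(element, depth):
    """Copy `element`, keeping descendants at most `depth` levels below it.

    Hypothetical helper: depth 0 keeps only the element and its attributes,
    depth 1 also keeps its direct children, and so on.
    """
    clone = ET.Element(element.tag, dict(element.attrib))
    clone.text = element.text
    if depth > 0:
        for child in element:
            kept = copy_to_depth(child, depth - 1)
            kept.tail = child.tail
            clone.append(kept)
    return clone


# Made-up sample document, loosely shaped like the `versions` element above.
doc = ET.fromstring(
    '<versions id="v"><Version id="1"><Name>first</Name></Version></versions>'
)

print(ET.tostring(copy_to_depth(doc, 0), encoding="unicode"))
# <versions id="v" />
print(ET.tostring(copy_to_depth(doc, 1), encoding="unicode"))
# <versions id="v"><Version id="1" /></versions>
```

At depth 1 the `Version` child keeps its attributes but loses its own children, which is the cut-off described above.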

tefra commented 3 years ago

I don't see how this would fit the API, to be honest. Since you have access to the ElementTree, you can do a lot of these, let's say, selections manually before feeding them to the parser. I know it's a bit of extra work, but it doesn't fit the parsers' architecture, which only needs a source and optionally a type to bind everything together.

Design aside, it's also a bit complicated to achieve this for both XML and JSON and all the different provided handlers.
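As an illustration of the kind of manual selection meant here, the sketch below trims a tree in place with the standard library's ElementTree before anything is handed to a parser. The helper `keep_children` and the set of wanted tags are assumptions for the example, not xsdata API.

```python
import xml.etree.ElementTree as ET


def keep_children(parent, wanted_tags):
    """Remove, in place, every direct child whose tag is not in `wanted_tags`.

    Hypothetical helper for illustration only.
    """
    for child in list(parent):  # iterate over a copy, since parent is mutated
        if child.tag not in wanted_tags:
            parent.remove(child)
    return parent


root = ET.fromstring('<root><a tag="hello"><b/><c/></a><d><e/></d></root>')
keep_children(root, {"a"})

trimmed = ET.tostring(root, encoding="unicode")
print(trimmed)
# <root><a tag="hello"><b /><c /></a></root>
```

The trimmed element, or its serialized form, is then what would be passed to the parser as the source, together with the type to bind to.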

skinkie commented 3 years ago

I have implemented something that might be able to do the trick.

from lxml import etree
from lxml.etree import Element
from collections import deque


def copyme(el):
    # Shallow copy: tag, attributes, text and tail, but no children.
    n = Element(el.tag, el.attrib)
    n.text = el.text
    n.tail = el.tail
    return n


def depthcopy(root, max_depth=1):
    # Breadth-first copy of root that keeps descendants up to max_depth
    # levels below it (0 = root only, 1 = root plus direct children, ...).
    queue = deque([(root, None, 0)])
    keep = None

    while queue:
        el, parent, depth = queue.popleft()
        if depth > max_depth:
            # The queue is ordered by depth, so everything left is too deep.
            break

        new_parent = copyme(el)
        if parent is None:
            keep = new_parent
        else:
            parent.append(new_parent)

        queue.extend((x, new_parent, depth + 1) for x in el)

    return keep


root = etree.XML('<root><a tag="hello"><b/><c/></a><d><e/></d></root>')
needle = root.find('.//a')
myfilter = depthcopy(needle, 1)
print(etree.tostring(myfilter))  # b'<a tag="hello"><b/><c/></a>'

So that would then be plugged into:

parser.parse(myfilter, RootStructure)

tefra commented 3 years ago

Glad to see you got it working. With the ElementTree you have a lot of options to cut or remove whatever you need; all that logic would have been impossible to integrate for all the handlers, both lxml and native Python.