skinkie closed this 3 years ago
Can you provide an example of how you imagine the API working?
Depth 0: deserialize only the base element's attributes.
parser.parse(tree.find('.//{http://www.netex.org.uk/netex}versions'), VersionsRelStructure, depth=0)
Depth 1: deserialize only the base element together with its direct descendants, so the depth-first traversal is cut off after one level.
parser.parse(tree.find('.//{http://www.netex.org.uk/netex}versions'), VersionsRelStructure, depth=1)
...and so on.
To be honest, I don't see how this would fit the API. Since you have access to the ElementTree, you can make most of these selections manually before feeding the result to the parser. I know it's a bit of extra work, but it doesn't fit the parser's architecture, which only needs a source and optionally a type to bind everything together.
Design aside, it's also a bit complicated to achieve this for both XML and JSON and for all the different handlers provided.
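The manual selection described above can be sketched as follows. This is a minimal illustration using the standard library's xml.etree.ElementTree, not part of the parser's API; `prune_below` and `_strip` are hypothetical helper names, and the depth semantics follow the proposal (0 = base element only, 1 = base element plus direct children).

```python
import copy
import xml.etree.ElementTree as ET


def _strip(el, remaining):
    """Remove every descendant of `el` that lies more than `remaining` levels below it."""
    if remaining == 0:
        # Iterate over a snapshot, since we mutate the child list.
        for child in list(el):
            el.remove(child)
        return
    for child in el:
        _strip(child, remaining - 1)


def prune_below(el, max_depth):
    """Hypothetical helper: deep-copy `el`, then cut the copy off below `max_depth`."""
    clone = copy.deepcopy(el)  # the original tree is left untouched
    _strip(clone, max_depth)
    return clone


doc = ET.fromstring('<root><a tag="hello"><b><x/></b><c/></a><d><e/></d></root>')
needle = doc.find(".//a")

only_base = prune_below(needle, 0)      # attributes only, no children
with_children = prune_below(needle, 1)  # direct children, no grandchildren
```

The pruned copy is an ordinary element, so it can be handed to the parser like any other source. Copying the whole subtree first is simple but wasteful for large documents; a copy that stops descending at the depth limit avoids that cost.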
I have implemented something that might do the trick.
from collections import deque

from lxml import etree
from lxml.etree import Element


def copyme(el):
    # Shallow copy: tag, attributes, text and tail, but no children.
    n = Element(el.tag, el.attrib)
    n.text = el.text
    n.tail = el.tail
    return n


def depthcopy(root, max_depth=1):
    # Breadth-first copy of root that keeps descendants up to max_depth
    # levels below it (0 = base element only, 1 = plus direct children).
    queue = deque([(root, None, 0)])
    keep = None
    while queue:
        el, parent, depth = queue.popleft()
        if depth > max_depth:
            # Breadth-first order: everything still queued is at least this deep.
            break
        new_parent = copyme(el)
        if parent is None:
            keep = new_parent
        else:
            parent.append(new_parent)
        queue.extend((x, new_parent, depth + 1) for x in el)
    return keep


root = etree.XML('<root><a tag="hello"><b/><c/></a><d><e/></d></root>')
needle = root.find('.//a')
myfilter = depthcopy(needle, 1)
print(etree.tostring(myfilter))
So that would then be plugged into:
parser.parse(myfilter, RootStructure)
Glad to see you got it working. With the ElementTree you have a lot of options to cut or remove whatever you need; integrating all of that logic for every handler, both lxml and native Python, would have been impossible.
I would love to have the option to provide a depth with the new element function. That would really make it possible to parse a subtree, even at the top level, without parsing the entire document once you provide the top Element.
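Until such an option exists, one way to avoid holding the entire document is to prune while reading it. The following is a minimal sketch, assuming the standard library's xml.etree.ElementTree.iterparse rather than anything in the parser's own API; `parse_to_depth` is a hypothetical helper name. The stream is still read from start to end, but elements deeper than the limit are detached as soon as they are complete, so only the shallow tree is retained.

```python
import io
import xml.etree.ElementTree as ET


def parse_to_depth(source, max_depth):
    """Hypothetical helper: build a tree from `source`, keeping only elements
    at most `max_depth` levels below the document root."""
    stack = []   # currently open elements, root first
    root = None
    for event, el in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = el
            stack.append(el)
        else:
            stack.pop()
            # After the pop, len(stack) equals the depth of `el` (root = 0).
            if len(stack) > max_depth:
                stack[-1].remove(el)  # detach the finished, too-deep element
    return root


xml_bytes = b'<root><a tag="hello"><b/><c/></a><d><e/></d></root>'
shallow = parse_to_depth(io.BytesIO(xml_bytes), 1)
top_only = parse_to_depth(io.BytesIO(xml_bytes), 0)
```

The resulting element can then be passed to the parser in the same way as the filtered copy above.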