Closed rbordelo closed 3 years ago
Please try telling the engine to only return nodes that already exist by changing for node_coordinate in processor.get_nodes(path):
to for node_coordinate in processor.get_nodes(path, mustexist=True):
. I expect this will not only resolve the issue for you but it will also ensure there are no inadvertent changes to the DOM. The default -- perhaps counterintuitively -- is mustexist=False
, which is most useful only when deliberately changing the DOM (which happened to be my own most-pressing use-case for these tools when I first started writing them).
From the Processor class documentation:
Keyword Parameters:
* mustexist (bool) Indicate whether yaml_path must exist
in data prior to this query (lest an Exception be raised);
default=False
Also from the release notes for 3.4.1:
Bug Fixes:
yaml-set (and the underlying Processor class) were unable to change nodes
having a null (None) value to anything else. This changes how null/None
values are handled by the Processor during node retrieval; they are no longer
discarded, so you will receive None as the data of any retrieved NodeCoords
for appropriate null/None leaf nodes.
Without setting mustexist=True
, the processor attempts to build any parts of the DOM necessary to match the given YAML Path, where possible. When it is not possible, an exception is thrown. I evidently didn't test deep traversal with null nodes because it looks like the new null node behavior is tricking the deep traversal code into an attempt to build an unknown DOM in its place.
While setting mustexist=True
is most likely what you need to be doing anyway -- since you don't appear to be deliberately change the DOM -- I will take a look at this anyway because a deep traversal shouldn't be attempting to build any DOM, no matter the setting of mustexist
.
Thanks!!
Thank you for your analysis. I've modified the test script by adding a mustexist=True
parameter to the processor.get_nodes()
method and changed the output to show the repr of each nodecoords
that is emitted by the generator returned by processor.get_nodes()
. Adding the mustexist=True
parameter did prevent a YAMLPathException
from being thrown (good), but now I'm seeing spurious nodecoords
being emitted for every node whose value is null, whether it matches the search criteria or not (strange.) Here's the output I am getting now (the second to last line of each trial output is the spurious noodecoords
):
$: # Trial 1: path='**/name'
$: ./test_script.py
repr(node_coordinate): NodeCoords('first thing', 'ordereddict([('name', 'first thing'), ('rank', 42)])', 'name')
repr(node_coordinate): NodeCoords('second thing', 'ordereddict([('name', 'second thing'), ('rank', 5)])', 'name')
repr(node_coordinate): NodeCoords('third thing', 'ordereddict([('name', 'third thing'), ('rank', None)])', 'name')
repr(node_coordinate): NodeCoords('None', 'ordereddict([('name', 'third thing'), ('rank', None)])', 'rank')
repr(node_coordinate): NodeCoords('fourth thing', 'ordereddict([('name', 'fourth thing'), ('rank', 1)])', 'name')
$:
$: # Trial 2: path='**/rank'
$: ./test_script.py
repr(node_coordinate): NodeCoords('42', 'ordereddict([('name', 'first thing'), ('rank', 42)])', 'rank')
repr(node_coordinate): NodeCoords('5', 'ordereddict([('name', 'second thing'), ('rank', 5)])', 'rank')
repr(node_coordinate): NodeCoords('None', 'ordereddict([('name', 'third thing'), ('rank', None)])', 'rank')
repr(node_coordinate): NodeCoords('None', 'ordereddict([('name', 'third thing'), ('rank', None)])', 'rank')
repr(node_coordinate): NodeCoords('1', 'ordereddict([('name', 'fourth thing'), ('rank', 1)])', 'rank')
I can work around this issue for now by skipping all nodecoords
elements emitted by the generator whose path attribute is None
.
That is unexpected! I'll add a new unit test to trap for this in addition to spending time with the deep traversal code to give it the clearly missing update it needed for the new null node handling.
Thank you for bringing this up!
This was a simple fix and will appear in the upcoming 3.5.0 release. In the traversal code, I'd mistakenly added the new null node handling code before checking whether a filter (more YAML Path after the ** segment) was present. Simply moving it after the check resolves the issue. I've added a new unit test to your credit just to ensure this never regresses.
Thanks again!
I have published version 3.5.0, which includes this fix.
I've installed yampath 3.5.0 and it's working well in my application, fixing the problems I saw in version 3.4.1. Thanks for the update.
Robert On Sun, 2021-04-25 at 18:33 -0700, William W. Kimball, Jr., MBA, MSIS wrote:
I have published version 3.5.0, which includes this fix. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.
Operating System
Version of Python and packages in use at the time of the issue.
Minimum sample of YAML (or compatible) data necessary to trigger the issue
Complete steps to reproduce the issue when triggered via:
Load YAML document that has at least one leaf element explicitly set to null.
Preform a search using a YAML path that includes a deep traversal. The specified path does not necessarily have to match the leaf node(s) that are set to null. Example:
/**/name
or**.name
.Iterate over the returned results. An error is thrown once the first leaf node that is set to null is reached.
Command-Line Tools (yaml-get, yaml-set, or eyaml-rotate-keys): N/A
Libraries (yamlpath.*): Minimum amount of code necessary to trigger the defect.
Expected Outcome
The iteration of search results should complete successfully. If the leaf node(s) that were set to null matched the yamlpath search criteria, they should be returned with their values set to None. This was working as expected in yamlpath 3.4.0.
Actual Outcome
The following error is thrown:
yamlpath_bug.tar.gz
Screenshot(s), if Available