Open jakajancar opened 1 year ago
TL;DR Just omit the dash at the end.
Hi. Your example works as expected. It seems that in your case the JSON Pointer (the `pointer` option) is just not used correctly. The `pointer` option means "iterate over items in this element". If you only need to iterate over the items in the `user-provided-parameters` key, just use `/user-provided-parameters` as the pointer. The dash at the end means "any index", so it matches `/user-provided-parameters/0`, `/user-provided-parameters/1`, and so on, and then tries to iterate over what's inside the vector at that index. If you need more explanation, let me know or have a second look at the JSON Machine documentation.
Thanks for the quick response! You're right.
I tried to reduce the case and did it incorrectly. Let me try again:
Let's say we have a `number[][]` matrix where we want to iterate through cells, same as:
function cells($matrix) {
    foreach ($matrix as $row) {
        foreach ($row as $cell) {
            yield $cell;
        }
    }
}
$options = ['pointer' => '/table/-'];
Items::fromString('{"table": [[1,2], [3,4]]}', $options);
// Expected: [1,2,3,4]
// Actual: same
Items::fromString('{"table": [[1,2], 3]}', $options);
// Expected: error
// Actual: [1,2,3]
Is this possible?
And the reason I was using `/table/-/-` was because then you get nice results in `getCurrentJsonPointer()`:
1 - /table/0/0
2 - /table/0/1
3 - /table/1/0
4 - /table/1/1
What are your thoughts on an option `"flatten" => false` (default `true`), where, of your examples:
JSON Pointer value | Will iterate through |
---|---|
(empty string - default) | ["this", "array"] or {"a": "this", "b": "object"} will be iterated (main level) |
/result/items | {"result": {"items": ["this", "array", "will", "be", "iterated"]}} |
/0/items | [{"items": ["this", "array", "will", "be", "iterated"]}] (supports array indices) |
/results/-/status | {"results": [{"status": "iterated"}, {"status": "also iterated"}]} (a hyphen as an array index wildcard) |
/ (gotcha! - a slash followed by an empty string, see the spec) | {"":["this","array","will","be","iterated"]} |
/quotes\" | {"quotes\"": ["this", "array", "will", "be", "iterated"]} |
All of them would return a single item, except `/results/-/status` (with an explicit wildcard), which would return the same as today?
I'm not sure what the question is now. Can you be more specific?
Anyway, let me just elaborate a little on the `flatten` topic. JSON Machine supports finding data in a JSON down to a single scalar value if needed. It does that automatically. If it finds a scalar value at a pointer instead of an object or an array, it just yields it in a single iteration. So it might seem that it somehow flattens the structure when used in combination with `-` and when the structure is not rigid. But in reality, no such thing happens.
Try this and you'll see no deep flattening is happening:
$options = ['pointer' => '/table/-'];
Items::fromString('{"table": [[[1,2]], [3,4]]}', $options);
// Expected: [[1,2],3,4]
Also, this example is not expected to produce an error:
$options = ['pointer' => '/table/-'];
Items::fromString('{"table": [[1,2], 3]}', $options);
because at `/table/0` there is `[1,2]`, which is sequentially iterated, and at `/table/1` there is `3`, which is a scalar value and as such is simply yielded as a single value.
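The rule described above can be sketched in plain PHP. This is only an emulation built on `json_decode`, not JSON Machine's streaming implementation, and the function name `emulateTableWildcard` is made up for illustration. It assumes the pointer `/table/-` and shows that iterable matches are iterated one level deep while scalar matches are yielded once.

```php
<?php
declare(strict_types=1);

/**
 * Emulates the behavior described in this thread for the pointer '/table/-'.
 * Hypothetical helper: it decodes the whole document instead of streaming it.
 */
function emulateTableWildcard(string $json): array
{
    $doc = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    $items = [];
    foreach ($doc['table'] as $match) {
        if (is_array($match)) {
            // An array or object at the pointer position is iterated one level deep.
            foreach ($match as $item) {
                $items[] = $item;
            }
        } else {
            // A scalar at the pointer position is yielded as a single item.
            $items[] = $match;
        }
    }
    return $items;
}

$rigid  = emulateTableWildcard('{"table": [[1,2], [3,4]]}');   // [1, 2, 3, 4]
$mixed  = emulateTableWildcard('{"table": [[1,2], 3]}');       // [1, 2, 3]
$nested = emulateTableWildcard('{"table": [[[1,2]], [3,4]]}'); // [[1, 2], 3, 4]
```

The third call keeps `[1, 2]` intact, which is why no deep flattening takes place: only the element matched by the pointer is iterated.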
I would expect a behavior where:
`getCurrentJsonPointer()`.
Currently, even a non-wildcard component explodes the items (but has nowhere to indicate this in the path) if the element pointed to is an object/array. It is this behavior that I would like to have a way to disable.
Below is (yet another) example, which demonstrates both my concerns (indexes in `getCurrentJsonPointer()` and unpredictable levels).
Say you have a two-level array `mixed[][]`, where all of these are valid:
{"2d": [[1,2], [3]]}
$value['2d'][0][0] (/2d/0/0) = 1
$value['2d'][0][1] (/2d/0/1) = 2
$value['2d'][1][0] (/2d/1/0) = 3
{"2d": [[1,2], [3,true]]}
$value['2d'][0][0] (/2d/0/0) = 1
$value['2d'][0][1] (/2d/0/1) = 2
$value['2d'][1][0] (/2d/1/0) = 3
$value['2d'][1][1] (/2d/1/1) = true
{"2d": [[1,2], [3,[4,5]]]}
$value['2d'][0][0] (/2d/0/0) = 1
$value['2d'][0][1] (/2d/0/1) = 2
$value['2d'][1][0] (/2d/1/0) = 3
$value['2d'][1][1] (/2d/1/1) = [4,5]
The following is not valid, because it's not really `mixed[][]`:
{"2d": [[1,2], false]}
$value['2d'][0][0] (/2d/0/0) = 1
$value['2d'][0][1] (/2d/0/1) = 2
$value['2d'][1][0] = error
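I would like to
This cannot be currently achieved:
If you use `/2d/-/-`:
❌ Third valid example (`[[1,2], [3,[4,5]]]`) gets flattened (and you get 5 items)
If you use `/2d/-`:
❌ You do not get both indices, only the first.
❌ The invalid example gets silently ignored (you get same items as first valid example)
If you use `/2d/-/-`
If you use `/2d/-`:
Not-found items get ignored. That's normal behavior. It's as if you wanted the `find` command to fail on every existing file in the searched dir that does not match searched string.
Sorry for being brief ;)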
No worries, I appreciate your responses, responsiveness, and patience with me iterating on trying to get the best example.
If you use `/2d/-/-`
❌ Third valid example ([[1,2], [3,[4,5]]]) gets flattened (and you get 5 items)
That's a feature, not a bug as explained earlier.
Yes, I understand. But disabling this feature is essentially my feature request! :D
If you use `/2d/-`:
❌ You do not get both indices, only the first.
Ok, this seems weird. Can you give the exact output? Could it be the same problem as Why only red is output #100?
I'm not saying that the items do not get iterated over, just that in the `getCurrentJsonPointer()` return value you don't have both indices (which makes sense, since there is no "placeholder" for them).
❌ The invalid example gets silently ignored (you get same items as first valid example)
Not-found items get ignored. That's normal behavior. It's as if you wanted the `find` command to fail on every existing file in the searched dir that does not match searched string.
By "silently ignored" I don't mean not returned by the iterator (that's what happens with `/2d/-/-` and that's OK) but returned in exactly the same way as if it were in a different structure.
Perhaps I owe an explanation for this admittedly weird use-case:
I'm querying OpenAI's text completions AI with the new function calling/structured output mechanism, which returns JSON. JSON Machine is used to return results in a streaming fashion to the user live (see videos here if curious). That table should be `string[][]` and 95% of the time it is, but occasionally the model hallucinates and omits a level of nesting, adds a level of nesting, or returns the wrong number of rows or cells. So when iterating over `/2d/-/-` I check that both indexes are monotonically increasing with no gaps, that the values are indeed strings, and so on... very defensively.
To recap, I don't think path no. 2 (`/2d/-`) is the way forward. `/2d/-/-` is mostly there, but I would prefer not to have that auto-descent feature.
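For comparison, the strict `mixed[][]` traversal I have in mind could be sketched like this in plain PHP. It is not streaming and not part of JSON Machine: it decodes the whole document with `json_decode`, and `strictCells` is a made-up name. It yields both indices in the key, never descends into a cell, and throws when a row is not an array.

```php
<?php
declare(strict_types=1);

/**
 * Yields "/<key>/<row>/<col>" => cell for a strictly two-level array.
 * Hypothetical helper: cells are yielded as-is (no auto-descent),
 * and a row that is not a JSON array raises an error.
 */
function strictCells(string $json, string $key = '2d'): Generator
{
    // Decode objects as stdClass so that is_array() is true only for JSON arrays.
    $doc = json_decode($json, false, 512, JSON_THROW_ON_ERROR);
    $table = $doc->{$key} ?? null;
    if (!is_array($table)) {
        throw new UnexpectedValueException("/{$key} is not an array");
    }
    foreach ($table as $r => $row) {
        if (!is_array($row)) {
            throw new UnexpectedValueException("/{$key}/{$r} is not an array");
        }
        foreach ($row as $c => $cell) {
            yield "/{$key}/{$r}/{$c}" => $cell;
        }
    }
}

// Third valid example: 4 items, the last one is the untouched array [4, 5].
$cells = iterator_to_array(strictCells('{"2d": [[1,2], [3,[4,5]]]}'));

// Invalid example: the first row is yielded, then the second row throws.
$seen = [];
$error = null;
try {
    foreach (strictCells('{"2d": [[1,2], false]}') as $pointer => $cell) {
        $seen[$pointer] = $cell;
    }
} catch (UnexpectedValueException $e) {
    $error = $e->getMessage();
}
```

Because it is a generator, the cells of earlier rows are still delivered before the error surfaces, which is the behavior I would want when streaming results to the user.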
But disabling this feature is essentially my feature request! :D
Now it makes perfect sense 😁. Because in terms of JSON Machine, there's no 'flattening', I'd suggest modifying the scalar parsing logic, which is what's actually behind your problem. Maybe an option something like `iterate_scalars`, with three settings:
- `AUTO` (current behavior, would remain the default)
- `ALWAYS`/`ONLY`/`FORCE` (an iterable on the pointer position will throw)
- `NEVER` (a scalar on the pointer position will throw)
This example of yours:
$options = ['pointer' => '/table/-'];
Items::fromString('{"table": [[1,2], 3]}', $options);
// Expected: error
// Actual: [1,2,3]
would then throw an error with the option `'iterate_scalars' => NEVER`.
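To make the proposal concrete, here is a rough emulation of the three settings in plain PHP. Nothing in it exists in JSON Machine: the constants, the function name `emulateIterateScalars`, and the choice of `ONLY` among the candidate names are all assumptions, and the sketch hard-codes the pointer `/table/-` on a fully decoded document.

```php
<?php
declare(strict_types=1);

// Hypothetical setting names taken from the proposal above.
const ITERATE_SCALARS_AUTO  = 'auto';  // current behavior
const ITERATE_SCALARS_ONLY  = 'only';  // an iterable at the pointer position throws
const ITERATE_SCALARS_NEVER = 'never'; // a scalar at the pointer position throws

function emulateIterateScalars(string $json, string $mode = ITERATE_SCALARS_AUTO): array
{
    $doc = json_decode($json, false, 512, JSON_THROW_ON_ERROR);
    $items = [];
    foreach ($doc->table as $index => $match) {
        $isIterable = is_array($match) || is_object($match);
        if ($isIterable && $mode === ITERATE_SCALARS_ONLY) {
            throw new UnexpectedValueException("Iterable found at /table/{$index}");
        }
        if (!$isIterable && $mode === ITERATE_SCALARS_NEVER) {
            throw new UnexpectedValueException("Scalar found at /table/{$index}");
        }
        if ($isIterable) {
            // Arrays and decoded objects (stdClass) are iterated one level deep.
            foreach ($match as $item) {
                $items[] = $item;
            }
        } else {
            $items[] = $match;
        }
    }
    return $items;
}

$auto = emulateIterateScalars('{"table": [[1,2], 3]}'); // [1, 2, 3]

$neverError = null;
try {
    emulateIterateScalars('{"table": [[1,2], 3]}', ITERATE_SCALARS_NEVER);
} catch (UnexpectedValueException $e) {
    $neverError = $e->getMessage(); // "Scalar found at /table/1"
}
```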
Also for a less predictable structure maybe #36 would help?
What do you think about the solution proposed above (the `iterate_scalars` option as a feature request)?
Let's say we have a property with an array of dynamically-typed, user-provided parameters.
This makes JSON Machine not very useful for working with documents with a more dynamic schema. Moreover, even arrays have a special case at length == 0.
I checked the JSON Pointer spec to see if this is an implementation bug or by design. It seems that JSON Pointer is not intended for (JSONPath-like) selection at all, but for navigation to a single node. Even the `-` is interpreted differently ("the (nonexistent) member after the last array element" vs. "a wildcard which matches any array index"). It also would not have the above problem and would always navigate to the expected subtree. It would be better if the readme said "a syntax inspired by JSON Pointer".
Re. a solution, it would be great if there was an option to not automatically descend deeper than the specified path and make the subtree selection not dependent on the values in it.
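To illustrate the difference, here is a minimal sketch of spec-style (RFC 6901) pointer evaluation in plain PHP, written for this thread and not taken from JSON Machine. The function name `resolveJsonPointer` is made up, and the sketch assumes PHP 8 and a document decoded with `json_decode($json, false)`. It always navigates to exactly one node and treats `-` as the nonexistent element after the last one, so using it for lookup is an error.

```php
<?php
declare(strict_types=1);

/**
 * Resolves a JSON Pointer to a single node, following RFC 6901.
 * Hypothetical helper: no wildcard support, no descent into the result.
 */
function resolveJsonPointer(mixed $doc, string $pointer): mixed
{
    if ($pointer === '') {
        return $doc; // The empty pointer refers to the whole document.
    }
    if ($pointer[0] !== '/') {
        throw new InvalidArgumentException('A non-empty pointer must start with "/"');
    }
    foreach (explode('/', substr($pointer, 1)) as $rawToken) {
        // Unescape "~1" first, then "~0", as the spec requires.
        $token = str_replace(['~1', '~0'], ['/', '~'], $rawToken);
        if (is_array($doc)) {
            if ($token === '-') {
                throw new OutOfBoundsException('"-" refers to the nonexistent element after the last one');
            }
            if (!preg_match('/^(0|[1-9][0-9]*)$/', $token) || !array_key_exists((int) $token, $doc)) {
                throw new OutOfBoundsException("No array element at index '{$token}'");
            }
            $doc = $doc[(int) $token];
        } elseif ($doc instanceof stdClass) {
            if (!property_exists($doc, $token)) {
                throw new OutOfBoundsException("No object member named '{$token}'");
            }
            $doc = $doc->{$token};
        } else {
            throw new OutOfBoundsException("Cannot descend into a scalar at '{$token}'");
        }
    }
    return $doc;
}

$doc = json_decode('{"results": [{"status": "a"}, {"status": "b"}]}', false, 512, JSON_THROW_ON_ERROR);
$status = resolveJsonPointer($doc, '/results/1/status'); // "b"
$subtree = resolveJsonPointer($doc, '/results');         // the whole array, as one node
```

With this evaluation, `/results` yields the array itself regardless of what it contains, which is the "do not descend deeper than the specified path" behavior asked for above.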