hiltontj / serde_json_path

Query serde_json Values with JSONPath
https://serdejsonpath.live/
MIT License

Ability to get path of queried nodes #76

Closed FruitieX closed 7 months ago

FruitieX commented 8 months ago

Hi,

I have a use-case where I need to get both the path (in some format) and the value of queried nodes, which doesn't seem possible at the moment. How much work do you think this would require? I might take a stab at this eventually. 😄

FruitieX commented 8 months ago

https://github.com/hiltontj/serde_json_path/pull/77

hiltontj commented 8 months ago

Hey @FruitieX - thanks for opening the issue. I see the PR and may take a look at it, but I think it would be good to discuss here a bit first.

This concept is documented in the IETF JSONPath standard; see Normalized Paths. There is a C# implementation of the IETF JSONPath standard that has them built into its query evaluation mechanism, and you can see them in its demo environment.

Unfortunately, as you have discovered, serde_json_path does not cover this portion of the standard. I do have some ideas, which is why I want to discuss first; I will try to get them down here when I get a minute or two. My time to implement this is a bit tight these days, so if it is something you could take the lead on, that would be awesome!

hiltontj commented 8 months ago

I would start by defining a type to represent a normalized path, e.g.,

struct NormalizedPath<'a>(Vec<PathElement<'a>>);

enum PathElement<'a> {
    String(&'a str), // for object keys
    Index(usize), // for list indices
}

This representation lets you construct the path as you recurse down the JSON structure, borrowing or copying data from the JSON value being queried rather than cloning every key into an owned String. That is, 'a here is the lifetime of the JSON value being queried, and the string keys borrowed from that value share the same lifetime. Array indices are usize, which is Copy, so there is no lifetime to worry about for those.
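A minimal, dependency-free sketch of that recursion, assuming a trimmed-down stand-in for serde_json::Value (only the variants needed here). The hypothetical walk helper pushes a PathElement before descending and pops it afterwards, so object keys stay borrowed for 'a instead of being cloned. The Display impl renders the normalized-path form, e.g. $['office']['power'], but escapes only ' and \ (the specification's control-character escapes are left out).

```rust
use std::collections::BTreeMap;
use std::fmt;

// Trimmed-down stand-in for serde_json::Value (assumption: only the
// variants this sketch needs), so the example has no dependencies.
#[derive(Debug, PartialEq)]
enum Value {
    Bool(bool),
    String(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

#[derive(Debug, Clone, PartialEq)]
enum PathElement<'a> {
    String(&'a str), // object key, borrowed from the queried value
    Index(usize),    // array index; usize is Copy
}

#[derive(Debug, Clone, PartialEq)]
struct NormalizedPath<'a>(Vec<PathElement<'a>>);

impl fmt::Display for NormalizedPath<'_> {
    // Renders paths like $['office']['power'] or $['tags'][0].
    // Only ' and \ are escaped; control-character escapes are omitted.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "$")?;
        for el in &self.0 {
            match el {
                PathElement::String(k) => {
                    let escaped = k.replace('\\', "\\\\").replace('\'', "\\'");
                    write!(f, "['{escaped}']")?;
                }
                PathElement::Index(i) => write!(f, "[{i}]")?,
            }
        }
        Ok(())
    }
}

/// Hypothetical helper: visits every node depth-first, recording the
/// normalized path of each one alongside a reference to the node.
fn walk<'a>(
    value: &'a Value,
    path: &mut Vec<PathElement<'a>>,
    out: &mut Vec<(NormalizedPath<'a>, &'a Value)>,
) {
    out.push((NormalizedPath(path.clone()), value));
    match value {
        Value::Object(map) => {
            for (k, v) in map {
                path.push(PathElement::String(k.as_str())); // borrow, no key clone
                walk(v, path, out);
                path.pop();
            }
        }
        Value::Array(items) => {
            for (i, v) in items.iter().enumerate() {
                path.push(PathElement::Index(i));
                walk(v, path, out);
                path.pop();
            }
        }
        Value::Bool(_) | Value::String(_) => {} // leaves have no children
    }
}

fn main() {
    let office = BTreeMap::from([("power".to_string(), Value::Bool(true))]);
    let doc = Value::Object(BTreeMap::from([
        ("office".to_string(), Value::Object(office)),
        ("tags".to_string(), Value::Array(vec![Value::String("lamp".into())])),
    ]));

    let mut out = Vec::new();
    walk(&doc, &mut Vec::new(), &mut out);
    // prints $, $['office'], $['office']['power'], $['tags'], $['tags'][0] (one per line)
    for (path, _node) in &out {
        println!("{path}");
    }
}
```

The cost of tracking paths here is one Vec clone per visited node; a real evaluator would only need to materialize paths for nodes that end up in the result.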

FruitieX commented 8 months ago

Thanks for the quick response, and yeah, that sounds more reasonable than my current implementation. I will try to adapt it to use borrowed data instead.

hiltontj commented 8 months ago

@FruitieX - I may have some time to implement this. I am curious: based on your description and your PR, it sounds like you are looking for both the path and the node to be returned, e.g.,

impl JsonPath {
    fn query_path<'a>(&self, value: &'a Value) -> Vec<(NormalizedPath<'a>, &'a Value)> {
        /* ... */
    }
}

Is it important that the node, i.e., the &'a Value, is also returned, or would you just need the path? I am trying to decide whether it is better to do the above or to just return the paths, e.g.,

impl JsonPath {
    fn query_path<'a>(&self, value: &'a Value) -> Vec<NormalizedPath<'a>> {
        /* ... */
    }
}

It would be helpful to understand your use-case a bit more and why you need both, if that is the case.
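To make the trade-off concrete: if only paths were returned, a caller who also needs the node has to walk each path again from the root. A minimal sketch of that re-lookup, using a stand-in Value type and a hypothetical resolve helper (neither is serde_json_path API):

```rust
use std::collections::BTreeMap;

// Trimmed-down stand-in for serde_json::Value (assumption, for a dependency-free sketch).
#[derive(Debug, PartialEq)]
enum Value {
    Bool(bool),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

// Same shape as the NormalizedPath proposed earlier in the thread.
enum PathElement<'a> {
    String(&'a str),
    Index(usize),
}

struct NormalizedPath<'a>(Vec<PathElement<'a>>);

/// Hypothetical helper: walk a path from the root back to the node it names.
/// Returns None if a key is missing, an index is out of bounds, or the path
/// expects an object/array where the document has something else.
fn resolve<'v>(root: &'v Value, path: &NormalizedPath<'_>) -> Option<&'v Value> {
    path.0.iter().try_fold(root, |node, el| match (node, el) {
        (Value::Object(map), PathElement::String(key)) => map.get(*key),
        (Value::Array(items), PathElement::Index(i)) => items.get(*i),
        _ => None,
    })
}

fn main() {
    let office = BTreeMap::from([("power".to_string(), Value::Bool(true))]);
    let root = Value::Object(BTreeMap::from([(
        "lights".to_string(),
        Value::Array(vec![Value::Object(office)]),
    )]));
    let path = NormalizedPath(vec![
        PathElement::String("lights"),
        PathElement::Index(0),
        PathElement::String("power"),
    ]);
    println!("{:?}", resolve(&root, &path)); // Some(Bool(true))
}
```

Returning (NormalizedPath, &Value) pairs avoids this second traversal per result, at the cost of building a path for every result during the query.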

FruitieX commented 8 months ago

My use-case is a bit unusual, but basically I have a JsonPath with wildcards and I need to know which keys matched the wildcards for each matched value. While less convenient, returning just the paths would also work, as I could then use the path to build a JSON Pointer and query the underlying value with serde_json::value::Value::pointer().
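A sketch of the path-to-pointer conversion mentioned here, assuming the PathElement/NormalizedPath shape proposed earlier in the thread; to_pointer is a hypothetical helper. JSON Pointer (RFC 6901) escapes ~ as ~0 and / as ~1 in each reference token, and the empty string refers to the whole document.

```rust
// Same shape as the NormalizedPath proposed earlier in the thread.
enum PathElement<'a> {
    String(&'a str),
    Index(usize),
}

struct NormalizedPath<'a>(Vec<PathElement<'a>>);

impl NormalizedPath<'_> {
    /// Hypothetical helper: render the path as an RFC 6901 JSON Pointer.
    /// The empty path becomes "", which refers to the whole document.
    fn to_pointer(&self) -> String {
        let mut pointer = String::new();
        for el in &self.0 {
            pointer.push('/');
            match el {
                // Escape '~' before '/', otherwise the '~' in "~1" would be re-escaped.
                PathElement::String(key) => {
                    pointer.push_str(&key.replace('~', "~0").replace('/', "~1"))
                }
                PathElement::Index(i) => pointer.push_str(&i.to_string()),
            }
        }
        pointer
    }
}

fn main() {
    let path = NormalizedPath(vec![
        PathElement::String("devices"),
        PathElement::String("hue"),
        PathElement::String("office"),
        PathElement::String("state"),
    ]);
    println!("{}", path.to_pointer()); // /devices/hue/office/state
}
```

With serde_json, the resulting string could then be passed to value.pointer(&path.to_pointer()), which returns an Option<&Value>.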

The long(er) version is:

I'm writing a program that controls my home automation devices. I'm allowing users (=myself 😃) to configure the behavior of their devices using a simple scripting language called evalexpr. It doesn't support JSON, but I can use simple variable assignments like:

devices.hue.office.state.power = true;
devices.hue.office.state.color.r = 255;
devices.hue.office.state.color.g = 128;
devices.hue.office.state.color.b = 0;

devices.lifx.bedroom.state.power = true;
devices.lifx.bedroom.state.brightness = 0.5;
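One way to turn flat dotted names like these into the nested JSON that gets queried is to split each name on '.' and create intermediate objects as needed. The sketch below assumes a stand-in Value type and a hypothetical insert_dotted helper; it does not use evalexpr's API, and it panics if a name tries to descend through a key that already holds a non-object value.

```rust
use std::collections::BTreeMap;

// Trimmed-down stand-in for serde_json::Value (assumption, for a dependency-free sketch).
#[derive(Debug, PartialEq)]
enum Value {
    Bool(bool),
    Number(f64),
    Object(BTreeMap<String, Value>),
}

/// Hypothetical helper: store `leaf` at the nested location named by a dotted
/// variable such as "devices.hue.office.state.power", creating objects on the way.
/// Panics if an intermediate key already holds a non-object value.
fn insert_dotted(root: &mut BTreeMap<String, Value>, dotted: &str, leaf: Value) {
    let mut keys = dotted.split('.').peekable();
    let mut map = root;
    while let Some(key) = keys.next() {
        if keys.peek().is_none() {
            // Last segment: this is where the value goes.
            map.insert(key.to_string(), leaf);
            return;
        }
        // Intermediate segment: descend, creating an empty object if needed.
        map = match map
            .entry(key.to_string())
            .or_insert_with(|| Value::Object(BTreeMap::new()))
        {
            Value::Object(inner) => inner,
            _ => panic!("`{key}` already holds a non-object value"),
        };
    }
}

fn main() {
    let mut root = BTreeMap::new();
    insert_dotted(&mut root, "devices.hue.office.state.power", Value::Bool(true));
    insert_dotted(&mut root, "devices.hue.office.state.color.r", Value::Number(255.0));
    insert_dotted(&mut root, "devices.lifx.bedroom.state.brightness", Value::Number(0.5));
    println!("{:#?}", Value::Object(root));
}
```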

I then build a serde_json::Value from the resulting variable context and query it with serde_json_path. For example, to get the state struct of each device, I can use the following query:

$.devices.*.*.state

In addition to the state, I now need to know the vendor ID and device ID from the path, so that I know which devices to send the state updates to.

There are probably better ways of doing this, but this way was pretty convenient to implement, since my state structs already implement Serialize/Deserialize because they are used directly in a REST API / web frontend.
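For this particular query shape, a hand-written walk shows what path tracking provides: the keys visited at the two wildcard steps are exactly the vendor ID and device ID. This is an illustrative sketch with a stand-in Value type and a hypothetical device_states helper, not how serde_json_path evaluates queries; it only descends through objects, whereas a JSONPath * also selects array elements.

```rust
use std::collections::BTreeMap;

// Trimmed-down stand-in for serde_json::Value (assumption, for a dependency-free sketch).
#[derive(Debug, PartialEq)]
enum Value {
    Bool(bool),
    Object(BTreeMap<String, Value>),
}

/// Builds an object from (key, value) pairs; a small convenience for the demo.
fn obj(pairs: Vec<(&str, Value)>) -> Value {
    Value::Object(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Hypothetical helper: a hand-written equivalent of `$.devices.*.*.state` that
/// also reports which keys the two wildcards matched, as
/// (vendor_id, device_id, state) triples. Only objects are descended into.
fn device_states(root: &Value) -> Vec<(&str, &str, &Value)> {
    let mut out = Vec::new();
    let Value::Object(top) = root else { return out };
    let Some(Value::Object(vendors)) = top.get("devices") else { return out };
    for (vendor_id, devices) in vendors {
        // First `*`: each vendor; its key is the vendor ID.
        let Value::Object(devices) = devices else { continue };
        for (device_id, device) in devices {
            // Second `*`: each device; its key is the device ID.
            if let Value::Object(fields) = device {
                if let Some(state) = fields.get("state") {
                    out.push((vendor_id.as_str(), device_id.as_str(), state));
                }
            }
        }
    }
    out
}

fn main() {
    let power = |on| obj(vec![("power", Value::Bool(on))]);
    let hue = obj(vec![("office", obj(vec![("state", power(true))]))]);
    let lifx = obj(vec![("bedroom", obj(vec![("state", power(false))]))]);
    let config = obj(vec![("devices", obj(vec![("hue", hue), ("lifx", lifx)]))]);

    for (vendor_id, device_id, state) in device_states(&config) {
        println!("{vendor_id}/{device_id}: {state:?}");
    }
}
```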

hiltontj commented 8 months ago

Thank you for the write-up @FruitieX!

To be clear, my understanding is that you use the JSONPath to query for the state objects, but then you require the paths themselves to extract the vendor and device IDs, e.g., lifx/bedroom and hue/office, that are associated with the given state objects provided by the query.


Given that Normalized Paths are a part of the JSONPath spec, and this crate is meant to support that spec, I feel obliged to incorporate the feature. Since this could affect the API and underlying query execution in a substantial way, I want to determine if there is significant overhead to having the underlying query logic produce Vec<(NormalizedPath<'a>, &'a Value)> vs. what it is currently doing, i.e., Vec<&'a Value>. So, I can't necessarily promise quick delivery, but I have already started putting something together in #78.

FruitieX commented 8 months ago

Sounds good, thanks! Also no hurry from my side, I'm using my inefficient fork for now 😁

On Fri, 26 Jan 2024, 15.28 Trevor Hilton wrote:


hiltontj commented 7 months ago

Hey @FruitieX - I have made some solid headway in #78. With it, you would be able to do something like:

struct Device {
    vendor_id: String,
    device_id: String,
    state: Value,
}

let config = json!({ /* JSON of home devices configuration */});
let path = JsonPath::parse("$..state")?; // use `..` operator to get all nested `state` nodes
let devices: Vec<Device> = path
    .query_located(&config) // use new `query_located` method
    .iter() // iterate over `LocatedNode`s
    .map(|q| { // map them into the `Device` type (or whatever)
        let loc = q.location(); // get the location, i.e., full normalized path to `state` node
        // extract elements of interest from the path
        // (element 0 is "devices", 1 the vendor ID, 2 the device ID):
        let vendor_id = loc.get(1).to_string();
        let device_id = loc.get(2).to_string();
        // the state object itself is the node that was queried for:
        let state = q.node().to_owned();
        Device { vendor_id, device_id, state }
    })
    .collect();

FruitieX commented 7 months ago

Excellent, this should cover all my needs, thanks!

hiltontj commented 7 months ago

@FruitieX - just released v0.6.5. Thank you again for raising the issue and for the helpful discussion! 🍻