hiltontj / serde_json_path

Query serde_json Values with JSONPath
https://serdejsonpath.live/
MIT License
51 stars 4 forks source link

Pre-Parsed JsonPath newtype #9

Closed Dav1dde closed 1 year ago

Dav1dde commented 1 year ago

I think a new type like struct JsonPath(_) would make sense to be included.

The JsonPath type should be pre-parsed and validated and never raise an error when passed to .json_path(). It could also implement Deserialize (e.g. for use in config files).

This would also allow further optimization, since the path wouldn't need to be re-parsed everytime it is used.

It would be nice if .json_path() would become infallibale if passed a JsonPath but that would mean a breaking change, otherwise something like fn json_path<T>(path: T) where T: TryInto<JsonPath>, crate::Error: From<T::Error> could work (maybe tricky to accept &JsonPath).

My current workaround for the config usecase:

#[derive(Debug)]
pub struct JsonPath(String);

impl std::ops::Deref for JsonPath {
    type Target = str;

    fn deref(&self) -> &Self::Target {
        self.0.as_str()
    }
}

impl std::fmt::Display for JsonPath {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl<'de> Deserialize<'de> for JsonPath {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: serde::Deserializer<'de>,
    {
        let inner = String::deserialize(deserializer)?;
        serde_json::Value::Null
            .json_path(&inner)
            .map_err(serde::de::Error::custom)?;
        Ok(JsonPath(inner))
    }
}
hiltontj commented 1 year ago

Hi @Dav1dde - thanks for raising the issue. I agree whole-heartedly. There is definitely a need for a type that holds pre-compiled query paths.

This has become apparent to me after using serde_json_path in some other projects, and is something I've had in mind since I started serde_json_path. I think that in part, I left the name JsonPath out of the API with the intention of using for this purpose 🙂. Anyways...

Discussion

The json_path method being fallible is a nuisance. json_path's only source of fallibility is from the parsing step. So, separating the parsing step from the query step would (re-iterating your points here):

  1. enable pre-parsed/compiled JSON Path queries, i.e., not re-parsing/compiling on every call to json_path, which should be more efficient
  2. make json_path infallible, which would make it easier to work with
  3. allow for a Deserialize implementation so that JsonPath can be declared in schema and config files

It could be a breaking change, but I am okay with that. This library is still young and these kinds of changes are needed to improve its adoption.

Long term, if we are to uphold this, the query step can not be changed to become fallible. Right now it is not; however, I am currently reasoning though how to implement the last part of the JSON Path specification (Function Extensions), so I will try to factor this requirement into that work.

Suggestions for the API

Having the json_path method accept a JsonPath and be infallible:

let path = JsonPath::parse("$.foo.bar")?;
let value = json!({ "foo": { "bar": "baz" } });
let nodes = value.json_path(&path).all();
assert_eq!(nodes, vec!["baz"]);

You could also query directly with the JsonPath type:

let path = JsonPath::parse("$.foo.bar")?;
let value = json!({ "foo": { "bar": "baz" } });
let nodes = path.query(&value).all();
assert_eq!(nodes, vec!["baz"]);

I'm not sure if parse is the best choice of name there. from_str or try_from are also reasonable, and in fact, it may be helpful to impl FromStr for JsonPath or some variants of TryFrom for convenience, in addition to the Deserialize implementation.

The above examples bring into question the need for the all method. Perhaps we can simplify the API to eliminate it. That can be for a separate issue though...

hiltontj commented 1 year ago

@Dav1dde - I have a PR up in #10 . Let me know your thoughts. I need to test a couple of things, namely the Deserialize implementation, but I think that having JsonPath in the API works much better. I will probably be able to release it soon.

The JsonPathExt trait was interesting in concept but I think loses practicality with #10. I may consider removing JsonPathExt with this update, as it will be broken by the update anyway. I will think on it a bit more.

Dav1dde commented 1 year ago

Regarding Deserialize: I am not sure if JSONPath should directly implement deserialize, there would also be the option to provide a module or function to use together with serde(with = ... ) or serde(deserialize_with = ...).

For me personally it's very convenient and since the library already only works with serde it doesn't even add another dependency.

It would be even better if it could reconstruct the path from the internal representation and implement Display and Serialize, but this might be really hard (or impossible) for little gain overall.

Long term, if we are to uphold this, the query step can not be changed to become fallible. Right now it is not; however, I am currently reasoning though how to implement the last part of the JSON Path specification (Function Extensions), so I will try to factor this requirement into that work.

This is tricky, infallible is a lot of convenience, but if there are any fallible JSON Paths, this has to be fallible. But this is still a big improvement having a distinction between errors because the path is invalid and errors when the path fails to match.

I'm not sure if parse is the best choice of name there. from_str or try_from are also reasonable, and in fact, it may be helpful to impl FromStr for JsonPath or some variants of TryFrom for convenience, in addition to the Deserialize implementation.

parse sounds good to me, that's also what url::Url uses. I think it should always have a FromStr, it's just nice to have "$.foo".parse().

The above examples bring into question the need for the all method. Perhaps we can simplify the API to eliminate it. That can be for a separate issue though...

I was thinking about this as well, maybe .one() should return None if there is more than one element matching (maybe it should even return a Result?). Some Java database clients throw an exception if findOne yields more than one result, you have to specifically use findAll() if your query can yield more than one result.

hiltontj commented 1 year ago

@Dav1dde - adding the Deserialize impl was not too much hassle, and indeed, this is meant to be used alongside serde, so I think it is acceptable to have that.

It would be even better if it could reconstruct the path from the internal representation and implement Display and Serialize, but this might be really hard (or impossible) for little gain overall.

That would be cool, and I don't think overly difficult, but certainly a chore. This may also be useful along with a builder API for the JsonPath struct.

This is tricky, infallible is a lot of convenience, but if there are any fallible JSON Paths, this has to be fallible. But this is still a big improvement having a distinction between errors because the path is invalid and errors when the path fails to match.

My impression with the JSONPath spec is that valid queries are meant to be infallible. With everything up until Function Extensions, which is what serde_json_path supports, this was fairly straightforward. Function Extensions, however, do make this tricky. The working group is still finalizing Function Extensions in the spec though. So, hopefully once that is completed, functions in JSONPath queries will be able to be validated at parse time.

I was thinking about this as well, maybe .one() should return None if there is more than one element matching (maybe it should even return a Result?). Some Java database clients throw an exception if findOne yields more than one result, you have to specifically use findAll() if your query can yield more than one result.

I was sort of going on what sqlx does with:

...but clearly it doesn't quite line up. I think that this deserves a separate issue, as it can certainly be refined.

hiltontj commented 1 year ago

@Dav1dde - you may be interested in #11