w3c / wot-discovery

Repository for WoT discovery discussion
https://w3c.github.io/wot-discovery/
Other
19 stars 17 forks source link

JSONPath queries filtering on semantic annotations #387

Open benfrancis opened 1 year ago

benfrancis commented 1 year ago

I'm experimenting with using JSONPath queries to query a directory of Things.

For a simple query like finding a Thing by its title, this appears to work:

$[?(@.title=='foo')]

Now say I have a semantic annotation using the schema.org ontology, so I want to use a prefix in the property name. I would assume it would be something like:

$[?(@.'schema:floorLevel'=='G')]

At least with the online JSONPath evaluators I've tried, this is not accepted as a valid query. How can I query on a property with a colon in the name?

Going a step further, what if I want to look for Things whose @type array includes the value iot:TemperatureSensor?

benfrancis commented 1 year ago

Having spent a little longer looking at this I've realised that...

$[?(@.'schema:floorLevel'=='G')]

Should be...

$[?(@['schema:floorLevel']=='G')]

(A silly syntax error).

Going a step further, what if I want to look for Things whose @type array includes the value iot:TemperatureSensor?

I still haven't figured out an answer to this question. I've seen mention of an "in" operator from some sources which may help, but it doesn't seem to be part of the JSONPath specification and therefore is perhaps not widely supported?

benfrancis commented 1 year ago

Ugh, it seems like the solution does indeed vary between different implementations.

In "Goessner JSONPath" which evaluates JavaScript expressions, you would use:

$[?(@['@type'].indexOf('iot:TemperatureSensor') != -1)]

But in the Jayway Java implementation you would use:

$[?('iot:TemperatureSensor' in @['@type'])]

As far as I can tell there doesn't appear to be a standardised approach which works everywhere.

This is quite a disappointing limitation given that searching for Things of a given @type is likely to be a very common use case of a WoT Directory.

benfrancis commented 1 year ago

@relu91 I just noticed that you implemented JSONPath search in Zion, using a library you wrote which converts JSONPath to SQL. Nice!

That approach isn't going to work for me because I'm currently using a nosql database, but I'm curious whether you came across the searching through an array issue? If so, which approach did you take?

I'm expecting at some point someone is going to tell me I need to implement the SPARQL-based search API in order to do this, which I am very much trying to avoid! (Though that may be the only way to properly resolve external ontologies used in semantic annotations).

mmccool commented 1 year ago

One issue here is that JSON Path implementations are not consistent, because there is no standard (yet). What we should do is give this feedback to the people working on the IETF spec, that it should work for JSON-LD with prefixes, etc.

Proposed resolution: give feedback to IETF standards developers on this. What exactly should we say?

benfrancis commented 1 year ago

Proposed resolution: give feedback to IETF standards developers on this. What exactly should we say?

I think there are two separate topics:

1. Standardised filter operators

The first is to ensure there is a standardised mechanism to filter by the presence of a value in an array. There currently seem to be two different approaches taken by implementations for these kinds of filters:

  1. Define a finite set of filter operators like in, anyof, size etc.
  2. Allow the use of arbitrary JavaScript expressions

I think my personal preference would be for the former, with an includes or in operator equivalent to Array.prototype.includes(). The latter approach is more flexible but harder to make safe.

2. JSON-LD support

The second is about support for JSON-LD "compact IRIs". You can already filter by strings containing colons like foaf:name, but that only works if everyone uses the same prefixes in their @contexts. To really support JSON-LD, JSONPath would need to support prefix expansion, such that foaf:name would be expanded to http://xmlns.com/foaf/0.1/name in the example below.

{
  "@context": {
    "foaf": "http://xmlns.com/foaf/0.1/"
    ...
  },
  "@type": "foaf:Person",
  "foaf:name": "Dave Longley",
  ...
}

Supporting JSON-LD syntax like this is probably a much bigger topic, and possibly even a separate extension specification.

mmccool commented 1 year ago

Since doing this correctly will have to wait until the JSON Path IETF spec work is done, I have marked this issue as "Defer to Discovery 2.0" for now. Note: "2.0" is just a placeholder name, we have yet to decide on whether we will aim for 1.1 (compatible extensions) or 2.0 (breaking changes). In the meantime we should provide the above feedback to the IETF work and discuss this issue with the JSON-LD group.

mmccool commented 1 year ago

@benfrancis regarding your comments:

  1. Regarding filter operations, when I last read the draft JSONPath spec it did not have regex expressions and in was only for searching for values in an array. I suggested at that time to at least extend in (or provide some other operator) to support substring searches (but not necessarily full regexes). My understanding though is that now there are regexes in the current draft, but to be honest I need to read the draft spec again to see if what they have done is appropriate to this use case. They are not going to support arbitrary JavaScript expressions afaik, because it's too hard to make implementations safe (not impossible, but...).
  2. You are correct that we should really be supporting prefix expansion as well. In theory though prefixes can be defined in external context files and full support would require fetching said context files and interpreting them, both a privacy risk and a computational burden. As a compromise, I was going to suggest that we could allow for prefix expansion ONLY of prefixes explicitly declared in the @context, but not in any referenced files. In fact it may be possible to do this with a (complicated) JSONPath expression that first pulls the prefix definitions out of the @context array (skipping over things like language keywords, etc) and then builds a regex. But I'm not completely sure we can do this with the draft spec since we would need to store fetched state in variables, make decisions about what is a prefix, etc; e.g. "scripting". Perhaps a built-in function (eg. "expandprefix") that could do that for us without complicating the expression (or requiring scripting features) would be something we could propose? Another option would be to standardize prefixes but that runs into a different set of problems.
relu91 commented 1 year ago

Hi all sorry for being unresponsive these days, but I am in the middle of my vacation time (I will be completely off next week).

@relu91 I just noticed that you implemented JSONPath search in Zion, using a library you wrote which converts JSONPath to SQL. Nice!

Yes, we took this path (pun intended) to achieve better performance and to be closer to the current IETF JSONPath standard. The library closely follows draft 5 of the standard with a full compliant Parser but supporting only 90% coverage of the use cases. This is because a 1-1 translation between JSONPath and SQL/JSONPath (which to my surprise is an ISO standard) is not possible. Sadly, there are corner cases that we couldn't translate. If you are interested in the details we can discuss more in one of our calls when I'm back.

That approach isn't going to work for me because I'm currently using a nosql database, but I'm curious whether you came across the searching through an array issue? If so, which approach did you take?

So yes, as you pointed out, the IETF JSONPath does not support any "include" operator in the current draft; at least from my understanding. I left a comment on one of the relevant issues about supporting nested filtering. Regarding how we tackled the issue, well there are some implementations that accept $[? @["@type"] == "Test" ] or $[? @["@type"][*] == "Test" ] to filter the existence of an item inside an array. SQL/JSONPath (even though is not considered a JSONPath implementation) supports this feature and therefore Zion accepts(if you are getting 400 reload the page is a well-known bug that we are trying to resolve) that query and returns the relevant TDs.

As for the other points in the discussion:

Since doing this correctly will have to wait until the JSON Path IETF spec work is done, I have marked this issue as "Defer to Discovery 2.0" for now. Note: "2.0" is just a placeholder name, we have yet to decide on whether we will aim for 1.1 (compatible extensions) or 2.0 (breaking changes). In the meantime we should provide the above feedback to the IETF work and discuss this issue with the JSON-LD group.

+1. For the next release, I would keep an eye on the support of the query language in DBs. We have to keep in mind that 90% of TDDs implementations out there will leverage some sort of DB to filter TDs. On JSONPath we already know that there is not a single DB implementation that supports it (at least we couldn't find any). On the other hand, there are plenty of solutions that use SQL/JSONPath or custom JSON filtering.

The second is about support for JSON-LD "compact IRIs". You can already filter by strings containing colons like foaf:name, but that only works if everyone uses the same prefixes in their @contexts. To really support JSON-LD, JSONPath would need to support prefix expansion, such that foaf:name would be expanded to http://xmlns.com/foaf/0.1/name in the example below.

You are correct that we should really be supporting prefix expansion as well. In theory though prefixes can be defined in external context files and full support would require fetching said context files and interpreting them, both a privacy risk and a computational burden. As a compromise, I was going to suggest that we could allow for prefix expansion ONLY of prefixes explicitly declared in the @context, but not in any referenced files. In fact it may be possible to do this with a (complicated) JSONPath expression that first pulls the prefix definitions out of the @context array (skipping over things like language keywords, etc) and then builds a regex. But I'm not completely sure we can do this with the draft spec since we would need to store fetched state in variables, make decisions about what is a prefix, etc; e.g. "scripting". Perhaps a built-in function (eg. "expandprefix") that could do that for us without complicating the expression (or requiring scripting features) would be something we could propose? Another option would be to standardize prefixes but that runs into a different set of problems.

JSON-LD support will be nice to have moreover if we don't require a full-fledged SPARQL endpoint for it. Perhaps this will happen after https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/issues/203 discussion is settled. As @mmccool is pointing out, we can "incubate" an extension point like expandprefix independently from the IETF specification and we can have some experiments with it. My fear is, once again, DB support. At the top of my mind is not going to be easy to dynamically expand the prefixes while performing an SQL query. But I have to think about it a little bit more.

benfrancis commented 1 year ago

Thanks for explaining all of that, @relu91.

I left a comment on one of the relevant issues about supporting nested filtering.

The conversation there is not looking promising, it sounds like filter operators like "in" or "includes" are unlikely to make it into the IETF specification, but they will also forbid arbitrary JavaScript expressions. So there may be no way to support the common use case of filtering by a value in an array.

That limitation, combined with the issue of expanding JSON-LD prefixes, seems to suggest that neither of the use cases in my original question will be possible with the JSONPath search API (at least with the first version of the JSONPath IETF specification).

Not being able to search the Directory for Things of a given @type (e.g. get a list of all the temperature sensors in a building), or filter by semantic annotations (e.g. all devices on a particular floor of a building) are unfortunate limitations for the JSONPath search API.

I do have a crude implementation of the JSONPath search API using the jsonpath npm module which first fetches the full list of Things from the database and then filters the list afterwards (using the non-standard JavaScript expression approach), but apart from being non-standard that's obviously also very inefficient.

I'd really like to avoid implementing a full SPARQL endpoint, because it will require significant architectural changes, full RDF processing capabilities, and possibly even a change of underlying database. Currently WebThings Gateway uses SQLite and an early prototype of WebThings Cloud uses MongoDB, but SPARQL would presumably work better with a graph database like Apache Marmotta or Amazon Neptune.

For the time being I've had to implement a custom filtering API for Things which I can map directly onto MongoDB (or SQL) queries, e.g.

/things?@type=TemperatureSensor&floorLevel=G

That's a shame as it won't be interoperable.

mmccool commented 1 year ago

Another option we have is to take the IETF JSONPath standard, then specify (and incubate) specific extensions, then include those in our standard. However, I hate to do that since it really needs to be in the IETF standard, and these do seem like common use cases, even in the non-IoT world!

Maybe we need to look at XPath again. The only blocker there as I understood it was a lack of implementations (well, and a non-JSON-like-syntax, but oh well).

mmccool commented 1 year ago

BTW there is another issue: geolocation. There is probably another set of query extensions needed for geolocation, e.g. limiting queries to a certain area.

There is a variant of the strategy in the previous comment we could use for both this and for @type searches: we could allow "stacked" filters in our API. Specifically, we could define some specific (narrow) query use cases (like searching for devices of a particular @type, or in a certain area) and allow queries for these outside of JSONPath, but then allow filters to be "chained", e.g. the output of one is used as input for another. For example, I could find all TDs with @type light, then compute a JSONPath filter on the result. Implementation-wise, this is just a join operation.

Maybe we can also handle prefix expansion this way... e.g. include a filter that expands prefixes to URLs then contracts known URLs to new prefixes that we can then base queries on.

Then we could include these additional queries in our API spec without touching JSONPath.

benfrancis commented 1 year ago

@mmccool wrote:

There is a variant of the strategy in the previous comment we could use for both this and for @type searches: we could allow "stacked" filters in our API. Specifically, we could define some specific (narrow) query use cases (like searching for devices of a particular @type, or in a certain area) and allow queries for these outside of JSONPath, but then allow filters to be "chained", e.g. the output of one is used as input for another.

I do think that defining a custom search API covering some common WoT use cases could be a reasonable approach if the JSONPath API is too limiting and the SPARQL API is very complex to implement and use. It could provide a more lightweight approach to carrying out basic semantic queries.

I assume that would have to be targeted at the next version of the specification.

mmccool commented 1 year ago

Yes, this would be for the next version of the Discovery spec. I expect we will want to talk about this in the Pre-TPAC meetings next week (we need to roll up topics in our Discovery call on Monday, too...). As you said, having use-case-specific, chainable filters would also have the benefit of making "simple" queries simpler, while still allowing full query languages for complex cases. It would also make things like geolocation orthogonal to other kinds of queries (e.g. semantic) and query languages (SPARQL/JSONPath/XPath) if we also had geolocation-based filters.

mmccool commented 4 months ago

We just merged a PR that updated the JSONPath reference to the official RFC9535, see #542 We need to look at this RFC now and see if (a) our document is consistent with it (check examples, etc) and (b) see if there are important use cases it can't do (e.g. search for semantic tags).