Discovery TaskForce related issues

danielpeintner commented 3 years ago

Let's capture issues and topics related to the Discovery task force here.

[ ] Direct mechanisms (additional parameters, https://github.com/w3c/wot-discovery/issues/90)
[ ] Partial fetch (select)
[ ] Streaming/pagination of TDs

Goal: tentative joint meeting 26 April

zolkis commented 3 years ago

Just noting down some points from the common call with Discovery TF.

It's better to avoid exposing introduction mechanisms, both for privacy reasons (against fingerprinting) and for authentication reasons.

mmccool commented 3 years ago

Discussion:

We don't always need URLs. Normally introductions are automatic (first use case), but URLs can optionally be specified explicitly when a known target is wanted, e.g. for testing. URLs for directories might also be provisioned externally using a management API (second use case).
Output is always a set of TDs.
We don't currently support/expose pagination or streaming; we assume TDs can be held in memory in their entirety.
The "local" method has been removed; now there are just "direct" and "directory".

Open issues:

Queries, but returning full TDs only. JSONPath etc. is not just a filter; it can also extract elements/fragments. Related to "partial fetch"?
Access control for fetching TDs. Bootstrapping security. Need to configure (out of band, in the management API) appropriate keys, passwords, etc.
If given a "direct" URL that points at a directory, it should proceed as if "directory" was specified. Alternative: if "direct" is given and the URL points at a directory, return the TD of the directory; the script writer could then choose to resubmit against the directory. If "directory" is used but the URL points at a simple Thing, it should fail...

zolkis commented 3 years ago

Re: open issues,

Queries, but returning full TDs only. JSONPath etc. is not just a filter; it can also extract elements/fragments. Related to "partial fetch"?

Query support has been temporarily removed from Scripting discovery. We will discuss it in an issue before we reintroduce the feature. For now, this full-fledged feature offered by the Discovery network API does not even have a corresponding use case in Scripting, where all our use cases so far involve full TDs. But we can add new use cases. Until then, even if support is re-added, even as an opaque query string passed to the implementation (as it used to be), I expect scripts to limit themselves to queries that filter whole TDs; otherwise they are free to shoot themselves in the foot :).

Access control for fetching TDs. Bootstrapping security. Need to configure (out of band, in the management API) appropriate keys, passwords, etc.

We will turn this into use cases for the runtime provisioning API.

If given a "direct" URL that points at a directory, it should proceed as if "directory" was specified.

Yes. The "directory" option is the default (it fails if the URL does not point to a directory), so "direct" must be explicitly specified by the script (and it fails if it cannot return a TD). Note that the "direct" discovery method is essentially an alias for fetching a TD the easy way :), i.e. without using the Fetch API. For that reason, I expect it will be used a lot in scripts.
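A minimal TypeScript sketch of the resolution rules agreed above: "directory" is the default and fails on a plain Thing, "direct" on a plain Thing returns that single TD, a "direct" URL that points at a directory proceeds as "directory", and anything that is not a TD fails. The `Transport` interface, the `discover()` signature and the rule that a directory is recognized by "ThingDirectory" in its TD's `@type` are assumptions made for illustration, not Scripting API definitions.

```typescript
// Minimal TD shape for this sketch; real TDs carry many more fields.
interface ThingDescription {
  "@type"?: string | string[];
  title: string;
}

type DiscoveryMethod = "direct" | "directory";

// Hypothetical transport seam, not a Scripting API interface.
interface Transport {
  fetchDocument(url: string): Promise<unknown>;
  listThings(directoryUrl: string): Promise<ThingDescription[]>;
}

function isTD(doc: unknown): doc is ThingDescription {
  return typeof doc === "object" && doc !== null &&
    typeof (doc as { title?: unknown }).title === "string";
}

// Assumption: a directory is recognized by "ThingDirectory" in its TD's @type.
function isDirectoryTD(td: ThingDescription): boolean {
  const t = td["@type"];
  const types: string[] = t === undefined ? [] : Array.isArray(t) ? t : [t];
  return types.includes("ThingDirectory");
}

async function discover(
  transport: Transport,
  url: string,
  method: DiscoveryMethod = "directory", // "directory" is the default
): Promise<ThingDescription[]> {
  const doc = await transport.fetchDocument(url);
  if (!isTD(doc)) {
    // Neither method can continue without a TD at the given URL.
    throw new Error(`${url} did not return a TD`);
  }
  if (isDirectoryTD(doc)) {
    // A "direct" URL that points at a directory proceeds as "directory".
    return transport.listThings(url);
  }
  if (method === "direct") {
    return [doc]; // the output is always a set of TDs
  }
  throw new Error(`${url} is not a directory`); // "directory" on a plain Thing
}
```

Keeping the directory check ahead of the method check is what makes "direct" behave as an alias that also works when the URL happens to point at a directory.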

zolkis commented 3 years ago

Recording here an argument from the WG/IG main call on 5.5.2021.

From @mmccool, paraphrased:

"One use case for returning partial TDs on a filtered discovery is to only return TD IDs when there are a lot of matches; then the application could fetch the matching TDs one by one".

This is certainly useful for implementations, as a transport/session-level optimization (as well as for TD fragmentation). However, the Scripting API already supports the purpose of this use case, since it is designed as an iterator over results (spread over time). So the application could make an (opaque (*)) query that results in a set of partial TDs, which the implementation could use in order to fetch and provide TDs to the application, one by one. Whether this optimization happens in the middle doesn't really matter to the end user.
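The iterator argument can be sketched with an async generator: a hypothetical ID-only query returns the matching TD IDs, and each full TD is fetched only when the application asks for the next result. `discoverLazily`, `queryIds` and `fetchById` are hypothetical names for illustration, not Scripting API members.

```typescript
// Minimal TD shape for this sketch.
interface ThingDescription {
  id?: string;
  title: string;
}

// Hypothetical: a filtered directory query returning only the matching TD ids.
type IdQuery = () => Promise<string[]>;
// Hypothetical: fetch one full TD by its id.
type FetchById = (id: string) => Promise<ThingDescription>;

// Yields full TDs one at a time. Each fetch happens only when the consumer
// asks for the next result, so the full TDs are not transferred up front.
async function* discoverLazily(
  queryIds: IdQuery,
  fetchById: FetchById,
): AsyncGenerator<ThingDescription> {
  const ids = await queryIds(); // small payload even with many matches
  for (const id of ids) {
    yield await fetchById(id);
  }
}
```

From the script's side this is just `for await (const td of discoverLazily(...))`; whether IDs or full TDs travel over the wire stays an implementation detail.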

(*) In principle, the query could be opaque from the app's or even from the implementation's point of view (i.e. one could implement their own service over WoT discovery, used as a transport), but implementations should be able to validate it, and it should genuinely be a query related to WoT discovery, not an SQL injection, for instance. Therefore I would rather favor supporting a simple filter set, or a standard filtering algorithm from the Scripting API. Then, instead of allowing everything in general (like a cryptic regexp search in TDs), we could add features by vetted use cases, so that it is clearer what the script code is trying to do.
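A sketch of what a simple, validatable filter set could look like, assuming a hypothetical `TDFilter` with three conditions: an `@type` value, a named property, and exact values for top-level fields. Because the filter is structured data rather than a query string, an implementation can reject anything it does not recognize. The names and the exact conditions are illustrative only.

```typescript
// Minimal TD shape for this sketch.
interface ThingDescription {
  "@type"?: string | string[];
  title: string;
  properties?: Record<string, unknown>;
  [key: string]: unknown;
}

// Hypothetical filter set: each field is one vetted, readable condition.
interface TDFilter {
  type?: string;        // TD's @type must include this value
  hasProperty?: string; // TD must expose a property with this name
  fields?: Record<string, string | number | boolean>; // exact top-level values
}

const ALLOWED_KEYS = new Set(["type", "hasProperty", "fields"]);

// Rejects anything that is not a plain object with known keys only
// (value types are not checked in this sketch).
function validateFilter(input: unknown): TDFilter {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    throw new TypeError("filter must be a plain object");
  }
  for (const key of Object.keys(input)) {
    if (!ALLOWED_KEYS.has(key)) throw new TypeError(`unsupported filter key: ${key}`);
  }
  return input as TDFilter;
}

// True if the whole TD satisfies every condition present in the filter.
function matches(td: ThingDescription, filter: TDFilter): boolean {
  if (filter.type !== undefined) {
    const t = td["@type"];
    const types: string[] = t === undefined ? [] : Array.isArray(t) ? t : [t];
    if (!types.includes(filter.type)) return false;
  }
  if (filter.hasProperty !== undefined &&
      !(td.properties !== undefined &&
        Object.prototype.hasOwnProperty.call(td.properties, filter.hasProperty))) {
    return false;
  }
  for (const [key, value] of Object.entries(filter.fields ?? {})) {
    if (td[key] !== value) return false;
  }
  return true;
}
```

Such a filter only ever selects or drops whole TDs, so it cannot extract fragments, and new conditions can be added one use case at a time.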

relu91 commented 3 years ago

I am not a fan of introducing another filtering mechanism. In particular, I am afraid that we will end up inventing yet another query language. Plus, it would probably confuse developers, who will likely expect to use the queries from the Discovery spec. What about starting by supporting only JSONPath (or XPath) queries and providing a regex to disallow selecting?

AFAIK this simple regex should work for JSONPath: \$\[\?\(.*\)\]$. Since we know the data model of the Thing Directory (which should be normative), the only admitted queries for retrieving a list of full TDs should be in that form. Example:

$[?(@["@type"] === "sosa:Sensor")]

SPARQL would be more complex, of course.
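A sketch of the regex gate, assuming the intended pattern is `\$\[\?\(.*\)\]$` (the form `$[?(<expression>)]`), with a leading `^` added so the whole query must match; `isWholeTdFilter` is a hypothetical helper. This is a syntactic check only and does not parse JSONPath: a chained query such as `$[?(@.a)][?(@.b)]` still passes, even though its second filter selects members inside TDs.

```typescript
// Assumed pattern: a single filter over the top-level array of TDs.
// "^" is added here so that nothing may precede the filter expression.
const FILTER_ONLY = /^\$\[\?\(.*\)\]$/;

// Hypothetical helper: true if the query looks like $[?(<expression>)].
// Purely syntactic; it neither parses nor evaluates JSONPath.
function isWholeTdFilter(query: string): boolean {
  return FILTER_ONLY.test(query.trim());
}
```

A check like this can reject obvious projections cheaply, but a real gate would need to inspect the parsed query.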

relu91 commented 3 years ago

Another option is not checking anything at all and returning standard plain objects from discovery when using directory queries, then failing later if the dev didn't retrieve a full, valid TD. We might even distinguish between just requesting the list of TDs and querying the list, with two different functions:

discovery.directory() // returns a list of TDs
discovery.directory.query() // returns a list of Objects

With the query mode, the Scripting API gives the application developer full responsibility for what is going on. We may then fail later, when trying to consume the returned objects.
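The split between `discovery.directory()` and `discovery.directory.query()` could look roughly like this in TypeScript: the plain call returns full TDs, the query call returns `unknown[]`, and a consumer fails late when it is handed something that is not a TD. `makeDirectory`, `QueryRunner` and `consume` are hypothetical, and the `title` check merely stands in for real TD validation (e.g. against the TD JSON Schema).

```typescript
// Minimal TD shape for this sketch.
interface ThingDescription {
  title: string;
  [key: string]: unknown;
}

// Hypothetical query evaluator (JSONPath, SPARQL, ...): may return anything.
type QueryRunner = (query: string, tds: ThingDescription[]) => unknown[];

// Hypothetical factory: directory() yields full TDs, directory.query() raw results.
function makeDirectory(tds: ThingDescription[], run: QueryRunner) {
  const directory = async (): Promise<ThingDescription[]> => tds.slice();
  const query = async (q: string): Promise<unknown[]> => run(q, tds);
  return Object.assign(directory, { query });
}

// Consuming a query result: fails late if the object is not (minimally) a TD.
function consume(obj: unknown): ThingDescription {
  if (typeof obj !== "object" || obj === null ||
      typeof (obj as { title?: unknown }).title !== "string") {
    throw new TypeError("query result is not a Thing Description");
  }
  return obj as ThingDescription;
}
```

The type signatures carry the point of the design: only `directory()` promises TDs, while anything coming out of `query()` has to be checked by whoever consumes it.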

zolkis commented 3 years ago

I am not a fan of introducing another filtering mechanism.

I was referring to the current simple filtering, i.e. whether properties/stanzas are present or have certain values, like the example you've given.

What about starting by supporting only JSONPath (or XPath) queries and providing a regex to disallow selecting?

That might work. But we are running ahead too fast. I think most of our use cases could be solved by simple filtering, but JSONPath is good for experimenting. We need a lot of that, to collect use case and API usage statistics on what would be needed and preferred. I have a hunch SPARQL is overkill for this case, now and for a good while, and it is easier to add API functionality than to remove it.

zolkis commented 3 years ago

Another option is not checking anything at all and returning a standard plain object from discovery using directory queries.

I am afraid we're going to end up with a DOM-like selector model, i.e. a huge complication. What are the use cases that would make that worthwhile? Crawling/indexing TD databases is one, but that is not exactly in the charter, and specific deployments have other tools for it.

w3c/wot-scripting-api, issue #314