w3c / wot-discovery

Repository for WoT discovery discussion
https://w3c.github.io/wot-discovery/

How to keep Thing Descriptions updated in a Directory #464

Open: benfrancis opened this issue 1 year ago

benfrancis commented 1 year ago

I've implemented a basic WoT Directory as part of a cloud service designed to provide analytics for commercial buildings.

It works like this:

  1. Thing Descriptions are self-hosted by devices or exposed by an on-premises Web of Things gateway
  2. A user adds a device to the cloud service by providing the URL of its Thing Description (an HTTP URL, as per the HTTP Basic Profile)
  3. The client fetches the Thing Description from this URL
  4. The client then registers the Thing Description with the Directory by sending its contents in an HTTP PUT request (as per the Directory Service API)
  5. The directory service augments the Thing Description by adding registration metadata (such as created and modified members) and adds an id member if one is missing (as per the WoT Discovery specification)
  6. Clients can then query the directory and retrieve the Enriched Thing Description
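A minimal sketch of steps 4 to 6, assuming a toy in-memory store rather than the real HTTP Directory Service API. `InMemoryDirectory`, `register` and `retrieve` are hypothetical names; the sketch only shows the enrichment: a `urn:uuid` id is generated for an anonymous TD, and `created`/`modified` registration metadata is added, with `created` preserved when the same id is registered again.

```python
import copy
import uuid
from datetime import datetime, timezone


class InMemoryDirectory:
    """Toy model of a TD Directory (hypothetical; not the Directory Service API)."""

    def __init__(self) -> None:
        self.things: dict[str, dict] = {}

    def register(self, td: dict, now: datetime) -> str:
        """Store an Enriched copy of `td` and return its id."""
        enriched = copy.deepcopy(td)  # never mutate the client's TD
        # Anonymous TD: the directory assigns an id (a random urn:uuid here).
        thing_id = enriched.setdefault("id", f"urn:uuid:{uuid.uuid4()}")
        previous = self.things.get(thing_id)
        stamp = now.isoformat()
        # Registration metadata; any client-supplied `registration` is replaced.
        enriched["registration"] = {
            "created": previous["registration"]["created"] if previous else stamp,
            "modified": stamp,
        }
        self.things[thing_id] = enriched
        return thing_id

    def retrieve(self, thing_id: str) -> dict:
        """Return a copy of the Enriched TD stored under `thing_id`."""
        return copy.deepcopy(self.things[thing_id])


if __name__ == "__main__":
    directory = InMemoryDirectory()
    lamp_td = {"title": "Lamp", "properties": {}}  # anonymous TD fetched by the client
    lamp_id = directory.register(lamp_td, datetime(2024, 1, 1, tzinfo=timezone.utc))
    print(lamp_id, directory.retrieve(lamp_id)["registration"])
```

Nothing in the stored entry records where the client fetched the TD from, which is the gap the rest of this issue is about.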

My question is, what happens if the Thing Description is modified at its original source (device or gateway) and needs updating in the Directory, or the registration expires (as per the expires or ttl members of the registration metadata)? How should the Thing Description be updated?
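To pin down what "the registration expires" means, here is one way to evaluate expiry from the `expires` and `ttl` registration members. It assumes `ttl` is a number of seconds counted from `modified` and that an explicit `expires` wins over `ttl`; treat both as illustrative assumptions and check the WoT Discovery specification for the normative behaviour.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expiry_time(registration: dict) -> Optional[datetime]:
    """Absolute expiry of a registration, or None if it has no expiry."""
    if "expires" in registration:  # assumed to take precedence over ttl
        return datetime.fromisoformat(registration["expires"])
    if "ttl" in registration:  # assumed: seconds, counted from `modified`
        modified = datetime.fromisoformat(registration["modified"])
        return modified + timedelta(seconds=registration["ttl"])
    return None


def is_expired(registration: dict, now: datetime) -> bool:
    """True once `now` has reached the registration's expiry time."""
    expires = expiry_time(registration)
    return expires is not None and now >= expires


if __name__ == "__main__":
    reg = {"modified": "2024-01-01T00:00:00+00:00", "ttl": 3600}
    print(is_expired(reg, datetime(2024, 1, 1, 2, tzinfo=timezone.utc)))  # True
```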

I don't really understand how Thing Description expiry and updates are supposed to work. I get the sense that the Directory Service API was designed with a different use case in mind from the one I'm using it for, but I can't figure out what that use case is.

The only way I can imagine this can work under the current specification is that a user manually re-adds the updated Thing Description using the Update operation of the Directory Service API when they notice it has been modified or its registration has expired, but that's a very manual process which is probably not practical in real life.

There's also the use case of a Directory Description being added to a Directory (something I'm planning to implement in order to add a large number of Things to a Directory in one batch operation) where Directories could be kept in sync using the Events API, but that just raises further questions.


In my opinion, in an ideal world:

Obviously it doesn't actually work like this, because Things are not identified by their Thing Description URL but by an id member, which may be provided by a Thing or a Directory and can be set to any URI (e.g. a URN). Things are registered with a Directory by submitting the Thing Description contents, so the Directory has no record of the URL from which a Thing Description may originally have been retrieved, if it was hosted at a URL at all.

I'm quite familiar with this problem because for complicated legacy reasons the W3C Web App Manifest specification has exactly the same issue, which means that manifests have to be opportunistically updated whenever they are linked to by a web page navigated to by a user agent. That strategy doesn't work for Thing Descriptions.

benfrancis commented 1 year ago

how does it know the original source URL

A potential solution to this specific problem might be that an Enriched TD contain a Link with rel=canonical, which points to the original source. However, there's currently no way to provide this as part of the creation operation.
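Reading such a link on the Consumer side would be trivial; the open question is only how it gets there. The sketch assumes a hypothetical Enriched TD whose `links` array carries a `rel` of `canonical` (not one of the link relation types the TD specification recommends) pointing at an example gateway URL.

```python
from typing import Optional


def canonical_source(td: dict) -> Optional[str]:
    """Return the href of the first Link with rel="canonical", if any."""
    for link in td.get("links", []):
        if link.get("rel") == "canonical" and "href" in link:
            return link["href"]
    return None


enriched_td = {
    "id": "urn:example:lamp",
    "title": "Lamp",
    "links": [
        # Hypothetical: "canonical" is not a recommended TD link relation type.
        {"rel": "canonical", "href": "https://gateway.example/things/lamp"},
    ],
}

if __name__ == "__main__":
    print(canonical_source(enriched_td))  # https://gateway.example/things/lamp
```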

farshidtz commented 1 year ago

there's currently no way to provide this as part of the creation operation.

Why not? It can be added to the root of TD or the registration object.

benfrancis commented 1 year ago

@farshidtz wrote:

Why not? It can be added to the root of TD or the registration object.

In this case the client of the Directory Service API is a Consumer of the TD, not a Producer (it discovered the TD at an HTTP URL using a Direct introduction mechanism). It seems odd to me that a Consumer would modify a TD before registering it in a Directory*.

Even if the original Producer of the TD included a canonical Link (which is not one of the recommended Link relation types in the TD specification BTW), since that usage isn't standardised anywhere, neither a Directory server nor another client would know what to do with it.

If there's no standardised way to update a TD when it expires, what is the purpose of the expiry metadata? Is it just used by the Directory service to delete TDs when they expire? Presumably all TDs with an expiry date will therefore eventually get deleted unless the client which originally registered the TD (either the Producer of the TD or a Consumer which separately keeps track of TD source URLs and expiry dates) manually updates it?

This is further complicated for anonymous TDs because the client also has to keep track of which Directory server-generated ids correspond to which TDs in order to know which Directory registration to update.
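To illustrate the bookkeeping this pushes onto the client, here is a sketch of what a registering Consumer would have to keep per TD: the source URL, the (possibly server-generated) Directory id and the expiry. `TrackedRegistration`, `RegistrationTracker` and the refresh `margin` are hypothetical; the HTTP fetch and Update calls are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TrackedRegistration:
    source_url: str    # where the client originally fetched the TD
    directory_id: str  # id in the Directory, possibly server-generated
    expires: datetime  # expiry taken from the registration metadata


class RegistrationTracker:
    """Client-side bookkeeping the Directory itself does not keep (hypothetical)."""

    def __init__(self, margin: timedelta) -> None:
        self.margin = margin  # refresh this long before expiry
        self.by_source: dict[str, TrackedRegistration] = {}

    def record(self, reg: TrackedRegistration) -> None:
        self.by_source[reg.source_url] = reg

    def due_for_refresh(self, now: datetime) -> list[TrackedRegistration]:
        """Registrations that should be re-fetched and updated, soonest first."""
        due = [r for r in self.by_source.values() if now >= r.expires - self.margin]
        return sorted(due, key=lambda r: r.expires)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    tracker = RegistrationTracker(margin=timedelta(minutes=10))
    tracker.record(TrackedRegistration(
        "http://gateway.local/things/lamp", "urn:example:lamp", now + timedelta(minutes=5)))
    for reg in tracker.due_for_refresh(now):
        # Here the client would re-fetch reg.source_url and send an Update for reg.directory_id.
        print("refresh", reg.directory_id, "from", reg.source_url)
```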


* If a Consumer modifies TDs before submitting them to a Directory they could do so maliciously, e.g. by modifying Form URLs or providing a fake canonical URL for TD updates. In the future this could potentially be protected against by signing TDs, but any modification of the TD by the Directory client or server (e.g. to add an id or registration metadata) would invalidate that signature. If TDs were instead added to a Directory by URL (an HTTPS URL hosted by the Producer), then the Directory could be sure that the TD is authoritative and would have an obvious way to fetch a new version when it expires.
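The point about signatures can be shown with a plain digest. This sketch uses SHA-256 over JSON with sorted keys as a stand-in; it is not the JSON Canonicalization Scheme (RFC 8785) and not any proposed TD signing scheme, but any scheme that signs the exact TD content behaves the same way: adding an `id` or `registration` member changes what was signed. The TDs and the `td_digest` helper are hypothetical.

```python
import hashlib
import json


def td_digest(td: dict) -> str:
    """SHA-256 of a deterministic JSON serialisation (sorted keys, compact)."""
    serialised = json.dumps(td, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(serialised.encode("utf-8")).hexdigest()


producer_td = {"title": "Lamp", "base": "https://lamp.example/"}  # as signed by the Producer
enriched_td = {
    **producer_td,
    "id": "urn:example:lamp",  # added by the Directory for an anonymous TD
    "registration": {"created": "2024-01-01T00:00:00+00:00"},
}

if __name__ == "__main__":
    print(td_digest(producer_td) == td_digest(enriched_td))  # False
```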

mmccool commented 1 year ago

Don't think this directly impacts the spec, but would be good to clarify. Currently we don't have a "polling" mechanism in directories.

mmccool commented 6 months ago

See also discussion in #164.

In the current design the directory is "passive", i.e. it only accepts registration requests and does not do polling. It is the client's responsibility to update the directory (or "ping" it with an empty patch) before the TTL expires.
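A minimal simulation of that passive model, assuming the TTL counts from the last modification and the client pings at half the TTL (an arbitrary margin). `PassiveDirectory` and `next_ping` are hypothetical, and `patch` uses a shallow `dict.update` rather than full JSON Merge Patch semantics.

```python
from datetime import datetime, timedelta, timezone


class PassiveDirectory:
    """Toy passive TDD: it never polls, so registrations lapse unless refreshed."""

    def __init__(self) -> None:
        self.things: dict[str, dict] = {}

    def put(self, thing_id: str, td: dict, ttl: timedelta, now: datetime) -> None:
        self.things[thing_id] = {"td": dict(td), "modified": now, "ttl": ttl}

    def patch(self, thing_id: str, merge_patch: dict, now: datetime) -> None:
        entry = self.things[thing_id]    # KeyError if the registration already lapsed
        entry["td"].update(merge_patch)  # shallow merge; an empty patch changes nothing
        entry["modified"] = now          # ...but still counts as an update and resets the TTL

    def purge_expired(self, now: datetime) -> None:
        self.things = {
            k: v for k, v in self.things.items() if now < v["modified"] + v["ttl"]
        }


def next_ping(modified: datetime, ttl: timedelta, fraction: float = 0.5) -> datetime:
    """When a client should ping: a fraction of the TTL after the last update."""
    return modified + ttl * fraction


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    ttl = timedelta(minutes=10)
    tdd = PassiveDirectory()
    tdd.put("urn:example:lamp", {"title": "Lamp"}, ttl, t0)
    tdd.patch("urn:example:lamp", {}, next_ping(t0, ttl))  # keep-alive at t0 + 5 min
    tdd.purge_expired(t0 + timedelta(minutes=12))
    print("urn:example:lamp" in tdd.things)  # True: the ping pushed expiry to t0 + 15 min
```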

Due to complications with network accessibility I think it is best to keep that. For example, adding polling to a directory that lives in the cloud would cause problems if the directory can't reach some Things due to firewalls, etc. Note that a cloud directory for Things behind a firewall is still useful for other consumers also behind the firewall, even if the TDD itself can't reach them.

My original design for this had a separate service, let's call it a "Registrar" (previously I called it a "Discoverer" but that name is ambiguous and is defined to mean something different...), whose job it was to find and register TDs with the TDD. This could be a small service that runs locally on a "hub", behind the firewall, and so it can poll local devices. We dropped that in the current spec but could add it back.
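A sketch of such a Registrar's core loop, assuming the network calls are injected as plain functions so it runs without real devices. `Registrar`, `fetch_td` and `register` are hypothetical names; it re-registers a TD only when its content changes and skips devices that are unreachable on a given poll.

```python
from typing import Callable


class Registrar:
    """Hypothetical local service that polls devices and (re-)registers their TDs."""

    def __init__(self, fetch_td: Callable[[str], dict], register: Callable[[dict], None]) -> None:
        self.fetch_td = fetch_td  # e.g. an HTTP GET of the TD URL (injected)
        self.register = register  # e.g. a Create/Update call to the TDD (injected)
        self.sources: list[str] = []
        self.last_seen: dict[str, dict] = {}

    def add_source(self, url: str) -> None:
        if url not in self.sources:
            self.sources.append(url)

    def poll_once(self) -> list[str]:
        """Fetch every known TD, re-register the ones that changed, return their URLs."""
        changed = []
        for url in self.sources:
            try:
                td = self.fetch_td(url)
            except OSError:
                continue  # device unreachable this round; retry on the next poll
            if td != self.last_seen.get(url):
                self.register(td)
                self.last_seen[url] = td
                changed.append(url)
        return changed


if __name__ == "__main__":
    fake_network = {"http://lamp.local/td": {"id": "urn:example:lamp", "title": "Lamp"}}
    registrar = Registrar(fetch_td=lambda url: dict(fake_network[url]), register=print)
    registrar.add_source("http://lamp.local/td")
    registrar.poll_once()  # registers (prints) the TD; later polls act only on changes
```

A real Registrar would also need to refresh unchanged TDs before their TTL expires, since the directory itself stays passive.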

However, this raises the question: how do Registrars know about Things that register themselves directly with the TDDs? Perhaps we could add a hook so the Registrar could be notified of TDD updates (we do have events for that) and could then start polling those devices for updates.
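A sketch of that hook, consuming the directory's Server-Sent Events. The event types `thing_created`, `thing_updated` and `thing_deleted` are as I understand the Notification API, and each payload is assumed to carry at least the Thing's `id`; the SSE parser is simplified (it ignores `id:` and `retry:` fields). Note that the events give the Registrar ids, not source URLs, so it still cannot poll a Thing it has no URL for.

```python
import json
from typing import Iterable, Iterator

THING_EVENTS = ("thing_created", "thing_updated", "thing_deleted")


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event type, data) pairs from Server-Sent Events lines (simplified)."""
    event, data = "message", []
    for line in lines:
        if line == "":  # a blank line dispatches the pending event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[6:].removeprefix(" ")
        elif line.startswith("data:"):
            data.append(line[5:].removeprefix(" "))
        # other fields (id:, retry:) and ":" comment lines are ignored here


def update_known_ids(events: Iterable[tuple[str, str]], known: set[str]) -> set[str]:
    """Track which Thing ids exist in the TDD, assuming each payload has an "id"."""
    for event, data in events:
        if event not in THING_EVENTS:
            continue
        thing_id = json.loads(data)["id"]
        if event == "thing_deleted":
            known.discard(thing_id)
        else:
            known.add(thing_id)
    return known


if __name__ == "__main__":
    stream = ["event: thing_created", 'data: {"id": "urn:example:lamp"}', ""]
    print(update_known_ids(parse_sse(stream), set()))  # {'urn:example:lamp'}
```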

egekorkan commented 6 months ago

whose job it was to find and register TDs with the TDD. This could be a small service that runs locally on a "hub", behind the firewall, and so it can poll local devices.

This is something we do at Siemens. See further information at https://github.com/w3c-cg/webagents/issues/29