w3c / wot-discovery

Repository for WoT discovery discussion

https://w3c.github.io/wot-discovery/


Resolve TD/Thing ID Semantics (was Reconsider @type ThingID) #190

Open relu91 opened 3 years ago

relu91 commented 3 years ago

#188 introduced `@type: ThingID` for identifiers inside a Thing Description Directory (see the `uriVariables` description below). During the 31/05 call, some participants commented that we are missing a proper definition for this new keyword. Moreover, we should clarify the differences between `id` from the TD ontology and this new term. One proposal was to rename it `LocalThingID`, reflecting the fact that it is just an id related to the directory.

See:

"uriVariables": {
    "id": {
        "@type": "ThingID",
        "title": "Thing Description ID",
        "type": "string",
        "format": "iri-reference"
    }
}
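For illustration, a minimal Python sketch of what the annotation is meant to convey: the same `id` value fills the `{id}` slot of the directory's URI templates, so a TD created under that id can later be retrieved or deleted with it. The base URL `https://tdd.example.com` is a hypothetical placeholder, the `/things/{id}` path is the one mentioned later in this thread, and percent-encoding every reserved character (as RFC 6570 simple expansion does) is an assumption about how a Consumer expands the variable.

```python
from urllib.parse import quote

# Hypothetical directory base URL, for illustration only.
TDD_BASE = "https://tdd.example.com"


def things_uri(td_id: str) -> str:
    """Expand the /things/{id} template with an iri-reference value.

    quote(..., safe="") percent-encodes ':' and '/' as well, so a URN
    stays a single path segment (like RFC 6570 simple expansion).
    """
    return f"{TDD_BASE}/things/{quote(td_id, safe='')}"


# The same expanded URI would be used for create, retrieve and delete.
print(things_uri("urn:dev:ops:32473-WoTLamp-1234"))
```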

farshidtz commented 3 years ago

I think the original reason for adding this was to associate the id URI variable with TD.id. I'm not sure why we still insist that this id is not associated with the TD (at least the TD in the directory). Note that the TD has the Thing type and this is its id, hence ThingID. Even for anonymous TDs, this id is still the TD's id and is added as its blank node identifier, see here. If not, the TD should not be advertised as a JSON-LD object. Though I might be totally wrong :)

egekorkan commented 3 years ago

So I am not exactly sure I understand the comment above, but the id in the TD refers to a Thing, which can have multiple TDs with the same id. For example, URNs include MAC addresses or serial numbers of physical devices. The id in the directory has to be unique per TD so that the TD can be managed; for me it is a different id.

farshidtz commented 3 years ago

Let's look at the following TD:

{
    "@context": "https://www.w3.org/2019/wot/td/v1",
    "id": "urn:dev:ops:32473-WoTLamp-1234",
    "title": "MyLampThing",
    "securityDefinitions": {
        "basic_sc": {"scheme": "basic", "in": "header"}
    },
    "security": "basic_sc"
}

In compact form:

{
  "@id": "urn:dev:ops:32473-WoTLamp-1234",
  "http://purl.org/dc/terms/title": "MyLampThing",
  "https://www.w3.org/2019/wot/td#hasSecurityConfiguration": {
    "@id": "basic_sc"
  },
  "https://www.w3.org/2019/wot/td#securityDefinitions": {
    "@index": "basic_sc",
    "https://www.w3.org/2019/wot/td#in": "header",
    "https://www.w3.org/2019/wot/td#scheme": "basic"
  }
}
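The observation below rests on the TD context aliasing `id` to `@id`. A minimal Python sketch of that single step, assuming an alias table containing only this one entry; it is not a JSON-LD processor (a library such as PyLD would be needed for real expansion or compaction).

```python
from typing import Optional

# Assumed subset of the TD 1.0 context: the term "id" is an alias for "@id".
TD_KEYWORD_ALIASES = {"id": "@id"}


def node_identifier(td: dict) -> Optional[str]:
    """Return the JSON-LD node identifier (@id) of a TD, if it has one."""
    for term, value in td.items():
        if TD_KEYWORD_ALIASES.get(term, term) == "@id":
            return value
    return None  # anonymous TD: becomes a blank node


lamp_td = {
    "@context": "https://www.w3.org/2019/wot/td/v1",
    "id": "urn:dev:ops:32473-WoTLamp-1234",
    "title": "MyLampThing",
}
print(node_identifier(lamp_td))
```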

The TD.id (urn:dev:ops:32473-WoTLamp-1234) is the @id of the node, i.e. the node identifier. In other words, the TD.id is the identifier of the TD (as per JSON-LD spec) but not necessarily the identifier of the underlying device. It could be the case that this identifier happens to be equal to the identifier of the underlying device.

(If the id inside the TD is the identifier of the device, it should not be aliased to @id in the context.)

To my understanding, two nodes cannot have the same identifier in a single domain, not just in a directory.

Now, looking at the ontology here, Thing is the TD and has an id field, which is its identifier. This is also the @id node identifier. In discovery, we are giving it the ThingID type.

From the TD spec:

Thing: An abstraction of a physical or a virtual entity whose metadata and interfaces are described by a WoT Thing Description, whereas a virtual entity is the composition of one or more Things.

Vocabulary term | Description
id | Identifier of the Thing in form of a URI [RFC3986] (e.g., stable URI, temporary and mutable URI, URI with local IP address, URN, etc.).
title | Provides a human-readable title (e.g., display a text for UI representation) based on a default language.

To me, the above clearly states that id is the identifier of the Thing, where Thing is the TD.

(If Thing refers to the device and not the TD, then the TD should not be called a Thing)

AndreaCimminoArriaga commented 3 years ago

I agree with @farshidtz explanation.

Besides, from the RDF perspective I think it makes more sense to rely on the XSD type (anyURI), which is actually the one used in the TD spec for describing the form of such an id (without specifying a ThingID rdf:type). I do not see the necessity of defining such a type; could you provide more details about it? If we define the type ThingID, are we going to define an rdf:type for every input the API has as well, e.g., SparqlQuery for the query parameter?

mmccool commented 3 years ago

As I have said before, for various reasons we should avoid making the ID used to look up a record in a directory necessarily the same as the value of the "id" field in each record. These reasons include:

1. Privacy. Multiple local ids for different purposes can help to discourage tracking.
2. Uniqueness. It's possible that two different Things might try to register with the same id. I can guarantee there will be at least one developer who is lazy and gives all their Things the same id. In this case we'll have to give the second and third instances of such devices unique ids anyway.
3. Anonymous TDs. Sure, we can use blank nodes, but exceptions are annoying.

That said, the local ids may be permitted to be the same as the "id" field in the Thing, if the implementation chooses to do so. But for 2 and 3 at least we will have to make some exceptions. BTW, 2 is also why we need different actions for "create" and "update", so that we don't confuse a new Thing trying to register itself with another Thing that (because the dev was lazy) has the same id. Note that 2 means the dev is not complying with the TD spec, but I'd rather see the system be resilient than pedantic.

Anyhow, if we call this a "LocalThingID", then we can simply state somewhere that "This may be, but is not guaranteed to be, the same as the Thing's internal "id" field." and confusion is avoided.

PS: for 1, I could imagine a directory implementation having an installation option that the user can select that would always give directory records unique IDs unrelated to the "id" field...
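A minimal Python sketch of the local-id behaviour described in this comment, assuming an in-memory directory and a hypothetical `opaque_local_ids` installation option: the TD's own `id` is reused as the record key when possible, and a fresh key is minted for the privacy option, for anonymous TDs, and for colliding ids. Neither the option name nor the use of `urn:uuid:` for local ids comes from the spec; both are assumptions.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Directory:
    # Hypothetical installation option (point 1): when True, record keys
    # are never derived from the TD's "id" field.
    opaque_local_ids: bool = False
    records: dict = field(default_factory=dict)

    def register(self, td: dict) -> str:
        """Store a TD and return the local id used to look it up."""
        td_id = td.get("id")
        if self.opaque_local_ids or td_id is None or td_id in self.records:
            # Privacy option, anonymous TD (point 3) or colliding id
            # (point 2): mint a fresh local id instead.
            local_id = f"urn:uuid:{uuid.uuid4()}"
        else:
            local_id = td_id  # local id may equal the TD's "id"
        self.records[local_id] = td
        return local_id


tdd = Directory()
first = tdd.register({"id": "urn:dev:ops:32473-WoTLamp-1234"})
second = tdd.register({"id": "urn:dev:ops:32473-WoTLamp-1234"})  # lazy dev
print(first, second)
```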

farshidtz commented 3 years ago

@mmccool We can claim that the resource ID and Thing ID are distinct, however, two different Things with the same id still SHOULD NOT co-exist in the same domain.

From the JSON-LD spec:

@id: Used to uniquely identify node objects that are being described in the document with IRIs or blank node identifiers.

As stated, the id is used to uniquely identify the node objects. These objects will at least co-exist in listing, notification, search results and so the ids must be unique.

Regarding point 2, the current HTTP API does not allow creating two Things with the same id. A second attempt to create using the same PUT /things/{id} will update the first. This is stated in the directory spec and is a typical RESTful design. If necessary, we can extend the spec and prevent this by passing a flag such as force_create=true to prevent update and return an error instead. Currently, an explicit update can be done by first GET/HEAD and then PUT requests or by using PATCH and giving a merge patch document.
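The create-or-update behaviour of `PUT /things/{id}`, together with the suggested `force_create` flag, can be sketched in Python as follows. This is an in-memory illustration only: `force_create` is the hypothetical extension proposed in this comment, and the choice of 201, 204 and 409 status codes is an assumption, not something quoted from the directory spec.

```python
from typing import Dict, Tuple


def put_thing(store: Dict[str, dict], thing_id: str, td: dict,
              force_create: bool = False) -> Tuple[int, str]:
    """In-memory sketch of PUT /things/{id}.

    Without force_create, a PUT to an existing id replaces the stored TD
    (typical RESTful create-or-update). With the hypothetical
    force_create flag, the update is refused instead.
    """
    if thing_id in store:
        if force_create:
            return 409, "Conflict: a TD with this id already exists"
        store[thing_id] = td
        return 204, "Updated"
    store[thing_id] = td
    return 201, "Created"


things: Dict[str, dict] = {}
print(put_thing(things, "urn:dev:ops:32473-WoTLamp-1234", {"title": "v1"}))
print(put_thing(things, "urn:dev:ops:32473-WoTLamp-1234", {"title": "v2"}))
print(put_thing(things, "urn:dev:ops:32473-WoTLamp-1234", {"title": "v3"},
                force_create=True))
```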

> PS: for 1, I could imagine a directory implementation having an installation option that the user can select that would always give directory records unique IDs unrelated to the "id" field...

Yes, but the Thing id should still stay unique. Apart from the requirement coming from Linked Data specs, having duplicate IDs conflicts with the functionalities of all listing, notification, and search APIs.

An implementation can work around this by offering isolated namespaces for registration, search, notification APIs, e.g.:

{base}/ns1/urn:example:1234 -> a TD
{base}/ns2/urn:example:1234 -> another TD with same ID
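A minimal Python sketch of this namespace workaround, assuming hypothetical namespaces identified by name: ids only have to be unique within a namespace, so listing or searching one namespace never returns two TDs with the same id.

```python
from collections import defaultdict
from typing import Dict, List


class NamespacedDirectory:
    """Sketch of {base}/{namespace}/{id}: ids are unique per namespace only."""

    def __init__(self) -> None:
        # namespace -> {TD id -> TD}
        self._spaces: Dict[str, Dict[str, dict]] = defaultdict(dict)

    def put(self, namespace: str, td: dict) -> None:
        self._spaces[namespace][td["id"]] = td

    def get(self, namespace: str, td_id: str) -> dict:
        return self._spaces[namespace][td_id]

    def listing(self, namespace: str) -> List[dict]:
        # Listing (and likewise search or notifications) is scoped to one
        # namespace, so the ids in a single response stay unique.
        return list(self._spaces[namespace].values())


tdd = NamespacedDirectory()
tdd.put("ns1", {"id": "urn:example:1234", "title": "a TD"})
tdd.put("ns2", {"id": "urn:example:1234", "title": "another TD with same ID"})
print(tdd.get("ns1", "urn:example:1234")["title"])
print(tdd.get("ns2", "urn:example:1234")["title"])
```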

> Privacy. Multiple local ids for different purposes can help to discourage tracking.

Not sure this will help: we can still have multiple Thing Links (the new type added in discovery) with unique ids pointing to the actual Thing, all in the same directory.

Back to the actual topic: I think calling it ThingID is correct, but something like ResourceID is better if directories want to have an additional set of unique resource IDs in parallel with the Thing IDs. This type does not automatically map the id URI variable to the identifier embedded inside the Thing.

@AndreaCimminoArriaga Maybe the purpose of having this semantic type was to indicate that all id URI variables in the given directory TD are of the same type (@benfrancis, could you clarify?). If we consider setting semantic types only for grouping of URI variables that are reused across multiple affordances, it will not be needed for other URI variables that are defined once only.

egekorkan commented 3 years ago

Some remarks:

> (If Thing refers to the device and not the TD, then the TD should not be called a Thing)

TD is already not called a Thing? If you mean that the @type is a Thing:

That is just an optional annotation
We can argue that it should be changed to a ThingDescription. I do not have an educated opinion on this.

I need someone who has precise information on the history of the standardization but here is what I remember:

The id field was not intended to be used only for databases, search or JSON-LD @id. For me, it simply identifies a physical (or virtual) device. That is why ALL the examples in the spec use device URNs that really follow the notation of https://datatracker.ietf.org/doc/html/draft-ietf-core-dev-urn-10#section-5.

It is perfectly ok that two TDs have the same id since they are generated/hosted by the same physical device or are referring to the same device. Maybe @vcharpenay can provide more precision on this since I remember talking about this with him at some point.

farshidtz commented 3 years ago

> TD is already not called a Thing? If you mean that the @type is a Thing:
>
> That is just an optional annotation
>
> We can argue that it should be changed to a ThingDescription. I do not have an educated opinion on this.

I'm really not sure anymore. I also had the opinion that they are distinct (https://github.com/w3c/wot-discovery/issues/133#issuecomment-806504146) but was convinced otherwise. We recently had this discussion before changing affordance names and directory API paths to use "thing" instead of "td".

To start with, the section that describes the TD class is called "Thing": https://www.w3.org/TR/wot-thing-description11/#thing

Again, Thing illustrates the metadata (=TD) class and has an id:

> I need someone who has precise information on the history of the standardization but here is what I remember:
>
> The id field was not intended to be used only for databases, search or JSON-LD @id. For me, it simply identifies a physical (or virtual) device. That is why ALL the examples in the spec use device URNs that really follow the notation of https://datatracker.ietf.org/doc/html/draft-ietf-core-dev-urn-10#section-5.
>
> It is perfectly ok that two TDs have the same id since they are generated/hosted by the same physical device or are referring to the same device. Maybe @vcharpenay can provide more precision on this since I remember talking about this with him at some point.

In my opinion, the history behind it doesn't matter much. What matters is that the id field inside the TD is currently, and clearly, the @id, i.e. the identifier of the TD.

In realistic settings, it is very easy to guarantee unique IDs for virtual resources. But it is almost always the case that heterogeneous physical devices will end up with colliding IDs, even in small-scale environments. Not every manufacturer is going to follow a scheme that guarantees unique IDs. It is, however, very easy to make sure that TDs have unique IDs and then refer to those physical devices. I think taking a physical device ID and directly using it as the identifier of a TD in the World Wide Web of Things is fundamentally flawed.

mmccool commented 3 years ago

So Ege's point has me worried:

The id is supposed to be about the Thing, not the TD
There can be different TDs for the same Thing (e.g. they might have different URLs representing different access paths, local vs. cloud access for instance)
These different TDs will have the same "id" since they refer to the same Thing
What happens if both TDs end up getting registered to the same TDD?

Use case: different TDs for local/remote access; local TDD; both TDs wanted in TDD

In conclusion: different TDs can have different views of the same entity (the Thing).

AndreaCimminoArriaga commented 3 years ago

From the RDF perspective (and this applies to both '@id' and 'id', since they are equivalent; the @context manages this equivalence), two TDs cannot have the same id if they are stored in the same TDD. This is because two resources cannot have the same id in RDF.

I think this makes sense since two TDs referring to the same Thing offer two different views or realities of the same Thing. However, I agree that these relationships should be somehow present in the TDs. As a solution we could extend the discovery context (although the place to do this extension is in the Thing Description context) and add a property that points to an id that is the Thing id. Giving an example of two TDs referring to the same Thing:

{
    "@context": [
        "https://www.w3.org/2019/wot/td/v1",
        { "saref": "https://w3id.org/saref#" }
    ],
    "id": "urn:dev:ops:32473-WoTLamp-Properties-1234",
    "thing" : "urn:dev:ops:32473-Thing-1234",
    "title": "MyLampThing",
    "@type": "saref:LightSwitch",
    ...
    "properties": {
        "status": {
            "@type": "saref:OnOffState",
            "type": "string",
            "forms": [{
                "href": "https://mylamp.example.com/status"
            }]
        }
    }
}

{
    "@context": [
        "https://www.w3.org/2019/wot/td/v1",
        { "saref": "https://w3id.org/saref#" }
    ],
    "id": "urn:dev:ops:32473-WoTLamp-Actions-1234",
    "thing" : "urn:dev:ops:32473-Thing-1234",
    "title": "MyLampThing",
    "@type": "saref:LightSwitch",
    ...
    "actions": {
        "toggle": {
            "@type": "saref:ToggleCommand",
            "forms": [{
                "href": "https://mylamp.example.com/toggle"
            }]
        }
    }
}

Nevertheless, according to RDF, resources that have an id must be accessible and provide data. Therefore, if we add the id of the Thing, who is going to provide or host the Thing's data? Or is this just an id that will not provide any further data? If that is the case, the Thing id will act like a MAC address: it will just be a literal property (a string). Depending on this, the new property will be either a data property or an object property.

Besides this solution, I agree with a comment by @egekorkan. It is a bit confusing that the type of these documents is Thing when they should actually be called Thing Descriptions, since there is another entity that is the actual Thing. If I'm not mistaken, in the past the type of the description was ThingDescription, which could leave room for adding a reference to a different entity with the type Thing.

Summing up, adding a tag in the JSON that points to the id of the Thing is an easy fix that should be implemented in the Thing Description @context. It must be modelled as a data property (literal) if this id is not going to provide further data, or as an object property if this Thing entity will provide data (but in that case having only Thing as a type will be confusing).

farshidtz commented 3 years ago

> So Ege's point has me worried:
>
> The id is supposed to be about the Thing, not the TD
>
> There can be different TDs for the same Thing (e.g. they might have different URLs representing different access paths, local vs. cloud access for instance)
>
> These different TDs will have the same "id" since they refer to the same Thing
>
> What happens if both TDs end up getting registered to the same TDD?
>
> Use case: different TDs for local/remote access; local TDD; both TDs wanted in TDD
>
> In conclusion: different TDs can have different views of the same entity (the Thing).

@mmccool Did you get a chance to read my comment above? I am simply referring to the spec and this is not my personal opinion:

The id is a member of the Thing class
That same Thing class is the Thing Description

If id is really about the underlying physical/virtual entity, then why is it tied to the TD using the @id key? As @AndreaCimminoArriaga explained above, @id of the TD is the identifier of that node and must be treated appropriately.

mmccool commented 3 years ago

Discussion:
Question: Is the id the id of the Thing or of the TD?
Ege: It was originally created to give ids to Things; that is the reason for changing ids to dev URNs, etc.
Farshid: It is useful to use the same id for two TDs that refer to the same Thing. But the spec clearly says the id is the id of the Thing, which is the @type of the Thing Description.
Andrea: We can use multiple linked TDs with different ids (see above).
Kaz: We need to think about what we actually meant in the spec.
McCool: 1. The spec is ambiguous; 2. we have use cases for both interpretations; 3. we need to clarify the spec, and then come up with solutions for the omitted use cases.

mmccool commented 3 years ago

Let's continue to discuss here, and discuss the implications of either choice (id of TD or Thing) and how to fix the "other" use cases with both choices.

AndreaCimminoArriaga commented 3 years ago

As mentioned during the meeting, I think the whole problem is due to the fact that there are two entities in this discussion: ThingDescription and Thing. A Thing is a physical device and the ThingDescription is a "view" that describes such a Thing. Now, there is a relationship between these entities: a Thing can clearly have one or more ThingDescriptions (I'm not sure it makes sense to say that a Thing may have zero). In addition, we know that a Thing has at least one attribute, the id, that must uniquely identify the device.

The problem, for me, is due to the fact that these two entities are not present in the specification. As @farshidtz previously explained, the spec defines only one entity that is the Thing. This Thing nevertheless is a ThingDescription and not the Thing itself. So the clear solution would be to introduce a new entity in the ontology and the thing description model, so to have the classes @type Thing and @type ThingDescription.

Following this approach, we could easily define multiple ThingDescriptions, each with a different id, that refer to the same Thing. Let's imagine we have a Thing that is a Lamp, and we want to define three ThingDescriptions that point to the same Lamp (the id of the lamp will be urn:dev:ops:32473-Thing-123):

{
    "@context": "https://www.w3.org/2019/wot/td/v1",
    "id": "urn:dev:ops:32473-WoTLamp-properties-1234",
    "title": "MyLampThing",
    "@type" : "ThingDescription",
    "securityDefinitions": {
        "basic_sc": {"scheme": "basic", "in":"header"}
    },
    "security": ["basic_sc"],
    "properties": {
        "status" : {
            "type": "string",
            "forms": [{"href": "https://mylamp.example.com/status"}]
        }
    },
    "thing" : {
       "id" : "urn:dev:ops:32473-Thing-123",
       "@type" : "Thing"
    }
}

{
    "@context": "https://www.w3.org/2019/wot/td/v1",
    "id": "urn:dev:ops:32473-WoTLamp-events-1234",
    "title": "MyLampThing",
    "@type" : "ThingDescription",
    "securityDefinitions": {
        "basic_sc": {"scheme": "basic", "in":"header"}
    },
    "security": ["basic_sc"],
    "events":{
        "overheating":{
            "data": {"type": "string"},
            "forms": [{
                "href": "https://mylamp.example.com/oh",
                "subprotocol": "longpoll"
            }]
        }
    },
    "thing" : {
       "id" : "urn:dev:ops:32473-Thing-123",
       "@type" : "Thing"
    }
}

{
    "@context": "https://www.w3.org/2019/wot/td/v1",
    "title": "MyLampThing",
    "@type" : "ThingDescription",
    "securityDefinitions": {
        "basic_sc": {"scheme": "basic", "in":"header"}
    },
    "security": ["basic_sc"],
    "actions": {
        "toggle" : {
            "forms": [{"href": "https://mylamp.example.com/toggle"}]
        }
    },
    "thing" : {
       "id" : "urn:dev:ops:32473-Thing-123",
       "@type" : "Thing"
    }
}

Notice that the last ThingDescription is an anonymous one, its id is not specified (and thus it should be a blank node in RDF or the Directory should inject a local id). Nevertheless, it is possible to find any ThingDescription that is related to the same Thing regardless of its id with any search (JSONPath, XPath, and SPARQL).
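As a sketch of such a search, the Python function below groups TDs by the `thing` object proposed above and mimics a directory injecting a `urn:uuid:` local id for anonymous TDs. The `thing` term is the hypothetical extension under discussion, and the uuid-based local id is an assumption.

```python
import uuid
from collections import defaultdict
from typing import Dict, List, Optional


def group_by_thing(tds: List[dict]) -> Dict[Optional[str], List[str]]:
    """Map each referenced Thing id to the ids of the TDs describing it."""
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for td in tds:
        # Anonymous TD: stand-in for a directory-injected local id.
        td_id = td.get("id") or f"urn:uuid:{uuid.uuid4()}"
        thing_id = td.get("thing", {}).get("id")  # hypothetical "thing" term
        groups[thing_id].append(td_id)
    return dict(groups)


lamp = {"id": "urn:dev:ops:32473-Thing-123", "@type": "Thing"}
tds = [
    {"id": "urn:dev:ops:32473-WoTLamp-properties-1234", "thing": lamp},
    {"id": "urn:dev:ops:32473-WoTLamp-events-1234", "thing": lamp},
    {"title": "MyLampThing", "thing": lamp},  # anonymous TD
]
print(group_by_thing(tds))
```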

A less elegant solution, but one that requires fewer changes in the spec, would be to define a property that points to the Thing id (although this solution is not well aligned with RDF principles). The previous examples would look like this:

{
    "@context": "https://www.w3.org/2019/wot/td/v1",
    "id": "urn:dev:ops:32473-WoTLamp-properties-1234",
    "title": "MyLampThing",
    "@type" : "ThingDescription",
    "securityDefinitions": {
        "basic_sc": {"scheme": "basic", "in":"header"}
    },
    "security": ["basic_sc"],
    "properties": {
        "status" : {
            "type": "string",
            "forms": [{"href": "https://mylamp.example.com/status"}]
        }
    },
    "thing" : "urn:dev:ops:32473-Thing-123"
}

{
    "@context": "https://www.w3.org/2019/wot/td/v1",
    "id": "urn:dev:ops:32473-WoTLamp-events-1234",
    "title": "MyLampThing",
    "@type" : "ThingDescription",
    "securityDefinitions": {
        "basic_sc": {"scheme": "basic", "in":"header"}
    },
    "security": ["basic_sc"],
    "events":{
        "overheating":{
            "data": {"type": "string"},
            "forms": [{
                "href": "https://mylamp.example.com/oh",
                "subprotocol": "longpoll"
            }]
        }
    },
    "thing" :  "urn:dev:ops:32473-Thing-123"
}

{
    "@context": "https://www.w3.org/2019/wot/td/v1",
    "title": "MyLampThing",
    "@type" : "ThingDescription",
    "securityDefinitions": {
        "basic_sc": {"scheme": "basic", "in":"header"}
    },
    "security": ["basic_sc"],
    "actions": {
        "toggle" : {
            "forms": [{"href": "https://mylamp.example.com/toggle"}]
        }
    },
    "thing" : "urn:dev:ops:32473-Thing-123"
}

The first solution entails defining the classes ThingDescription and Thing, and an object property that relates them. The second just adds a data property to the current Thing. Nevertheless, the first solution allows defining more properties on the Thing (not on the ThingDescription, which is already fine) and could be useful for combining the current ontology with others like SAREF. If the ThingDescription is a view and not the Thing itself, the type "saref:Light" should be assigned to the Thing and not to the ThingDescription.
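Consumers that must cope with either proposal could normalise the reference as in this Python sketch; it assumes both variants use the key `thing`, as in the examples above.

```python
from typing import Optional


def thing_ref(td: dict) -> Optional[str]:
    """Return the referenced Thing id under either proposal.

    Option 1 (object property): "thing": {"id": "...", "@type": "Thing"}
    Option 2 (data property):   "thing": "..."
    """
    ref = td.get("thing")
    if isinstance(ref, dict):
        return ref.get("id")
    return ref  # plain string literal, or None if absent


print(thing_ref({"thing": {"id": "urn:dev:ops:32473-Thing-123", "@type": "Thing"}}))
print(thing_ref({"thing": "urn:dev:ops:32473-Thing-123"}))
```

Only option 1 gives the Thing its own node, which is where a type such as `saref:Light` could be attached; with option 2 the reference is just a string.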

mmccool commented 2 years ago

Unfortunately, I think we want a compatible solution so it can go into the TD 1.1 spec. During the F2F (on Oct 28, during the T2TRG/DID session), the idea of using "fragment identifiers" on IDs came up. I need to think and research more about this, but this is an outline of my thoughts:

Extend IDs with a fragment identifier (or maybe a query parameter) providing a "view" identifier. This would distinguish different TDs that refer to the same Thing. In this case, dropping the extension gives the ID of the Thing; with the extension, it's the ID of the TD.
For backward compatibility, an ID without an extension would be considered to be the "primary" TD and in a 1:1 relationship with a Thing. A Thing should only have ONE "primary" TD.
Any "derived" TD, e.g. for a proxy, or for changing the default language, etc., should add an extension to the ID.
Note this means we can easily add different "views" (different TDs for the same Thing) to a TDD, since the IDs will be distinct.
We still have the problem of creating unique extensions (to avoid accidental name conflicts). Perhaps extensions could ALSO be UUIDs (as in the examples below), or some other cryptographically unique mechanism (e.g. a hash of the time the view was created, etc.).

Examples:

Thing: urn:dev:ops:32473-Thing-123
Primary TD: urn:dev:ops:32473-Thing-123
Derived TD (e.g. for a proxy): urn:dev:ops:32473-Thing-123#123e4567-e89b-12d3-a456-426614174000
Alternative Derived TD Syntax: urn:dev:ops:32473-Thing-123?view=123e4567-e89b-12d3-a456-426614174000
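A rough Python sketch of how a Consumer could recover the Thing id and the view id from either syntax, so that two TDs correlate when their Thing ids match. It treats the id as a plain string and does not validate it against URN syntax (RFC 8141 comes up later in the thread); the `view` parameter name is taken from the proposal above.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs


def split_view(td_id: str) -> Tuple[str, Optional[str]]:
    """Split a TD id into (thing_id, view_id) for both proposed syntaxes."""
    if "#" in td_id:  # fragment syntax: <thing-id>#<view-uuid>
        thing_id, view_id = td_id.split("#", 1)
        return thing_id, view_id
    if "?" in td_id:  # query syntax: <thing-id>?view=<view-uuid>
        thing_id, query = td_id.split("?", 1)
        views = parse_qs(query).get("view")
        return thing_id, (views[0] if views else None)
    return td_id, None  # no extension: the "primary" TD


def same_thing(id_a: str, id_b: str) -> bool:
    """Two TDs correlate when their ids share the same Thing id."""
    return split_view(id_a)[0] == split_view(id_b)[0]


primary = "urn:dev:ops:32473-Thing-123"
derived = primary + "?view=123e4567-e89b-12d3-a456-426614174000"
print(split_view(derived), same_thing(primary, derived))
```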

I'm a little concerned that fragment identifiers may not have the right semantics, since we are not talking about a part of a document but a view, and also may conflict with other usages, i.e. using JSONPointers to reference a part of a TD (if the ID is an actual URL pointing at a Thing, for instance). So I think the second alternative above, using a named "view" query parameter, is safer. BTW both "version" and "variant" have special meanings for UUIDs so I think we should avoid those terms to avoid confusion.

mmccool commented 2 years ago

Note: fragment identifiers on DIDs may actually make some kind of sense if (a) they refer to the DID document, (b) the DID document lists all the variant TDs ("views"), and (c) we can use a fragment id to extract the one we want. However, this requires the DID document to be updated whenever we "publish" a new view, which seems like a nuisance, and it doesn't work anyway if we are not using DIDs to refer directly to TDs but to directories.

I still think in other contexts fragment identifiers are not quite the right thing to do and a query parameter would be safer. Ideally we would track down a best practices recommendation in this area (e.g. for UUIDs), though.

benfrancis commented 2 years ago

> Maybe the purpose of having this semantic type was to indicate that all id URI variables in the given directory TD are of the same type (@benfrancis, could you clarify?)

Yes the only reason for this annotation is to give Consumers a hint that the id uri variable can be used to identify a thing across different interaction affordances, e.g. the same id used to create a Thing can be used to remove it. There's otherwise no way to express this kind of relationship in a Thing Description. Hopefully we don't have to rely on Consumers correctly interpreting this hint though, since they'll be implementing what it says in the prose of the specification anyway.

> is the id the id of the Thing or of the TD?

I can't help but feel this whole discussion is getting hung up on RDF semantics and overcomplicating something which is really quite simple. Every Web Thing is either a virtual thing (where a physical Thing doesn't really exist) or a virtual abstraction of a physical Thing in the real world. Either way a Consumer can only ever interact with that virtual Web Thing abstraction, not the atoms of the physical object in the real world. Therefore as far as a WoT Consumer is concerned, the Web Thing is the Thing and the Thing Description is a JSON representation of that Thing on the web. I can't think of any practical reason that you would need to be able to identify the Thing, Web Thing or Thing Description separately.

Yes, you could have two different Thing Descriptions for the same Thing each with its own URL, but as far as a WoT Consumer is concerned they should be treated as two separate Web Things. The fact that they describe the same physical device is really irrelevant at this layer of abstraction. If you have two identical web pages which are hosted from two different URLs then as far as a web browser is concerned they are two different web pages. The fact that two web pages contain the same information or describe the same subject matter doesn't make them the same web page. Granting a permission to one origin does not grant it to another. Being authorised to access one web page does not mean you are authorised to access the other. Caching one of the pages does not cache the other.

We actually have this situation in WebThings since, for practical reasons, every Web Thing hosted by WebThings Gateway has two separate URLs: a local one (e.g. http://gateway.local/things/foo) and a remote one (e.g. https://bar.webthings.io/things/foo). This URL is used as the id member of the Thing Description and is considered to be the URI of both the Thing Description and the Web Thing it describes. But as far as an external Consumer is concerned, http://gateway.local/things/foo and https://bar.webthings.io/things/foo are two completely separate entities, since they have different URIs and come from two separate origins.

> for various reasons we should avoid making the ID used to look up a record in a directory necessarily the same as the value of the "id" field in each record. These reasons include:

I personally also think this is overcomplicating things and it's fine if a directory implementation wants to use the IDs from Thing Descriptions as identifiers in the directory. There are workarounds for edge cases.

egekorkan commented 2 years ago

Interesting find in the architecture document at https://w3c.github.io/wot-architecture/#intermediaries:

> An identifier in the WoT Thing Description MUST allow for the correlation of multiple TDs representing the same original Thing or ultimately unique physical entity.

It is even a normative assertion, yay!

farshidtz commented 2 years ago

> Interesting find in the architecture document at https://w3c.github.io/wot-architecture/#intermediaries:
>
> An identifier in the WoT Thing Description MUST allow for the correlation of multiple TDs representing the same original Thing or ultimately unique physical entity.
>
> It is even a normative assertion, yay!

Good catch. However that assertion itself is ambiguous. First, it says "an identifier" so it could be an arbitrary field such as the ThingID suggested by Andrea and not necessarily the id. Second, "allow for the correlation" could be implemented by different means such as a prefix as Michael's example above. The identifiers don't have to be identical to allow correlation.

mmccool commented 2 years ago

Discussion (from 8 Nov meeting)

Assertion pointed out by Ege and Farshid is ambiguous/incomplete. How, exactly, do we "correlate" multiple TDs for the same Thing?
@type "Thing" being used for "Thing Descriptions" is unfortunate but probably can't be changed now without breaking 1.0 compatibility.
The proposal using an "id extension" is aligned with using the id as the id of the TD, consistent with Ben's view that each TD is a unique Web Thing. The additional functionality is just a way to extract the id of the common thing being referenced... in other words, to "correlate" the TDs as suggested in the assertion noted. However, using a syntax like "/{viewid}" might be better. We also need to check what is allowed in URNs: https://datatracker.ietf.org/doc/html/rfc8141. They do seem to define their own way to define parameters but these may not be understood by RDF processors, so...

vcharpenay commented 2 years ago

It seems to me that this discussion is based on wrong premises.

TL;DR: TD.id identifies a Thing, not a TD, and that is in line with Semantic Web best practices.

> The TD.id (urn:dev:ops:32473-WoTLamp-1234) is the @id of the node, i.e. the node identifier. In other words, the TD.id is the identifier of the TD (as per JSON-LD spec)

The first sentence is right but I don't see how you get to conclude that id identifies the TD, @farshidtz. It is quite common (in the Semantic Web) to have a description of a real-world entity contained in a resource with a different URI (a Web document).

There are guidelines for that, "Cool URIs for the Semantic Web", observing that:

> using URIs, it is possible to identify both a thing (which may exist outside of the Web) and a Web document describing the thing.

What these guidelines specify are 2 ways to make the distinction:

by using hash URIs for real-world entities within a Web document
by using 303 redirection from a real-world entity to a Web document

One example of each:

on the BBC Things dataset, Tim Berners-Lee is identified as https://www.bbc.co.uk/things/2166d5db-3cd1-4d8a-a066-bddb220ef216#id (note the #id hash). There's no JSON-LD representation of the person, but note that the Turtle file doesn't mention the Web page (the URI without the hash).
on Wikidata, when you dereference http://www.wikidata.org/entity/Q80 (Tim Berners-Lee again), you get redirected to https://www.wikidata.org/wiki/Special:EntityData/Q80.jsonld. That JSON-LD document only describes the person and not the document.

On some occasions it is useful to describe both entities in the Web document; that's why schema.org has schema:mainEntityOfPage. FOAF also has foaf:isPrimaryTopicOf. With these terms, one could add the following to a TD:

{
  "id": "urn:dev:ops:32473-Thing-123",
  "schema:mainEntityOfPage": ""
}

where "" is the base URI of the document, i.e. most likely a HTTP URI that the Consumer can dereference (unlike URNs). For instance, the base URI could be http://example.org/thing/directory/{some_uuid}.
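To make the two identifiers concrete: a small sketch showing that the TD's id stays the Thing's URN, while the relative reference "" resolves to the URI of the document that was fetched. The directory URL is the hypothetical example above with a placeholder in place of {some_uuid}. One JSON-LD caveat, assumed handled here: "" is only treated as an IRI if the context declares the term with "@type": "@id"; otherwise it would be a plain string and {"@id": ""} would be needed.

```typescript
// The TD from the example above (@context omitted, as in the original snippet).
const td = {
  id: "urn:dev:ops:32473-Thing-123",
  "schema:mainEntityOfPage": "",
};

// Hypothetical URL the Consumer dereferenced to obtain this TD.
const documentBase = "http://example.org/thing/directory/some-uuid";

// An empty relative reference resolves to the base URI itself (RFC 3986, section 5.2).
const pageUri = new URL(td["schema:mainEntityOfPage"], documentBase).href;

console.log(td.id);   // identifies the Thing
console.log(pageUri); // identifies the Web document (the TD) describing it
```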

In fact, I used this mechanism in an early TDir implementation: metadata about the TD (such as registration time and publisher) is stored in a separate document, which links to the TD with foaf:primaryTopic (inverse property of foaf:isPrimaryTopicOf). I used the W3C DCAT vocabulary, which encourages that design.

vcharpenay commented 2 years ago

I can't think of any practical reason that you would need to be able to identify the Thing, Web Thing or Thing Description separately.

@benfrancis it's not obvious to me how a Web Thing and a TD are different entities. In the Web Thing API, the term "Web Thing" is never used alone and what is being specified is a (Web) Thing Description.

As far as I'm concerned, I can conceptualize WoT with the concepts of "Thing" and "TD" alone. Where can the concept of "Web Thing" help? (This isn't a rhetorical question.)

benfrancis commented 2 years ago

@vcharpenay wrote:

it's not obvious to me how a Web Thing and a TD are different entities. In the Web Thing API, the term "Web Thing" is never used alone and what is being specified is a (Web) Thing Description.

That's correct, in WebThings we don't distinguish between the two. The Thing Description is a JSON representation of the Web Thing resource and both are identified by the same URI.

Just for the purposes of this conversation about what id actually identifies, I was trying to make the distinction between:

Thing - A physical object in the real world (e.g. a Zigbee smart bulb)
Web Thing - a virtual model of the physical object in software (e.g. A Node.js web application which exposes the capabilities of the device as a REST API)
Thing Description - JSON metadata about the capabilities of the Web Thing

Using the terminology above, the WebThings WoT consumer implementation only cares about Web Things (of which a Thing Description is one representation); it doesn't care whether that Web Thing represents a physical Thing or not. The same URI (e.g. https://foo.webthings.io/things/abc-123) is used for:

The URL of the HTML page that a user views to interact with the Web Thing (when text/html is requested in an HTTP Accept header)
The URL used to retrieve the Thing Description (when application/json is requested in an HTTP Accept header)
The id member in the Thing Description (which originally wasn't even there since it was redundant, but we added it at some point when id was a mandatory member of the W3C WoT Thing Description and we were trying to align)

I get that some people might want to include an identifier of the real physical object (such as a serial number) in a Thing Description too, but maybe that should just be additional product metadata, not the id member?

I'm not going to try and argue against any of the rationale regarding best practices on the semantic web, I'm sure you're right. But please understand I'm speaking from the point of view of the WebThings implementation which doesn't parse Thing Descriptions as RDF, it parses them as plain JSON and doesn't use any semantic web technologies.

I understand that other implementations do parse Thing Descriptions as JSON-LD/RDF and that it has to make sense to them too, but I would be disappointed if that means we have to have three separate URIs for:

The URL of the Thing Description
The id member inside the Thing Description
The ID of an entry in a Thing Directory

From my point of view ideally all three could be the same URI, but I accept that in some implementations 1 and 2 are already different.

This whole conversation seems to have been sparked by me adding "@type": "ThingID" in certain places in the Thing Description for the Directory Service API. That was just a hack (given the Thing Description otherwise lacks the necessary semantics) to denote that certain URI variables are IDs which can be used to identify (Web) Things across interactions, it wasn't meant to imply anything else.

vcharpenay commented 2 years ago

OK, thanks. Isn't a Web Thing a (particular case of) Servient, then?

a virtual model of the physical object in software

vs.

A software stack that (...) can host and expose Things

vcharpenay commented 2 years ago

I'm speaking from the point of view of the WebThings implementation which doesn't parse Thing Descriptions as RDF, it parses them as plain JSON and doesn't use any semantic web technologies.

Sure. Most implementations do see TDs as plain JSON, I'm sure. But to me, the question of identification is orthogonal to whether a TD is parsed as JSON or RDF. It pertains more to the general architecture of the Web and URIs.

An id must be a URI, as specified in the TD model. This design choice might have been influenced by how real-world entities are identified on the SemWeb but, as far as I remember, it was also influenced by IETF work on identifying connected devices in a network (e.g. with the dev URN namespace, as in the introductory examples of the TD specification).

When a TDir uses this id to create a resource identified by /things/{id} (as specified in section 6.2.2.1.1 Creation), then there are already two distinct URIs: the id itself and the URI owned/managed by the TDir at /things/{id}. There is no way these two URIs can be the same, by construction.

I'd therefore suggest keeping these two identifiers and clarifying somewhere in the WoT Discovery specification what they identify (Thing vs. TD) and how to use them.
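A quick sketch of the "by construction" argument: a directory minting the URI of an entry from the TD's id following the /things/{id} pattern. The directory base URL is hypothetical, and encodeURIComponent is only an approximation of RFC 6570 simple expansion (it leaves ! ' ( ) * unencoded, which the template expansion would encode); neither detail comes from the spec text.

```typescript
// Build the directory's URI for a TD entry from the TD's id (hypothetical helper).
function directoryEntryUri(directoryBase: string, tdId: string): string {
  // Percent-encode the id so reserved characters such as ":" stay inside one path segment.
  return new URL("things/" + encodeURIComponent(tdId), directoryBase).href;
}

const tdId = "urn:dev:ops:32473-WoTLamp-1234";
const entry = directoryEntryUri("https://directory.example.com/", tdId);

console.log(tdId);  // the TD's id (a URN, typically naming the Thing)
console.log(entry); // https://directory.example.com/things/urn%3Adev%3Aops%3A32473-WoTLamp-1234
```

Whatever the id is, the entry URI embeds it under the directory's own authority, so the two can never coincide.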

benfrancis commented 2 years ago

@vcharpenay wrote:

to me, the question of identification is orthogonal to whether a TD is parsed as JSON or RDF. It pertains more to the general architecture of the Web and URIs.

My take on this is that the web resource is the Web Thing and a Thing Description is one representation of that resource. It may have other representations like HTML. The Web Thing, Thing Description and HTML web page can/should therefore have the same URI.

When adding a Web Thing to a directory however, things are a bit different since you are really creating a new web resource...

An id must be a URI, as specified in the TD model. This design choice might have been influenced by how real-world entities are identified on the SemWeb but, as far as I remember, it was also influenced by IETF work on identifying connected devices in a network (e.g. with the dev URN namespace, as in the introductory examples of the TD specification).

When a TDir uses this id to create a resource identified by /things/{id} (as specified in section 6.2.2.1.1 Creation), then there are already two distinct URIs: the id itself and the URI owned/managed by the TDir at /things/{id}. There is no way these two URIs can be the same, by construction.

Ah, OK. I see the problem.

For the record, the way that WebThings Gateway's directory API (which predates WoT Discovery) deals with this is to treat the Thing Description hosted in the gateway's directory as a new Web Thing with a new ID.

For example:

The user adds an existing Web Thing to the gateway by its Thing Description URL https://mythingserver.com/abc-123
The gateway retrieves that Thing Description and generates a new Thing Description based on its contents, at a new URL like https://foo.webthings.io/things/https---mythingserver.com-abc-123, which is also used as the id member of the new Thing Description

Other directories may well work differently to this because WebThings Gateway isn't just a directory which provides a repository of Thing Descriptions, it's a gateway which proxies Web Things to another network and creates a whole new Thing Description with a new URL, new ID and even a new set of Forms with new endpoint URLs. This means that a Web Thing on the local network like https://lamp.local can be bridged to the internet at a new URL like https://foo.webthings.io/things/http---lamp.local for example.
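For illustration, the new URL in the example above looks like the original TD URL with every ":" and "/" replaced by "-". The sketch below is a guess inferred only from that example, not taken from the WebThings Gateway source; the function name is hypothetical.

```typescript
// Hypothetical reconstruction: derive the gateway's Thing URL (also used as the new id)
// from the URL of the original Thing Description.
function gatewayThingUrl(gatewayBase: string, originalUrl: string): string {
  // Replace every ":" and "/" so the original URL fits in a single path segment.
  const slug = originalUrl.replace(/[:\/]/g, "-");
  return new URL("things/" + slug, gatewayBase).href;
}

console.log(gatewayThingUrl("https://foo.webthings.io/", "https://mythingserver.com/abc-123"));
// https://foo.webthings.io/things/https---mythingserver.com-abc-123
```

The resulting URL serves as both the location of the new Thing Description and its id member, which is why, in this design, the two coincide.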

The above is why I say that ideally the URL of the Thing Description, the id member inside the Thing Description and the ID of an entry in a Thing Directory could all be the same URI. But I recognise that in some implementations that won't be true, e.g. because the id member is a URN.

As far as the original topic of this issue is concerned, the important thing is that a client of the Directory Service API knows that the URL specified in the PUT request to add a Thing to the directory can then later be used to retrieve (GET), update (PUT/PATCH) or delete (DELETE) the same Thing. The example Thing Description provided in Section 8.2 is non-normative and the "@type": "ThingID" annotations could simply be removed if they are causing confusion since their meaning isn't really defined anywhere. The normative behaviour should be defined with assertions in the prose of the specification.
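A minimal in-memory model of the contract described above: the id used in the PUT that creates an entry is the same key used later to retrieve, update and delete it. The class and method names are hypothetical; this is not the Directory Service API, just the invariant it should guarantee.

```typescript
type TD = { id: string; title: string };

// Hypothetical stand-in for a directory: entries keyed by the id from the creating PUT.
class InMemoryDirectory {
  private entries = new Map<string, TD>();
  put(id: string, td: TD): void { this.entries.set(id, td); } // create or replace
  get(id: string): TD | undefined { return this.entries.get(id); }
  delete(id: string): boolean { return this.entries.delete(id); }
}

const dir = new InMemoryDirectory();
const id = "urn:dev:ops:32473-WoTLamp-1234";
dir.put(id, { id, title: "MyLampThing" });
dir.put(id, { id, title: "MyLampThing (renamed)" }); // update through the same id
console.log(dir.get(id)?.title); // MyLampThing (renamed)
dir.delete(id);
console.log(dir.get(id)); // undefined
```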

mmccool commented 2 years ago

So to follow up on this, we decided to remove the arch-id-correlation assertion (see wot-architecture PR #768). At this point we probably have to defer resolving this to TD 2.0/Discovery 2.0, so I'm going to keep the issue open but mark it as deferred. Eventually we should follow up on some of the ideas above, i.e. using fragment identifiers to distinguish TD variants, following best practices for RDF and web URIs, etc.

mmccool commented 2 years ago

I'll put the "Resolve by CR" label back on for now, just so it will come up when we review these in a future Discovery call, when I hope we can confirm the decision to defer.

mmccool commented 2 years ago

OK, I'm not sure what the conclusion is here. Let's decide soon!

mmccool commented 2 years ago

Discussion: confirm decision to defer (discovery call 2022.9.5).