MQTT URI Design - Githubissues

egekorkan commented 1 year ago

We have agreed before on using the href for indicating the resource but it is not done in MQTT yet. We should come up with a URI Scheme design that also allows topics starting with /.

broker: broker.com port: 1880 topic: /topiclevel1/topiclevel2

Some ideas:

1. mqtt://broker.com:1880:/topiclevel1/topiclevel2
2. mqtt://broker.com:1880/:/topiclevel1/topiclevel2
3. mqtt://broker.com:1880?/topiclevel1/topiclevel2
4. mqtt://broker.com:1880*/topiclevel1/topiclevel2

We should read up on the URI/IRI syntax to make sure to not use invalid syntax.

lu-zero commented 1 year ago

We can refer to

relu91 commented 1 year ago

Talked offline about MQTT URIs. Points:

- From RFC 3986 we have constraints on how the path component starts. In particular, it cannot start with anything other than /.
- This implies that options 1, 3 and 4 above are not valid URIs.
- One design solution that we explored is percent-encoding the MQTT topic. One example is the following URI: mqtt://example.com:1900/%2Fone-topic (note that %2F decodes to /).
- Later on, we found out that most URI parsers accept a double leading / without normalizing it, so this URI is correct: mqtt://example.com:1900//topic-level-0/topic-level-1/
- Note that MQTT Topic Names are hierarchical, so it is OK to treat them as hierarchical resources.
- Another point of discussion is the Topic Filter, which is a query rather than an identifier of a resource. One possible design is to use the query component of the URI: mqtt://example.com:1900/?/topic-level-0/topic-level-1/+. Keep in mind that in this case # needs to be escaped. We should also disallow path components when the query is defined. Otherwise it would be ambiguous: following URI semantics, mqtt://example.com:1900//topic-level-0/topic-level-1/?/topic-level-0/topic-level-1/+ means "search /topic-level-0/topic-level-1/+ inside /topic-level-0/topic-level-1/", while in MQTT it is actually a query against the whole broker.
- Keep in mind that MQTT allows multiple Topic Filters per SUBSCRIBE packet. This is useful when we want to do a "read all" or "subscribe all" operation.
- Note that Topic Names and Topic Filters are case-sensitive (this is usually not true for HTTP, but it should not cause a problem for other WoT-aware clients).

There is still some discussion to do, but I hope I captured everything. Credits: @egekorkan and @lu-zero.

lu-zero commented 1 year ago

Let me add that:

- The wildcard # MUST be at the end of the query string, so it fits the URI specification just fine.
- We can decide whether or not we support multiple queries in the affordance forms, but I'd rather avoid it until we have a better way to model a connected protocol.

Some examples from the MQTT spec on topic names, fed to a commonly used Rust parser here:

input mqtt://host:1234/sport/#:
    paths ["sport", ""] frag true
input mqtt://host:1234/#:
    paths [""] frag true
input mqtt://host:1234/sport/tennis/#:
    paths ["sport", "tennis", ""] frag true
input mqtt://host:1234//sport/tennis/#:
    paths ["", "sport", "tennis", ""] frag true
input mqtt://host:1234/+:
    paths ["+"] frag false
input mqtt://host:1234/+/tennis/#:
    paths ["+", "tennis", ""] frag true
input mqtt://host:1234/$SYS/monitor/+:
    paths ["$SYS", "monitor", "+"] frag false

This way, if we map one query or topic to one form, we can fit everything that is in MQTT into a plain URI without the need to escape anything besides . and .. as single path elements.
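
The same experiment can be sketched in JavaScript. This is a minimal illustration that assumes the generic splitting regular expression from RFC 3986, Appendix B instead of the Rust parser used above; `splitMqttUri` is a hypothetical helper name, not part of any library. It reports the path segments and whether a fragment delimiter is present, which should mirror the `paths` and `frag` values listed above.

```javascript
// Generic URI splitting with the regular expression from RFC 3986, Appendix B.
// splitMqttUri is a hypothetical helper used only for this illustration.
const URI_PARTS = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

function splitMqttUri(uri) {
  const m = URI_PARTS.exec(uri);
  return {
    scheme: m[2],
    authority: m[4],
    // Drop the empty string in front of the leading "/" so only topic levels remain.
    paths: m[5].split("/").slice(1),
    // An empty fragment still counts as present when the "#" delimiter matched.
    frag: m[8] !== undefined,
  };
}

const samples = [
  "mqtt://host:1234/sport/#",
  "mqtt://host:1234//sport/tennis/#",
  "mqtt://host:1234/$SYS/monitor/+",
];
for (const uri of samples) {
  const { paths, frag } = splitMqttUri(uri);
  console.log(`input ${uri}: paths ${JSON.stringify(paths)} frag ${frag}`);
}
```

Note that the trailing empty path segment plus the fragment flag is the only trace of the `#` wildcard, so a binding relying on this would have to re-append `#` itself.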

lu-zero commented 1 year ago

While looking at how the languages support empty fragments, we found that the URL API in JS has a problem. I already reported it upstream: https://github.com/whatwg/url/issues/779.

relu91 commented 1 year ago

Continuing the discussion about whether having a trailing # character in the MQTT URI is fine. As @lu-zero explained, syntactically speaking, nothing prohibits adding # at the end of the URI. However, it creates semantic interpretation problems. For example, the URI mqtt://example.com:1900/?/topic-level-0/topic-level-1/# should be interpreted, according to the RFC, as "The secondary resources inside the root namespace of example.com:1900 that match the ?/topic-level-0/topic-level-1/ query, identified by an empty fragment".

I report some statements about the meaning of the fragment identifiers here for reference:

> The fragment identifier component of a URI allows indirect identification of a secondary resource by reference to a primary resource and additional identifying information. The identified secondary resource may be some portion or subset of the primary resource, some view on representations of the primary resource, or some other resource defined or described by those representations. .... The semantics of a fragment identifier are defined by the set of representations that might result from a retrieval action on the primary resource.

Even though we are under our custom scheme mqtt, we cannot change this interpretation without stepping outside the framework laid out in the RFC. Therefore, given the current URI design, the only option that we have is to encode the special characters, like: mqtt://example.com:1900/?/topic-level-0/topic-level-1/%23.

Unfortunately, this approach is very uncommon for MQTT users as they are used to writing topic filters without escaping. Plus, as suggested offline by @egekorkan, usually:

> You encode to send it over the wire; here we are doing the opposite (encode it for the TD, decode it when sending over the wire)

lu-zero commented 1 year ago

I would not use the query component, you cannot do anything else when you subscribe or unsubscribe I think.

mahdanoura commented 1 year ago

I think in an MQTT URI scheme, the question mark is not really required as we don't have query parameters. Also, I totally agree with @relu91 that we can't simply use the "#" at the end of the MQTT URI to denote a multi-level wildcard, as this would conflict with its reserved use. According to the RFC, the hash character is reserved and used to point to a specific section within a resource.

I think we have to either encode it, which, as @relu91 and @egekorkan mentioned, is not common in the MQTT community, or instead introduce an additional segment (e.g., "wildcard-segment") to represent the multi-level wildcard, a placeholder like "" e.g., `mqtt://example.com:1883/topic/level1//level2`, which is also not what the community is acquainted with and again requires decoding before it is sent to the broker.

sebastiankb commented 1 year ago

I am just wondering why we do not simply use the fragment part to carry the MQTT topic, e.g. mqtt://<Broker IP>:<port>#<topic>

I played around with

const parse = require('url-parse');
const validator = require('validator');
let uris = ['mqtt://mybroker.com:1883#/path/+/path/*/#', 'mqtt://mybroker.com:1883#$SYS/monitor/+']

for(let uri of uris) {
    let parsed = parse(uri);
    console.log(parsed.href);
    console.log(" is valid URL?: "+ validator.isURL(uri, { protocols: ['mqtt']}));
    console.log(" protocol: "+parsed.protocol);
    console.log(" hostname: "+parsed.hostname);
    console.log(" port: "+parsed.port);
    console.log(" pathname: "+parsed.pathname);
    console.log(" hash: "+parsed.hash);
}

Sample output:

mqtt://mybroker.com:1883#/path/+/path/*/#
 is valid URL?: true
 protocol: mqtt:
 hostname: mybroker.com
 port: 1883
 pathname: 
 hash: #/path/+/path/*/#
mqtt://mybroker.com:1883#$SYS/monitor/+
 is valid URL?: true
 protocol: mqtt:
 hostname: mybroker.com
 port: 1883
 pathname: 
 hash: #$SYS/monitor/+

The URI deserialiser of the MQTT binding just has to throw away the # at the beginning to get the original topic. Did I miss something?

lu-zero commented 1 year ago

> I think in an MQTT URI scheme, the question mark is not really required as we don't have query parameters. Also, I totally agree with @relu91 that we can't simply use the "#" at the end of the MQTT URI to denote a multi-level wildcard, as this would conflict with its reserved use. According to the RFC, the hash character is reserved and used to point to a specific section within a resource.

It is not a violation, we can just state "If fragment is present use it as wildcard".

> I think we have to either encode it, which, as @relu91 and @egekorkan mentioned, is not common in the MQTT community, or instead introduce an additional segment (e.g., "wildcard-segment") to represent the multi-level wildcard, a placeholder like "" e.g., `mqtt://example.com:1883/topic/level1//level2`, which is also not what the community is acquainted with and again requires decoding before it is sent to the broker.

Also the # wildcard must be last:

> The number sign (‘#’ U+0023) is a wildcard character that matches any number of levels within a topic. The multi-level wildcard represents the parent and any number of child levels. The multi-level wildcard character MUST be specified either on its own or following a topic level separator. In either case it MUST be the last character specified in the Topic Filter
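
The rule quoted above can be sketched as a small check. This is only an illustration: `hasValidMultiLevelWildcard` is a hypothetical helper name, and it covers nothing but the placement of `#` (other Topic Filter rules, such as those for `+`, are deliberately left out).

```javascript
// Checks only the multi-level wildcard placement rule quoted above.
// hasValidMultiLevelWildcard is a hypothetical name used for illustration.
function hasValidMultiLevelWildcard(topicFilter) {
  const first = topicFilter.indexOf("#");
  if (first === -1) {
    return true; // no multi-level wildcard, nothing to check
  }
  if (first !== topicFilter.length - 1) {
    return false; // "#" must be the last character of the filter
  }
  // "#" must stand on its own or directly follow a topic level separator.
  return first === 0 || topicFilter[first - 1] === "/";
}

console.log(hasValidMultiLevelWildcard("sport/tennis/#"));
console.log(hasValidMultiLevelWildcard("sport/tennis#"));
```

Because a valid `#` is always the final character, it also ends up as the final character of the URI path, whichever way it is written there.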

mahdanoura commented 1 year ago

> Also the # wildcard must be last:
>
> > The number sign (‘#’ U+0023) is a wildcard character that matches any number of levels within a topic. The multi-level wildcard represents the parent and any number of child levels. The multi-level wildcard character MUST be specified either on its own or following a topic level separator. In either case it MUST be the last character specified in the Topic Filter

I understand that # should be at the end, but what I am saying is that using the fragment part of a URI as the wildcard would only be syntactically correct, not semantically correct. They serve different purposes. Wildcards are a flexible mechanism for matching patterns to subscribe to resource representations, while a fragment does not serve the same purpose. Also, a fragment is never sent to the server with the request.

lu-zero commented 1 year ago

The browser semantics aren't binding to our use of url/uri/iri :)

relu91 commented 1 year ago

> The browser semantics aren't binding to our use of url/uri/iri :)

As I reported above, in my understanding these are not browser semantics; they are defined in the RFC itself:

> The fragment identifier component of a URI allows indirect identification of a secondary resource by reference to a primary resource and additional identifying information. The identified secondary resource may be some portion or subset of the primary resource, some view on representations of the primary resource, or some other resource defined or described by those representations. .... The semantics of a fragment identifier are defined by the set of representations that might result from a retrieval action on the primary resource.

See also the introduction, which indicates that we are bound to that interpretation of the fragment:

> This specification defines those elements of the URI syntax that are required of all URI schemes or are common to many URI schemes. It thus defines the syntax and semantics needed to implement a scheme-independent parsing mechanism for URI references, by which the scheme-dependent handling of a URI can be postponed until the scheme-dependent semantics are needed.

So I second the points made by @mahdanoura (regardless of the design we are going to choose - using a query or just a path). Just one clarification:

> Also, a fragment is never sent to the server with the request.

That is another good point. Section 3.5 also explains this concept of the URI fragment being a "client-side indirect referencing" mechanism.

relu91 commented 1 year ago

We had an offline discussion with @Jerady and @skobow from HiveMQ. I'm trying to summarize the discussion:

- We commented on and evaluated all the suggestions above.
- We also evaluated a design where we remove the # and use a dedicated query parameter (e.g. wildcard); however, the encoding approach felt better.
- It is important to let developers use standard tools for URI parsing.
- Encoding the # is bad for human readability, but it should not harm the developer experience because developers have the means to encode it (they might need to encode that part anyway because of other characters).
- We briefly discussed what a resource is for MQTT, although we did not reach a conclusive answer.

Any missing point that I have not captured? @lu-zero @sebastiankb @egekorkan

skobow commented 1 year ago

Here are some concrete Java code samples comparing using url encoded # vs evaluating URI fragment:

Evaluating fragments:

final String mqttUrl = "mqtt://mybroker:1883/my/topic/#";

final URI uri = URI.create(mqttUrl);
final String scheme = uri.getScheme();
final String host = uri.getHost();
final int port = uri.getPort();
final String path = uri.getPath().replaceFirst("/", "");
final String fragment = uri.getFragment();

// evaluate if fragment is present and append multilevel wildcard if required
final String topic = fragment != null ? path + "#" : path;

final Mqtt5AsyncClient mqtt5Client = Mqtt5Client.builder()
        .serverHost(host)
        .serverPort(port)
        .build().toAsync();

mqtt5Client.subscribeWith()
        .topicFilter(topic)
        .send();

Using url encoded character:

final String mqttUrl = "mqtt://mybroker:1883/my/topic/%23"; // multilevel wildcard as such is not human readable

final URI uri = URI.create(mqttUrl);
final String scheme = uri.getScheme();
final String host = uri.getHost();
final int port = uri.getPort();
final String path = uri.getPath().replaceFirst("/", ""); // evaluates to 'my/topic/#'

final Mqtt5AsyncClient mqtt5Client = Mqtt5Client.builder()
        .serverHost(host)
        .serverPort(port)
        .build().toAsync();

mqtt5Client.subscribeWith()
        .topicFilter(path)
        .send();

Even though url encoding is harder to read it is much easier to use and does not require any processing of the path part as it is decoded automatically! From a developer perspective I find using encoded characters much more intuitive and less error prone.

sebastiankb commented 1 year ago

To formalize this:

mqtt://<address>:<port>/<topic>

Where:

    {address} Broker IP address
    {port} Broker's port number
    {topic} MQTT topic, where the MQTT multi-level wildcard character (#) must be URL encoded (%23) when used.

relu91 commented 1 year ago

> To formalize this:
>
> mqtt://<address>:<port>/<topic>
>
> Where:
>
>     {address} Broker IP address
>     {port} Broker's port number
>     {topic} MQTT topic, where the MQTT multi-level wildcard character (#) must be URL encoded (%23) when used.

I propose something more complex but also more flexible and closer to MQTT semantics:

mqtt://<address>:<port>/<topicName>[?<topicFilter1>&<topicFilter2>&<topicFilterN>]

Where:

    {address} Broker IP address
    {port} Broker's port number
    {topicName} MQTT topic
    {topicFilterN} MQTT topic filter, where the MQTT multi-level wildcard character (#) must be URL encoded (%23) when used.

Note: The query parameter can only be used when there is no topic name.

Let me know if you don't like it.

skobow commented 1 year ago

That is how I understand the MQTT spec with regards to Topic Names and Topic Filters (https://docs.oasis-open.org/mqtt/mqtt/v5.0/os/mqtt-v5.0-os.html#_Toc3901241):

A Topic Name identifies the information channel to which Payload data is published. The term "topic name" refers to publishing data.

Topic filters indicate the Topics to which the Client wants to subscribe and can contain wildcards such as the single level wildcard + and the multilevel wildcard #. The term "topic filter" refers to subscribing for data.

So a topic can either be a topic name when publishing data or a topic filter when subscribing for data.

That being said, I think your approach @relu91 is mixing up two different concepts in MQTT (publish & subscribe). Furthermore, I think this approach will also increase implementation complexity, as topics would need to be reconstructed by iterating over parameters instead of just using the path component of the URI. Since the wildcard operator would also need to be encoded as %23, there is also no benefit in terms of human readability.

skobow commented 1 year ago

Fyi: I just found that wiki page from the mqtt.org wiki: https://github.com/mqtt/mqtt.org/wiki/URI-Scheme. It does not tell anything about the topic part though.

relu91 commented 1 year ago

> That is how I understand the MQTT spec with regards to Topic Names and Topic Filters (https://docs.oasis-open.org/mqtt/mqtt/v5.0/os/mqtt-v5.0-os.html#_Toc3901241):
>
> A Topic Name identifies the information channel to which Payload data is published. The term "topic name" refers to publishing data.
>
> Topic filters indicate the Topics to which the Client wants to subscribe and can contain wildcards such as the single level wildcard + and the multilevel wildcard #. The term "topic filter" refers to subscribing for data.
>
> So a topic can either be a topic name when publishing data or a topic filter when subscribing for data.

Yes, this is pretty much how I understand the spec too, but in the design above I wanted to clearly differentiate the duality of a topic. In my opinion, the path-only approach would be confusing because, in practice, you can have a topic filter without any wildcard, and then you don't know whether you are filtering or using a topic name (although it is pretty clear when you also specify the MQTT packet type).

> That being said, I think your approach @relu91 is mixing up two different concepts in MQTT (publish & subscribe). Furthermore, I think this approach will also increase implementation complexity, as topics would need to be reconstructed by iterating over parameters instead of just using the path component of the URI. Since the wildcard operator would also need to be encoded as %23, there is also no benefit in terms of human readability.

Yes, this is the point that I don't like, not because of the mixing but rather because we are leaking "retrieval" information into the URI. Basically, it would violate RFC 3986 Section 1.2.2, as if with the http scheme some URIs were only valid when you do a POST. Again, I think in this design effort we should first answer what a resource is for MQTT. Is it the topic? Or just the broker namespace? Or is it the messages broadcast by the broker?

@sebastiankb's design is OK, but in my understanding it is possible to subscribe using multiple topic filters at once. Does this mean that we don't support this use case? Or should we allow for multiple paths in the URI scheme?

> Fyi: I just found that wiki page from the mqtt.org wiki: https://github.com/mqtt/mqtt.org/wiki/URI-Scheme. It does not tell anything about the topic part though.

Interestingly enough, this goes along with the current URI design, where topic names and filters are expressed as additional information on the form. Even so, I believe it would be better to update that with the proposals that we are making here.

skobow commented 1 year ago

> Yes, this is pretty much how I understand the spec too, but in the design above I wanted to clearly differentiate the duality of a topic. In my opinion, the path-only approach would be confusing because, in practice, you can have a topic filter without any wildcard, and then you don't know whether you are filtering or using a topic name (although it is pretty clear when you also specify the MQTT packet type).

OK, got your point here. Your approach would implicitly encode the operation in the structure of the URI. In general, the term "topic" is commonly used for topic names as well as topic filters, so I would not expect a distinction based on these terms to be obvious.

> Yes, this is the point that I don't like, not because of the mixing but rather because we are leaking "retrieval" information into the URI. Basically, it would violate RFC 3986 Section 1.2.2, as if with the http scheme some URIs were only valid when you do a POST. Again, I think in this design effort we should first answer what a resource is for MQTT. Is it the topic? Or just the broker namespace? Or is it the messages broadcast by the broker?

From my understanding, a resource is a leaf in a topic tree referenced by a topic name. When publishing data, only one specific resource can be addressed. When subscribing, multiple resources can be addressed. All of that might only apply when the topic structure is built using resource-oriented semantics rather than operational ones, like some RPC-based approach.

> Interestingly enough, this goes along with the current URI design, where topic names and filters are expressed as additional information on the form. Even so, I believe it would be better to update that with the proposals that we are making here.

I think it would be great to have MQTT URIs be self-contained. That would make it possible to trigger a certain operation by just passing a URI to a client, without external dependencies.

In the following I add another possible approach trying to be explicit and respecting MQTT options as well as connection parameters (which we haven't discussed yet at all):

mqtt[s]://[username:password@]hostname[:port]/topic
    ?operation=pub|sub
    [&identifier=clientId]
    [&version=3|4]
    [&qos=0|1|2]
    [&payload=<BASE64 encoded payload>]
    [&retain=true|false]
    [&messageExpiry=3600]
    ... # other MQTT options

Please share your thoughts about that approach.

lu-zero commented 1 year ago

The href field is only meant to specify the resource; encoding, payload layout, and connection options should live in their own separate fields.

sebastiankb commented 1 year ago

@relu91

> @sebastiankb's design is OK, but in my understanding it is possible to subscribe using multiple topic filters at once. Does this mean that we don't support this use case? Or should we allow for multiple paths in the URI scheme?

My understanding is that you are only designing one topic for a specific purpose. For example, if you are addressing a single data source, you would mainly take a path-only approach (e.g., this makes sense when you have a property in the TD context). If you are interested in multiple data sources, you would go with a wildcard (e.g., this makes sense for readAllProperties). For me, both would be defined as "topic". If I understand @skobow correctly, there seems to be no real distinction between "topic" and "filter".

skobow commented 1 year ago

> The href field is only meant to specify the resource; encoding, payload layout, and connection options should live in their own separate fields.

Okay, maybe my knowledge here is still too limited and my thoughts have been too generic. But then of course there is no need to have those options as parameters of the URI.

> For me, both would be defined as "topic". If I understand @skobow correctly, there seems to be no real distinction between "topic" and "filter".

Yes, correct. Even though you can find this distinction in the specs, people usually just talk about "topics".

@sebastiankb what is meant with "TD context"?

From what I understood, the biggest challenge seems to be deriving the operation from the URI, right? And as there is no such thing as the request verb in HTTP, another mechanism needs to be defined. To keep things simple and stay with a path-only approach, with both topic names and topic filters as the path component, there would still be the option to define the operation with an operation parameter, as I showed above.

So the URI pattern could be like

mqtt[s]://[username:password@]hostname[:port]/topic?operation=pub|sub

This would explicitly name the operation and would ensure easy handling.

final String mqttUrl = "mqtt://localhost:1883/my/sensors/sensorXYZ/temperature?operation=pub";

final URI uri = URI.create(mqttUrl);
final String scheme = uri.getScheme();
final String host = uri.getHost();
final int port = uri.getPort();
// getParameters is a helper (not shown here) that splits the query string into key/value pairs
final Map<String, String> parameters = getParameters(uri.getQuery());

final String topic = uri.getPath().replaceFirst("/", "");

final var mqtt5Client = Mqtt5Client.builder()
        .serverHost(host)
        .serverPort(port)
        .build()
        .toBlocking();

mqtt5Client.connect();

if ("sub".equals(parameters.get("operation"))) {
        mqtt5Client.subscribeWith()
                .topicFilter(topic)
                .send();
} else if ("pub".equals(parameters.get("operation"))) {
        mqtt5Client.publishWith()
                .topic(topic)
                .payload("25°C".getBytes(StandardCharsets.UTF_8))
                .send();
}

sebastiankb commented 1 year ago

> @sebastiankb what is meant with "TD context"?

In WoT we consider 3 abstract interaction models which we call Properties, Actions and Events. Properties are suitable to specify MQTT subscriptions. Actions can be used to describe MQTT publications.

That is the reason why the ?operation=pub|sub part would not be necessary in the WoT context. Based on the interaction model, you will know whether you need to subscribe or publish.
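
As a rough sketch of that idea, a client could derive the MQTT operation from the WoT operation name instead of from the URI. The mapping below is an assumption made for illustration (it is not quoted from this thread or from a specification), and `mqttOperationFor` is a hypothetical helper name.

```javascript
// Assumed, illustrative mapping from WoT operation names to MQTT operations.
const ASSUMED_OP_TO_MQTT = {
  readproperty: "subscribe",
  observeproperty: "subscribe",
  writeproperty: "publish",
  invokeaction: "publish",
  subscribeevent: "subscribe",
};

// mqttOperationFor is a hypothetical helper: it looks up the MQTT operation
// for a WoT operation and fails loudly for anything outside the assumed table.
function mqttOperationFor(wotOperation) {
  const mqttOperation = ASSUMED_OP_TO_MQTT[wotOperation];
  if (mqttOperation === undefined) {
    throw new Error(`no assumed MQTT mapping for operation: ${wotOperation}`);
  }
  return mqttOperation;
}

console.log(mqttOperationFor("writeproperty"));
console.log(mqttOperationFor("observeproperty"));
```

With a table like this, the href only has to identify the topic, and no `?operation=` parameter is needed.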

relu91 commented 1 year ago

> The href field is only meant to specify the resource; encoding, payload layout, and connection options should live in their own separate fields.

Yes, exactly. As I pointed out in my previous comment, it is better not to mix the concepts here (this is also enforced by the RFC):

> Yes, this is the point that I don't like, not because of the mixing but rather because we are leaking "retrieval" information into the URI. Basically, it would violate RFC 3986 Section 1.2.2, as if with the http scheme some URIs were only valid when you do a POST

@sebastiankb

> My understanding is that you are only designing one topic for a specific purpose. For example, if you are addressing a single data source, you would mainly take a path-only approach (e.g., this makes sense when you have a property in the TD context). If you are interested in multiple data sources, you would go with a wildcard (e.g., this makes sense for readAllProperties). For me, both would be defined as "topic". If I understand @skobow correctly, there seems to be no real distinction between "topic" and "filter".

Yup, I thought that MQTT usage was closer to the concepts described in the spec, but as a user I also have to admit that the terms are usually mixed up. Then the only thing left to decide is probably how (or whether) to support the use case where multiple topics need to be used in one MQTT SUBSCRIBE packet. 🤔 Other than this, the simple path-only approach should work.

JKRhb commented 1 year ago

Using MQTT URIs in TDs in practice, I guess you would need to create one form for reading/subscribing and writing/publishing, right? Would this make the additional filter and topic fields obsolete? Or should they still be defined to create a single form that is both usable for reading and writing?

{
  ...
  "properties": {
    "foo": {
      "forms": [
        {
          "op": "writeproperty",
          "href": "mqtt://mybroker:1883//my/topic"
        },
        {
          "op": ["readproperty", "observeproperty"],
          "href": "mqtt://mybroker:1883/my/+/awesome/topic/%23" // <- Not allowed for writing
        },
        {
          "op": ["writeproperty", "readproperty", "observeproperty"],
          "href": "mqtt://mybroker:1883/my/+/awesome/topic/%23", // <- Only used for reading here
          "mqv:topic": "/my/topic" // <- Possible override for writing
        }
      ]  
    }
  }
}

sebastiankb commented 1 year ago

For me, the first two form entries are very clear. Assigning a special meaning to "mqv:topic" complicates everything about implementation and specification.

In this context, I would simply call filter and topic obsolete terms that should not be used in the future.

ektrah commented 1 year ago

One thing to note is that while paths starting with double slash are valid, they are difficult to use in relative URI references. For example, base URI mqtt://mybroker:1883/ with URI reference /my/awesome/topic yields URI mqtt://mybroker:1883/my/awesome/topic, but mqtt://mybroker:1883/ with //my/awesome/topic (note the double slash) yields mqtt://my/awesome/topic and not mqtt://mybroker:1883//my/awesome/topic.

sebastiankb commented 1 year ago

I need a specific example here. Let's take two topics, "my/first/topic" and "/my/second/topic", and the broker mqtt://mybroker:1883.

In the TD you would use the base for the broker endpoint:

"base" : "mqtt://mybroker:1883/"

A propertyA would describe the first topic and uses

"href" : "my/first/topic"

A propertyB would describe the second topic and uses

"href" : "/my/second/topic"

Applying URI arithmetic to the href of propertyA would result in "mqtt://mybroker:1883/my/first/topic", right?

What happens with propertyB? Would the result look like this "mqtt://mybroker:1883//my/second/topic" or this "mqtt://my/second/topic"?

ektrah commented 1 year ago

| Base URI | URI Reference | Resulting URI | Resulting MQTT Topic |
|---|---|---|---|
| `mqtt://mybroker:1883/` | `my/first/topic` | `mqtt://mybroker:1883/my/first/topic` | `my/first/topic` |
| `mqtt://mybroker:1883/` | `/my/second/topic` | `mqtt://mybroker:1883/my/second/topic` | `my/second/topic` |
| `mqtt://mybroker:1883/` | `//my/third/topic` | `mqtt://my/third/topic` :warning: | `third/topic` |
| `mqtt://mybroker:1883/` | `.//my/fourth/topic` | `mqtt://mybroker:1883//my/fourth/topic` | `/my/fourth/topic` |

So there is no problem or limitation here, just something to be aware of. (You can't just stick "mqtt://mybroker:1883/" in "base" and put the MQTT topic in "href". "href" really needs to contain a valid URI reference that, together with the base URI, yields the correct resulting URI with the MQTT topic encoded in its path.)
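
The resolution results above can be reproduced with a short sketch. It assumes the WHATWG `URL` class built into Node.js and browsers, which is expected to resolve these particular references the same way as RFC 3986, Section 5; other parsers may behave differently. `resolveToTopic` is a hypothetical helper name.

```javascript
// Resolve a URI reference against a base and derive the MQTT topic from the path.
// Assumption: the WHATWG URL parser is used; resolveToTopic is a hypothetical helper.
function resolveToTopic(reference, baseUri) {
  const resolved = new URL(reference, baseUri);
  // Strip exactly one leading "/" (the separator after the authority),
  // then percent-decode what remains to obtain the topic.
  const topic = decodeURIComponent(resolved.pathname.slice(1));
  return { uri: resolved.href, topic };
}

const base = "mqtt://mybroker:1883/";
const references = [
  "my/first/topic",
  "/my/second/topic",
  "//my/third/topic",   // network-path reference: replaces the authority
  ".//my/fourth/topic", // keeps the authority and yields a leading "/" in the topic
];
for (const reference of references) {
  const { uri, topic } = resolveToTopic(reference, base);
  console.log(`${reference} -> ${uri} (topic: ${topic})`);
}
```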

lu-zero commented 1 year ago

Here is what the specification says regarding resolving base + reference.

Making people aware of this specific pitfall might be good, but I wonder if /this//topic//is///really/valid/in/mqtt.

In any case it seems another item to consider when reworking base and in general when we'll find a way to describe reusable connections during td 2.0 development.

ektrah commented 1 year ago

If I understand correctly, pretty much any Unicode string that doesn't contain a null character is a valid MQTT topic, with only $, /, + and # having special meaning. This means that /this//topic//is///really/valid/in/mqtt, even if it is probably rather unusual.

One difficulty might be that RFC 3986 places special meaning on . and .. path segments (called dot-segments; see Section 3.3) and that these cannot be percent-encoded (see Section 2.3). Thus, the MQTT topic my/../topic could not be expressed as a mqtt:// URI. (mqtt://mybroker:1883/my/../topic would just be normalized to mqtt://mybroker:1883/topic.)
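
A small sketch of this limitation, assuming the WHATWG `URL` class built into Node.js and browsers (which also removes dot segments while parsing, including their percent-encoded spellings). `unrepresentableLevels` is a hypothetical helper name.

```javascript
// List the topic levels that a URI path would treat as dot-segments.
// unrepresentableLevels is a hypothetical helper used for illustration.
function unrepresentableLevels(topic) {
  return topic.split("/").filter((level) => level === "." || level === "..");
}

console.log(unrepresentableLevels("my/../topic"));

// The parser drops the ".." segment together with the preceding "my" segment.
const plain = new URL("mqtt://mybroker:1883/my/../topic");
console.log(plain.pathname);

// Percent-encoding the dots does not protect the segment in this parser either.
const encoded = new URL("mqtt://mybroker:1883/my/%2E%2E/topic");
console.log(encoded.pathname);
```

A binding could run a check like `unrepresentableLevels` when generating hrefs and reject such topics early, rather than silently addressing a different topic.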

sebastiankb commented 1 year ago

If I understand this correctly, we have an issue here. In MQTT, the topics "my/topic" and "/my/topic" address different resources. When we want to express "/my/topic" as a URI reference in href, the author needs to write ".//my/topic". This seems to be another exception rule besides the '#' character. Is this still useful?

relu91 commented 1 year ago

> Using MQTT URIs in TDs in practice, I guess you would need to create one form for reading/subscribing and writing/publishing, right? Would this make the additional filter and topic fields obsolete? Or should they still be defined to create a single form that is both usable for reading and writing?
>
>     {
>       ...
>       "properties": {
>         "foo": {
>           "forms": [
>             {
>               "op": "writeproperty",
>               "href": "mqtt://mybroker:1883//my/topic"
>             },
>             {
>               "op": ["readproperty", "observeproperty"],
>               "href": "mqtt://mybroker:1883/my/+/awesome/topic/%23" // <- Not allowed for writing
>             },
>             {
>               "op": ["writeproperty", "readproperty", "observeproperty"],
>               "href": "mqtt://mybroker:1883/my/+/awesome/topic/%23", // <- Only used for reading here
>               "mqv:topic": "/my/topic" // <- Possible override for writing
>             }
>           ]
>         }
>       }
>     }

@JKRhb I think the whole purpose of the discussion is to get rid of those old form terms and properly use the URI semantic. We could still allow to use them for the same reasons that we allow modbus:unitID -> readability. But it should be clear that there is a one-to-one mapping and URI should be preferred.

relu91 commented 1 year ago

> One difficulty might be that RFC 3986 places special meaning on . and .. path segments (called dot-segments; see Section 3.3) and that these cannot be percent-encoded (see Section 2.3). Thus, the MQTT topic my/../topic could not be expressed as a mqtt:// URI. (mqtt://mybroker:1883/my/../topic would just be normalized to mqtt://mybroker:1883/topic.)

I would say that is a problem (even if it is probably a corner case). Good thing we pinpointed it here; if we continue in this direction, we need to explain the limitations and the topics that cannot be represented.

> If I understand this correctly, we have an issue here. In MQTT, the topics "my/topic" and "/my/topic" address different resources. When we want to express "/my/topic" as a URI reference in href, the author needs to write ".//my/topic". This seems to be another exception rule besides the '#' character. Is this still useful?

Yes, yet another limitation, but less problematic than the one above. Also, we are still not covering resources selected by multiple topics (topic filters).

sebastiankb commented 1 year ago

Ok, I'll summarize the results so far:

mqtt://<address>:<port>/<topic>

Where:

    {address} Broker IP address
    {port} Broker's port number
    {topic} MQTT topic with the following expectations:
            1) There is no topic level name '.' or '..'
            2) A multi-level wildcard character (#) must be URL encoded (%23) when used
            3) If the topic is used as URI reference only, a starting '/' character before 
               the first topic level name has to be replaced by the characters './/'

    Examples:
      - "href":"mqtt://mybroker:1883/my/example/topic"  --> MQTT topic: "my/example/topic"
      - "href":"mqtt://mybroker:1883//my/example/topic" --> MQTT topic: "/my/example/topic"
      - "href":"my/example/topic"                       --> MQTT topic: "my/example/topic"
      - "href":".//my/example/topic"                    --> MQTT topic: "/my/example/topic"
      - "href":"my/example/topic/%23"                   --> MQTT topic: "my/example/topic/#"
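
The three rules summarized above can be sketched as a pair of conversion helpers. This is a minimal illustration under stated assumptions: all function names are hypothetical, the broker prefix is passed without a trailing slash, `+` and `$` are deliberately left unescaped so that hrefs stay readable, and `hrefToTopic` handles absolute mqtt:// or mqtts:// URIs only.

```javascript
// Percent-encode each topic level; reject "." and ".." levels (rule 1).
// All names here are hypothetical helpers, not part of any library.
function encodeTopic(topic) {
  const levels = topic.split("/");
  if (levels.some((level) => level === "." || level === "..")) {
    throw new Error(`topic level "." or ".." cannot be expressed: ${topic}`);
  }
  return levels
    .map((level) =>
      encodeURIComponent(level)       // turns "#" into "%23" (rule 2)
        .replace(/%2B/g, "+")         // keep the single-level wildcard readable
        .replace(/%24/g, () => "$"))  // keep "$" (as in $SYS) readable
    .join("/");
}

// Absolute form: mqtt://<address>:<port>/<topic>
function topicToHref(broker, topic) {
  return `${broker}/${encodeTopic(topic)}`;
}

// Relative form: a leading "/" is written as ".//" (rule 3).
function topicToRelativeHref(topic) {
  const path = encodeTopic(topic);
  return path.startsWith("/") ? `./${path}` : path;
}

// Recover the topic from an absolute href by decoding everything after
// the single "/" that follows the authority.
function hrefToTopic(href) {
  const match = /^mqtts?:\/\/[^\/?#]*\/([^?#]*)/.exec(href);
  if (match === null) {
    throw new Error(`not an absolute MQTT URI with a path: ${href}`);
  }
  return decodeURIComponent(match[1]);
}

console.log(topicToHref("mqtt://mybroker:1883", "/my/example/topic"));
console.log(topicToRelativeHref("/my/example/topic"));
console.log(hrefToTopic("mqtt://mybroker:1883/my/example/topic/%23"));
```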

w3c / wot-binding-templates

MQTT URI Design #292