w3c / wot-profile

Web of Things (WoT) Profile
http://w3c.github.io/wot-profile/

Revisit length limits #200

Open mlagally opened 2 years ago

mlagally commented 2 years ago

We already have some length limits in the data model that were under discussion.

Other description languages have similar limits, which illustrate the practical constraints imposed by typical devices; see for example: https://github.com/Azure/opendigitaltwins-dtdl/blob/master/DTDL/v2/dtdlv2.md

egekorkan commented 2 years ago

DTDL has the same problem: these limits seem to be defined à la carte rather than grounded in some specific requirement. I still have issues with these data limits. Why 64 characters and not 128? They are both multiples of 2...

mlagally commented 2 years ago

@egekorkan we may not be able to identify the reasons for each length limit; however, it is still a fact that all real-world platforms have constraints.

Please consider that the mindset for defining a profile is very different from the mindset for creating the TD. Whereas the TD MUST NOT impose restrictions on what can be described (brownfield), the Profile MUST make choices that restrict and constrain in a pragmatic way.

If we want to interoperate with devices that are described in DTDL, we have to take these constraints into account when we define length limits.

egekorkan commented 2 years ago

I agree that all real-world platforms have constraints. However, so far I have not seen anyone in the Profile TF do a study on what these constraints are, nor do I see any paper cited in the spec that provides such a study. If we are going to set a limit, let's go with 2005; the Mars Reconnaissance Orbiter was launched that year.

If we do such a study and find that 95% of platforms use a limit between 64 and 256 characters, which one do we take? Taking the lower limit excludes TDs with 256-character descriptions from the profile; taking the higher limit means that consumers (platforms) built around the lower limit will not be able to consume TDs with 256-character descriptions.
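To make that asymmetry concrete, here is a minimal Rust sketch; the 64-character limit and the `accept_description` helper are hypothetical, purely for illustration:

```rust
/// Hypothetical: a consumer whose storage was sized for the lower limit.
const CONSUMER_LIMIT: usize = 64;

/// Accepts a TD description only if it fits the consumer's budget.
/// A description that is valid under a 256-character profile limit
/// is still rejected here.
fn accept_description(desc: &str) -> Result<&str, usize> {
    let len = desc.chars().count();
    if len <= CONSUMER_LIMIT { Ok(desc) } else { Err(len) }
}

fn main() {
    assert!(accept_description("Temperature sensor, living room").is_ok());
    assert!(accept_description(&"x".repeat(256)).is_err());
}
```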

> If we want to interoperate with devices that are described in DTDL

This is not the goal of a profile meant for greenfield devices. If you want to interoperate with any sizeable variety of existing devices, you cannot do that with a profile; you will need TDs. That is the whole point of descriptive WoT.

If we are going to align a baseline profile with DTDL, that should be called a DTDL profile, which is a binding in the end...

mlagally commented 2 years ago

I think we should not overcomplicate the discussion.

The profile should make reasonable choices, and we have data points from the real world, i.e. from plugfests and from other relevant specifications.

Yes, we can start again in an ivory tower and use an arbitrary value such as 2005, but this will endanger adoption if it is chosen without further consideration.

mlagally commented 2 years ago

Some references for well-known limits:

    • https://www.ibm.com/docs/en/db2/10.5?topic=sql-xml-limits
    • https://docs.oracle.com/en/database/oracle/oracle-database/19/refrn/datatype-limits.html
    • https://chartio.com/resources/tutorials/understanding-strorage-sizes-for-mysql-text-data-types/

mlagally commented 2 years ago

Arch call on 25.5.: we need to consider several concrete implementations in products and systems with wide adoption where we want to achieve round-tripping. Candidates include:

We need to point out that WoT by itself is not constrained, but that we target interop with existing platforms that have these limits.

What do we have to limit? Key lengths? Values? TD length?

benfrancis commented 2 years ago

A real-life example I came across today for a title member: "Urban Sciences Building: Floor 1: Room 1.024 Zone 3". That's 51 characters long (longer if translated into German!), and dangerously close to the currently proposed 64-character limit.

I don't like the approach of picking arbitrary limits for the lengths of fields: the limits of the underlying technologies described above are often user-configurable and subject to change, so whatever number you pick is going to exclude some outliers. If there is no perfect value that is always guaranteed to work, then for the purposes of interoperability at the API level it may actually be better to explicitly allow these limits to be implementation-specific, rather than give a false impression of universal interoperability.

If there absolutely have to be arbitrary limits, then they should be much higher than the current proposal, with a note that they do not guarantee interoperability.

A separate CoAP profile targeting constrained devices could be more strict about these kinds of things on the basis of meeting requirements related to resource constraints, as opposed to interoperability.

egekorkan commented 2 years ago

Here is my initial quick research. For other contributions, please follow a similar structure and include references.

Programming languages

tl;dr: The limits are generally huge and should not be a consideration, in my opinion.

Databases

Data Formats / Standards / Platforms

mlagally commented 2 years ago

@benfrancis

> I don't like the approach of picking arbitrary limits for the lengths of fields: the limits of the underlying technologies described above are often user-configurable and subject to change, so whatever number you pick is going to exclude some outliers. If there is no perfect value that is always guaranteed to work, then for the purposes of interoperability at the API level it may actually be better to explicitly allow these limits to be implementation-specific, rather than give a false impression of universal interoperability.
>
> If there absolutely have to be arbitrary limits, then they should be much higher than the current proposal, with a note that they do not guarantee interoperability.
>
> A separate CoAP profile targeting constrained devices could be more strict about these kinds of things on the basis of meeting requirements related to resource constraints, as opposed to interoperability.

I think we are coming from different angles:

  1. "best effort interop" This is the approach where we do not impose any concrete limits on profile-compliant implementations. In these cases some implementations will work together well, if they share common constraints, others won't. If one thing uses limits that exceed the capabilities of another device, it is up to the implementation on that device how the situation is handled. There are many ways of doing that:

    • crashing
    • buffer overflows with potential security implications
    • truncation with potential name collisions These cases will happen.
  2. guaranteed interop among constrained devices In this case we have concrete limits and a well specified and guaranteed behavior of profile compliant things. Conformance rules can statically validate TDs (identifier and value lengths) and implementations implement truncation rules.

I believe we have to address the latter; perhaps we can reach consensus on a dedicated "constrained profile" which provides these guarantees.
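To make "statically validate" concrete, here is a minimal sketch of such a conformance check in Rust using `serde_json`; the 64-character identifier and 256-character value limits are placeholders for illustration, not agreed numbers:

```rust
use serde_json::Value;

// Placeholder limits; the actual numbers are what this issue debates.
const MAX_KEY_LEN: usize = 64;
const MAX_STRING_LEN: usize = 256;

/// Recursively checks every object key and string value in a TD
/// against the limits, collecting violations instead of failing fast.
fn check_lengths(value: &Value, path: &str, errors: &mut Vec<String>) {
    match value {
        Value::Object(map) => {
            for (key, v) in map {
                if key.chars().count() > MAX_KEY_LEN {
                    errors.push(format!("{path}/{key}: key exceeds {MAX_KEY_LEN} characters"));
                }
                check_lengths(v, &format!("{path}/{key}"), errors);
            }
        }
        Value::Array(items) => {
            for (i, v) in items.iter().enumerate() {
                check_lengths(v, &format!("{path}/{i}"), errors);
            }
        }
        Value::String(s) => {
            if s.chars().count() > MAX_STRING_LEN {
                errors.push(format!("{path}: string exceeds {MAX_STRING_LEN} characters"));
            }
        }
        _ => {}
    }
}

fn main() {
    let td: Value = serde_json::from_str(
        r#"{"title": "Urban Sciences Building: Floor 1: Room 1.024 Zone 3"}"#,
    )
    .unwrap();
    let mut errors = Vec::new();
    check_lengths(&td, "", &mut errors);
    assert!(errors.is_empty()); // 51 characters, within the placeholder value limit
}
```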

mlagally commented 2 years ago

@egekorkan Thanks very much for doing this research and providing the references. Beyond looking at the size limits of strings, we have to go a bit deeper, i.e. we have to make some fundamental assumptions about the processing of data and its implementation. Typical consumers of Things will use interaction affordances to read properties, invoke actions, process results, and receive event streams. Data and metadata provided by the Thing will be processed and stored. Applications are typically hosted on web servers, which impose limits; see for example: https://docs.microsoft.com/en-us/iis/configuration/system.webserver/security/requestfiltering/requestlimits/

There are many ways to store data, from flat files in CSV, JSON, XML, YAML and other formats to relational databases, time-series databases, NoSQL databases, object storage, and others. This data will be searched, indexed, and processed by search and AI/ML components. All of these implementations have limits on key names, value ranges, and lengths. See for example:

    • https://www.elastic.co/guide/en/app-search/current/limits.html
    • https://www.mongodb.com/docs/manual/reference/limits/

When we look at other standards, many of them define reasonable limits:

    • oneM2M: https://www.etsi.org/deliver/etsi_ts/118100_118199/118108/01.00.00_60/ts_118108v010000p.pdf
    • CoAP: https://datatracker.ietf.org/doc/html/rfc7252

From a practical perspective, we cannot assume that a consumer that handles devices from multiple vendors will be reconfigured for a single device that exceeds its limits.

When these applications were implemented, people made reasonable choices about maximum values. We should be pragmatic, identify common ground, and define a set of reasonable constraints.

benfrancis commented 2 years ago

> perhaps we can reach consensus on a dedicated "constrained profile" which provides these guarantees.

Yes, as I have said before I think it would be reasonable to have a profile which is specifically for constrained devices.

The current HTTP-based profiles are simply not well suited to resource-constrained devices like small microcontrollers; they are more suited to environments like cloud services and smartphone apps. In those environments it's extremely unlikely that, for any genuine use case, the lengths of strings in Thing Descriptions are going to be a limiting factor. As for malicious Thing Descriptions which include gigabyte-sized strings with the intention of overwhelming a system or exposing a security flaw, JSON parsers and databases already need to guard against such edge cases. We probably need to be more worried about other attack vectors, like denial-of-service attacks which overwhelm a device's endpoints with lots of requests, or a Consumer accidentally DDoSing itself by subscribing to very high-frequency events using WebHooks.

For very constrained devices, where constraints like 64- vs. 128-byte strings inside Thing Descriptions might actually be important, my recommendation is to define a separate profile which uses CoAP+CBOR instead of HTTP+JSON. That profile can meet requirements around resource constraints which an HTTP+JSON profile simply isn't suited to.
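As a rough illustration of the footprint argument, here is a sketch using the `serde_json` and `ciborium` crates (the TD fragment is invented): the same metadata is noticeably smaller in CBOR than in JSON.

```rust
use serde::Serialize;

// An invented TD fragment, just to compare encoded sizes.
#[derive(Serialize)]
struct TdFragment<'a> {
    title: &'a str,
    description: &'a str,
}

fn main() {
    let frag = TdFragment {
        title: "Room 1.024 Zone 3",
        description: "Temperature sensor, zone 3",
    };

    let json = serde_json::to_vec(&frag).unwrap();

    let mut cbor = Vec::new();
    ciborium::ser::into_writer(&frag, &mut cbor).unwrap();

    // CBOR drops JSON's quoting and punctuation overhead and uses
    // compact length-prefixed strings.
    println!("JSON: {} bytes, CBOR: {} bytes", json.len(), cbor.len());
}
```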

benfrancis commented 2 years ago

See also: #59.

mlagally commented 2 years ago

Deferred.

lu-zero commented 4 months ago

I think that the degraded-consumption topics in this issue could be addressed in a TD Note suggesting some strategies for dealing with fields, or whole documents, that are too large to fit the memory budget.

I think no-heap targets (e.g. https://github.com/sammhicks/picoserve uses https://docs.rs/heapless/latest/heapless/) would benefit from having the constraint information readily available, so it might be useful to deliver it either as a profile or by other means.
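For instance, here is a sketch of that idea (the limits are hypothetical, and `heapless` is the crate linked above): published constraints become compile-time buffer sizes in a no-heap consumer.

```rust
use heapless::String; // fixed-capacity strings, no allocator needed

// Hypothetical limits, e.g. as published by a profile.
const MAX_TITLE: usize = 64;
const MAX_DESCRIPTION: usize = 256;

/// TD metadata stored entirely in statically sized buffers.
pub struct StoredMetadata {
    pub title: String<MAX_TITLE>,
    pub description: String<MAX_DESCRIPTION>,
}

impl StoredMetadata {
    /// Fails instead of allocating or truncating when a field exceeds
    /// the budget; truncation would be another strategy such a TD Note
    /// could describe.
    pub fn new(title: &str, description: &str) -> Result<Self, ()> {
        let mut t = String::new();
        let mut d = String::new();
        t.push_str(title)?; // push_str errors if capacity is exceeded
        d.push_str(description)?;
        Ok(Self { title: t, description: d })
    }
}
```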

I guess we could look at https://www.w3.org/TR/1998/NOTE-compactHTML-19980209/ as a historical precedent.