json-schema-org / json-schema-vocabularies

Experimental vocabularies under consideration for standardization

Validating URIs with URI-Templates #50

Open devinbost opened 1 year ago

devinbost commented 1 year ago

I noticed that JSON Schema currently provides a mechanism to express URI values, as well as URI-reference and URI-template values. However, it offers little for validating URIs. Today, URIs can be validated with a regex, but regex matching can be expensive, and complex patterns are hard to read, maintain, and debug.

RFC 6570 provides a rich framework for expressing patterns within URIs, with more structure than regex easily allows. However, JSON Schema doesn't currently provide a vocabulary for indicating that a URI-template expressed in a schema is intended to validate a particular URI. This kind of validation could allow JSON Schema to detect invalid URIs, unexpected (and potentially malicious) web requests, and URIs that contain unknown paths (for example, where parts of a URI are not in a set of expected enum values).

I propose that JSON Schema adopt a mechanism to allow users to express a URI-template for validating a URI or URI-reference contained in that (or perhaps in another) schema.

awwright commented 1 year ago

I know a little bit about URI Templates; here's my work on this topic: https://awwright.github.io/uri-template-router/ and https://www.youtube.com/watch?v=cq_uQFf5bro

The major problem is that you can't easily narrow down the range of valid values. With a URI Template like http://example.com/{id}, a URI like http://example.com/ is valid (a blank "id" produces that URI), and you can't limit variables to patterns like "positive integers only." It's more straightforward to write a regular expression that matches only the subset of URIs you want.
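To make the empty-variable point concrete, here is a minimal Python sketch. `expand_simple` is a hypothetical helper written for this illustration (not part of any library), and it only handles Level 1 `{var}` expressions from RFC 6570. The regex is one possible way to express "positive integers only."

```python
import re
from urllib.parse import quote

def expand_simple(template: str, variables: dict) -> str:
    """Expand only Level 1 `{var}` expressions (no operators); illustrative helper."""
    def substitute(match):
        # A missing or empty variable contributes an empty string.
        value = variables.get(match.group(1), "")
        # Simple string expansion percent-encodes everything outside the unreserved set.
        return quote(str(value), safe="")
    return re.sub(r"\{([A-Za-z0-9_]+)\}", substitute, template)

TEMPLATE = "http://example.com/{id}"

print(expand_simple(TEMPLATE, {"id": ""}))    # http://example.com/
print(expand_simple(TEMPLATE, {"id": "42"}))  # http://example.com/42

# A regex can reject the empty case and restrict the segment to positive integers.
POSITIVE_INT_URI = re.compile(r"http://example\.com/[1-9][0-9]*")

print(POSITIVE_INT_URI.fullmatch("http://example.com/42") is not None)  # True
print(POSITIVE_INT_URI.fullmatch("http://example.com/") is not None)    # False
```

The template itself has no way to say that `id` must be non-empty or numeric, which is why the bare http://example.com/ counts as a valid expansion.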

In contrast, a regular expression can match anything that a URI Template can, and more.
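As a rough sketch of that claim, a Level 1 template can be translated mechanically into a regular expression. `template_to_regex` is a hypothetical helper for this example, and it assumes only plain `{var}` expressions whose values expand to unreserved characters or percent-encoded triplets; operators such as `{+var}` or `{?var}` are not handled.

```python
import re

# What a simple `{var}` expansion can produce: unreserved characters or %XX triplets.
# The trailing `*` allows the empty string, matching the blank-variable case.
EXPANDED_VALUE = r"(?:[A-Za-z0-9\-._~]|%[0-9A-Fa-f]{2})*"

def template_to_regex(template: str) -> re.Pattern:
    """Build a regex matching the URIs a Level 1 template could expand to."""
    # Split the template into its literal pieces around each `{var}` expression.
    literals = re.split(r"\{[A-Za-z0-9_]+\}", template)
    # Escape the literals and put the value pattern where each variable was.
    return re.compile(EXPANDED_VALUE.join(re.escape(part) for part in literals))

matcher = template_to_regex("http://example.com/{id}")

print(matcher.fullmatch("http://example.com/42") is not None)   # True
print(matcher.fullmatch("http://example.com/") is not None)     # True
print(matcher.fullmatch("http://example.com/a/b") is not None)  # False
```

From there the generated pattern can be tightened by hand (for example, replacing the value pattern with `[1-9][0-9]*`), which is the kind of narrowing a template alone does not offer.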

Finally, I think this inherently has limited use. URIs are supposed to be defined and managed by the server. So a server could use this in an API to describe its own namespace to a client, but using this for any other purpose (requiring that a URI be for a different server) could be problematic.

karenetheridge commented 1 year ago

We've talked about adding keywords to access various parts of a URI. For example, "uriHost": { "pattern": "..." } would specify the schema that the host component of a URI must conform to (assuming the current data instance is a valid URI string).
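A rough Python sketch of how such a keyword might be evaluated follows. `uriHost` is only the idea floated above, not an existing JSON Schema keyword, and `check_uri_host` is a hypothetical function that supports nothing but a `pattern` subschema. It also uses Python's `re` dialect, which differs in places from the ECMA 262 dialect JSON Schema recommends.

```python
import re
from urllib.parse import urlsplit

def check_uri_host(instance: str, schema: dict) -> bool:
    """Apply a hypothetical `uriHost` keyword to a URI string (sketch only)."""
    subschema = schema.get("uriHost")
    if subschema is None:
        return True  # keyword absent: nothing to check
    # Assumes `instance` is already a valid URI; `hostname` is None when there is no host.
    host = urlsplit(instance).hostname or ""
    pattern = subschema.get("pattern")
    # JSON Schema `pattern` is unanchored, so search rather than fullmatch.
    return pattern is None or re.search(pattern, host) is not None

schema = {"uriHost": {"pattern": r"^api\.example\.com$"}}

print(check_uri_host("https://api.example.com/v1/items", schema))  # True
print(check_uri_host("https://other.example.net/v1/items", schema))  # False
```

A real implementation would hand the extracted host to a full schema evaluator instead of special-casing `pattern`.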