Constrain URLs in API Roots

varnerac commented 6 years ago

We need to restrict what values can be in an API Root URL to reduce nondeterminism in URL -> API Root resolution.

I propose the following text:

API Root URLs MUST not contain UUIDs as path tokens. API Root URLs MUST not contain the literals collections, manifest, objects, taxii, taxii2, or status as path tokens.

You can see some of the problems allowing any tokens in an API Root URL here: https://github.com/oasis-tcs/cti-taxii2/issues/47

varnerac commented 6 years ago

The revised text with relaxed requirements.

An API Root URL MUST not conflict with existing TAXII 2 endpoints.

I'm not really sure how to wordsmith the line above.

The path of an API Root URL MUST not be /taxii/ or /taxii2/. The last path segment of an API Root URL MUST not be collections, objects, or manifest. The second-to-last segment of an API Root URL MUST not be status, collections, or objects

Please suggest edits.

johnwunder commented 6 years ago

So I was thinking about this more and I guess I'm no longer sure we need to make this change in the spec. Here's my reasoning:

I agree that we should fix #47...it seems impossible to really validate that tools will do it correctly or that, if they do, the request will even make it all the way to the TAXII server in the first place. Thus, you no longer need to assure that you're going to get TAXII-appropriate error messages.
Separately and even if we don't fix 47, if a server wants to optimize its own API roots and make sure that they're completely de-conflicted (either taking the original hardline approach or something softer) then it can do that without it being in the specification. I don't think it should be the spec's job to enforce that type of optimization/quality...the spec should be tightly focused on interoperability. In other words, this seems like something implementations that want it can just do.

I lean towards not putting things in the spec unless absolutely necessary, and in this case it doesn't seem necessary. That said I don't think it's the end of the world if we do add it, and if so I would lean towards something simple like "An API Root URL MUST NOT conflict with other TAXII2 endpoints."

varnerac commented 6 years ago

TLDR; No standardized error codes without URL determinism

Not implementing this now may cause conformance issues later. If we end up with a standard list of error codes, different implementations of URL parsing will lead to non-standard responses. I wrote this up because of a disagreement on the error message in the interop test that intuitively seems to be "Collection not found", but could legitimately be "API Root not found".

I think we will eventually end up with standard error codes, that will add additional detail beyond the HTTP status code, uniformly across different client/server/persona implementations. To uniformly return error codes around HTTP requests, we have to uniformly parse the URLs across implementations.

There are a couple of parsing strategies. I looked at head-first and tail-first. For head-first, we attempt to match the beginning (or beginning of the path if your server supports relative URLs in API Roots) of the incoming URL against the list of the API Root URLs the server knows about, sorted descending by string length, so we hit /foo/bar/ before we hit /foo/, if both exist. The remaining portion of the path can then route to standard endpoints /, /collections/, /collections/<id>, etc.

In a tail-first implementation, I start at the end. E.g., if it ends in /collections/, I trim collections/ off the end, and the remainder is my API Root URL. This is nice because I need some portion of my API Root info from the moment I start processing a request. I definitely need max_content_length. I may need versions if I use that for content negotiation instead of the built-in HTTP content negotiation headers. Using tail-first maps well against a key-value store (dictionary, hash map, database, etc.) because after I trim the endpoint data and parameters off the end of the request, and I am returned a URL that should be an API Root URL. Then, I can leverage indexes in my key-value store to fetch the API Root metadata I must have or know for sure that it's a 404.

However, the tail-first parsing falls apart if we allow /foo/collections/ as an API Root URL. I'd parse it as the Collections endpoint of the API Root with URL /foo/.

To provide detailed error messages about:

invalid Collection id (not UUID)
Collection not found (404)
invalid Object id (not UUID)
Object not found (404)

We need URL determinism, which probably means tail-first parsing. This requires constraining the allowable values of an API Root URL.

varnerac commented 5 years ago

Closing this. People can have non-deterministic resource parsing in their server if they want. Not really sure it belongs in the spec because it's an edge case.

oasis-tcs / cti-taxii2

Constrain URLs in API Roots #60