lambdaisland / uri

A pure Clojure/ClojureScript URI library
Mozilla Public License 2.0

Expand normalize to handle special delimiters in path segment and (possibly) handle normalization of dot segments #24

Open FiV0 opened 2 years ago

FiV0 commented 2 years ago

From the name, I gathered that the intended use of normalize is to give semantically equivalent URIs a canonical form. Some examples where this is currently not the case:

(require '[lambdaisland.uri.normalize :as norm]
         '[lambdaisland.uri :as uri])

Path segments:

(norm/normalize (uri/uri "https://foobar.org/foo/../bar"))
(norm/normalize (uri/uri "https://foobar.org/foo/./bar"))
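For reference, here is a minimal sketch of the dot-segment removal these two examples would need, following the intent of RFC 3986 section 5.2.4. It is plain JVM Clojure working on path strings and is not the library's implementation; the name `remove-dot-segments` is made up for illustration, and only absolute paths are handled.

```clojure
(require '[clojure.string :as str])

(defn remove-dot-segments
  "Hypothetical helper: resolves `.` and `..` segments in an absolute path.
  Relative or empty paths are returned unchanged."
  [path]
  (if-not (str/starts-with? path "/")
    path
    (let [;; limit -1 keeps trailing empty segments, so \"/foo/\" keeps its slash
          segments  (rest (str/split path #"/" -1))
          ;; a final \".\" or \"..\" leaves a trailing slash behind
          trailing? (contains? #{"." ".."} (last segments))
          stack     (reduce (fn [acc seg]
                              (case seg
                                "."  acc
                                ".." (if (seq acc) (pop acc) acc)
                                (conj acc seg)))
                            []
                            segments)]
      (str "/" (str/join "/" (if trailing? (conj stack "") stack))))))

(remove-dot-segments "/foo/../bar") ; => "/bar"
(remove-dot-segments "/foo/./bar")  ; => "/foo/bar"
```

With something along these lines applied to the path, the first URI above would normalize to https://foobar.org/bar and the second to https://foobar.org/foo/bar.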

Scheme-based normalization (the following URLs are equivalent):

      http://example.com
      http://example.com/
      http://example.com:/
      http://example.com:80/

See here.
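A rough sketch of what scheme-based normalization could look like for the four URLs above. It operates on plain maps with `:scheme`, `:host`, `:port` and `:path` keys, which is an assumption made for illustration; how the library actually represents an empty port or an empty path after parsing is not checked here. The `default-ports` table and both function names are hypothetical.

```clojure
(require '[clojure.string :as str])

;; Hypothetical table of well-known default ports, kept as strings.
(def default-ports {"http" "80", "https" "443"})

(defn default-port?
  "True when the port is empty or equals the scheme's default port."
  [scheme port]
  (or (= "" port)
      (and (some? port) (= (str port) (default-ports scheme)))))

(defn normalize-scheme-based
  "Hypothetical helper: drops an empty or default port and turns an
  empty path into \"/\"."
  [{:keys [scheme port path] :as uri-map}]
  (cond-> uri-map
    (default-port? scheme port) (assoc :port nil)
    (str/blank? path)           (assoc :path "/")))

;; The four equivalent URLs, written out as already-parsed maps.
(def equivalent
  [{:scheme "http" :host "example.com" :port nil  :path nil}
   {:scheme "http" :host "example.com" :port nil  :path "/"}
   {:scheme "http" :host "example.com" :port ""   :path "/"}
   {:scheme "http" :host "example.com" :port "80" :path "/"}])

(apply = (map normalize-scheme-based equivalent)) ; => true
```

A non-default port such as "8080" is left untouched, as is a port on a scheme missing from the table.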

Is this sort of thing out of scope for this library, or should it be added?

Also, from my understanding of RFC 3986 (which might be wrong), if I want to use a reserved character as part of the data, I need to percent-encode it. For example, if I want to use / in my data, a request to an endpoint /api/{data} could look like "http://foobar.com/api/some%2Fdata". normalize would then conflate the following two URIs:

(norm/normalize (uri/uri "http://foobar.com/api/some%2Fdata"))
(norm/normalize (uri/uri "http://foobar.com/api/some/data"))

So my question is, shouldn't only unreserved characters be decoded as part of the normalization?
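To make the question concrete, here is a minimal sketch of the behaviour being asked for: percent-encoded triplets are decoded only when they stand for an unreserved character (ALPHA, DIGIT, `-`, `.`, `_`, `~` in RFC 3986), and every other triplet is kept, with its hex digits upper-cased. This is plain JVM Clojure on path strings, not the library's code; both function names are made up, and it does not attempt to encode characters that should have been encoded in the first place.

```clojure
(require '[clojure.string :as str])

;; The unreserved set from RFC 3986 section 2.3, as a set of characters.
(def unreserved?
  (set "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"))

(defn normalize-pct-triplet
  "Hypothetical helper: decodes a %XX triplet if it encodes an unreserved
  character, otherwise keeps it and upper-cases the hex digits."
  [triplet]
  (let [ch (char (Integer/parseInt (subs triplet 1) 16))]
    (if (unreserved? ch)
      (str ch)
      (str/upper-case triplet))))

(defn normalize-path-encoding
  "Hypothetical helper: applies normalize-pct-triplet to every triplet in path."
  [path]
  (str/replace path #"%[0-9A-Fa-f]{2}" normalize-pct-triplet))

(normalize-path-encoding "/api/some%2fdata") ; => "/api/some%2Fdata"
(normalize-path-encoding "/api/%7Euser")     ; => "/api/~user"
```

Under this rule "/api/some%2Fdata" and "/api/some/data" stay distinct, because `/` is a reserved character and its encoding is left alone.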

plexus commented 2 years ago

Normalize does not currently handle dot segments or default ports; it only deals with percent-encoding at the moment. We do have the algorithm for dot segment resolution implemented as part of uri/join, and I think it could make sense to add that to normalize.

(norm/normalize (uri/uri "http://foobar.com/api/some%2Fdata"))
(norm/normalize (uri/uri "http://foobar.com/api/some/data"))

This is indeed a bug. We handle special delimiters in the query segment, but not in the path segment. A PR for that is welcome as well.

FiV0 commented 2 years ago

I will try to have a look when I find the time.

alysbrooks commented 1 year ago

This issue includes a bug, so it makes sense to keep it open.