weavejester / compojure

A concise routing library for Ring/Clojure
Eclipse Public License 1.0
4.09k stars 256 forks source link

Path parameters containing . don't match routes #42

Closed TobiasBales closed 12 years ago

TobiasBales commented 13 years ago

If you define a route (e.g. (GET "/somepath/:user" _ dostuff)) it will not match /somepath/example.user and if I'm not mistaken dots are allowed in path parameters. I am not sure if there is a reason for this but I would really like to know if it was an oversight or if there is some reasoning behinder it.

wallrat commented 13 years ago

See https://github.com/weavejester/compojure/wiki/Routes-In-Detail

The default regexp when matching params is [^/.,;?]+ but you can provide a custom rule. So your example becomes

(GET ["/somepath/:user" :user #"[^,;?]+"] _ dostuff)

Don't know the the reasoning behind, but if I remember correctly Rails do the same thing.

nxg commented 13 years ago

This caught me, too.

According to RFC 3986 Sect. 3.3, a 'path' is composed of a sequence of 'segment' separated by slashes:

  segment       = *pchar
  pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

  unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
  pct-encoded = "%" HEXDIG HEXDIG
  sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

Thus "foo.x;a=b" is a perfectly legitimate path-segment.

The Routes-in-Detail page does, it turns out, document that the regexp will match 'any sub-path up to the next "/" or "."', but that's so unexpected and arbitrary that it seems like a bug in itself. The observation that Rails does the same thing is neither here nor there.

weavejester commented 13 years ago

I didn't seem to get an email for this issue the first time around, so apologies for being late to reply.

RFC 3986 only applies to the syntax of the URI, and has no direct bearing on the syntax of a Clout template.

However, it does have some indirect bearing, in that some people may expect a Clout keyword to match the same characters as a URL template.

On the other hand, the syntax Clout uses is copied from Rails and Sinatra. Changing this so it has subtly different syntax rules could also cause confusion.

Additionally, altering this syntax now could cause existing applications to break.

That said, I've now looked into the reason why Rails chose this particular syntax. It seems as if the "." was introduced as a separator only in Rails 1.2 in order to better support the formatted routes like "/:controller/:action.:format". I don't think Compojure actually needs that, because a more idiomatic approach for a Ring application would be to rewrite the request so that the extension is removed and the correct Content-Type header added to the request instead.

I'll think about it over the weekend, but given that I've had a few complaints about this, I'm leaning toward changing this behaviour. However, as it's a potentially breaking change, I think it'll need to wait for 0.7.0, rather than be released as a patch version. I'd like to add a few other features I've been thinking about to 0.7.0 before releasing it, though.

nxg commented 13 years ago

Thanks for looking at this (I didn't get a notification that your comment had appeared -- I wonder if something is amiss).

It's true that the template is in principle different from a URI. However, it is a template which is intended to match a URI in some sense, so this is perhaps a distinction without a difference. Matching Rails behaviour might make things slightly easier for people coming to this framework from that one, but is merely perplexing to people (like me) who've steered clear of Rails. I've come here from another framework which (as it happens) uses URI path-parameters freely, in a good pattern which I'd like to imitate here (and just by the way, if it's idiomatic within Rails to switch behaviour based on '.format' rather than Content-Type, that looks like an extra reason to avoid Rails...).

So it's not specifically dots that tripped me up here, but either colons or semicolons that I wanted to include in path segments (I can't just now remember).

I do however appreciate your commendable desire not to make a breaking change in a patch version, so I'll confine myself to suggesting that you allow the full range of RFC-permissible characters in URI paths by default.

weavejester commented 13 years ago

It's worth pointing out that any change the the default behaviour of matching routes would also be a breaking change.

However, I'll think I'll make keywords in Clout match against URI path segments, including the "." and ";" characters. This will be in place for the next release of Compojure, which I'll put out shortly after Ring 1.0.0 comes out (which is currently in beta).

nxg commented 13 years ago

Ah, sorry, that was supposed to end with something like "...by default, in the next appropriate release". Ahem.

siyu commented 12 years ago

Will this be fixed on 0.7.0? we're trying to replace our current java webservice (resteasy) with compojure, but this is a show stopper as our path parameter can contain "." and ",".

nxg commented 12 years ago

@siyu If you can't wait for the new version, a workaround is:

(def email-address-re #"[A-Za-z0-9.+-]+@[A-Za-z0-9-.]+")
(def tag-re #"[a-zA-Z0-9-]+")

(defroutes all-routes
    (GET ["/u/:user/:tag" :user email-address-re :tag tag-re]
      ...

Arguably, this isn't a workaround at all, but a way of both documenting and checking the content of the URLs you're handling. I suspect I'll continue to use this pattern even after the default is changed to be liberal enough to include "." and ",".

siyu commented 12 years ago

@nxg, thanks for solution, actually we're using #"[^/;?]+" and that works fine, and hoping that the issue will be fixed on the next version so we dont need to do the override.

siyu commented 12 years ago

It appears the "+" on the url path is getting blanked out, when submitting http://test/date/T+3, it's getting "T 3", I've checked the clout project and been unable to find any code that erase the + sign anywhere, anyone have any ideas how to do fix it? Thanks

joodie commented 12 years ago

"+" is one of the two standard URL encodings for the space character (the other encoding is "%20"). If you want to pass a + sign, encode it (use "%2B") before splicing it in the URL.

nxg commented 12 years ago

Sorry to come over all Language Lawyer, but... this is not a standard.

RFC 3986 (the current spec for URIs) does indeed say that spaces aren't permitted in URIs, and that they'd have to be percent-encoded. However '+' signs are fine (see the productions for path, and pchar in section 3.3, noting that '+' is one of the sub-delims characters). That is, a plus sign appearing in a URI path must remain untouched on parsing.

Put another way, if a URI http://test/date/T+3 is being parsed to produce a path "/date/T 3", then that's a bug, since this parsing violates the URI spec.

There is a convention that plus signs are turned into spaces, which almost certainly arose from a truly antique W3C URI spec (note that document was last modified Wed, 01 Feb 1995 20:02:22 GMT) which specifies that spaces are 'escaped' as '+'. Since this is a fairly widespread convention, it would be good to have that supported, but that should be at the user's choice, and not as a default.

Like I say, sorry to be all language-lawyer, but...

weavejester commented 12 years ago

Please don't use existing issues to discuss unrelated issues. If you want to discuss whether "+" should be treated as a space or not, open a new issue on Clout.

nxg commented 12 years ago

That's true. Perhaps @siyu can report this as a new issue.

siyu commented 12 years ago

thx for everyone's help, a new issue has been created in clout wiki on + sign issue: https://github.com/weavejester/clout/issues/5

weavejester commented 12 years ago

Fixed in 1.0.0-RC1

siyu commented 12 years ago

Awesome!! Thanks a lot.