ninenines / cowlib

Support library for manipulating Web protocols.
ISC License

cow_uri:urldecode/1 #98

Closed: f3c0 closed this issue 4 years ago

f3c0 commented 4 years ago

It seems there are characters (at least one) that urldecode cannot handle when they are not percent-encoded. I didn't see any reason to do it differently from the way cow_qs does (let me know if you have one). cow_qs:urldecode does almost the same thing, except it also replaces "+" with " " (space).

essen commented 4 years ago

Different specifications. Making this change would let Cowboy accept badly formed URIs, and I don't think that's a good idea. Note that Cowboy doesn't use cow_qs:urldecode; its presence there must be a historical remnant. cow_uri:urldecode will soon not be used anymore, with uri_string used instead, but it's too soon for that right now.

Instead of making those changes, though, I think I would rather have a companion module to uri_string: one that would, for example, take a path returned by uri_string and return the list of segments, with only "/" potentially left to decode (since uri_string does the rest), along with similar useful functions. That module could also contain a generic URL decode/encode with strict and non-strict modes. This could be included immediately since these would be new functions.
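
To make the segment idea concrete, here is a minimal sketch, not an actual cowlib or uri_string API: the module and function names are made up, and it assumes uri_string:percent_decode/1 (available since OTP 23). The point is to split the still-encoded path on "/" first and only then decode each segment, so an encoded "/" (%2F) cannot shift segment boundaries.

%% Hypothetical helper module, for illustration only.
-module(uri_segments).
-export([path_segments/1]).

%% Split a percent-encoded path (e.g. the path returned by
%% uri_string:parse/1) into segments, then decode each segment
%% on its own so a literal "/" encoded as %2F stays inside it.
-spec path_segments(binary()) -> [binary()].
path_segments(Path) ->
    [uri_string:percent_decode(Segment)
     || Segment <- binary:split(Path, <<"/">>, [global]),
        Segment =/= <<>>].

With that sketch, path_segments(<<"/abc/def%2Fghi">>) would return [<<"abc">>, <<"def/ghi">>]; a strict mode could instead reject any segment for which percent_decode/1 reports an error.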

f3c0 commented 4 years ago

Thanks for the explanation, I'll check uri_string. When you say badly formed URI, do you mean characters in the URI that are not allowed? At first I couldn't see why they should not be allowed, but I think I can see it now in https://tools.ietf.org/html/rfc3986#section-3.3: for example, a path segment can only contain unreserved / pct-encoded / sub-delims / ":" / "@" characters.
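
As a rough illustration of that rule (the function name is hypothetical; the character classes are the ones RFC 3986 defines), a strict per-character check could look like this:

%% Illustration only: true if C may appear literally (unencoded) in a
%% path segment, i.e. C matches pchar without the pct-encoded branch:
%% pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
is_segment_char(C) when C >= $a, C =< $z -> true;   %% ALPHA
is_segment_char(C) when C >= $A, C =< $Z -> true;   %% ALPHA
is_segment_char(C) when C >= $0, C =< $9 -> true;   %% DIGIT
is_segment_char(C) ->
    %% remaining unreserved, sub-delims, ":" and "@"
    lists:member(C, "-._~!$&'()*+,;=:@").

A backtick is in none of those sets, which is why uri_string rejects the example below.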

essen commented 4 years ago

Yes that's what I mean. uri_string will also error out on that one:

1> uri_string:parse(<<"/abc`/def">>).
{error,invalid_uri,"`"}
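
For comparison, the same kind of path with only allowed characters parses fine (hypothetical shell continuation; exact output may vary by OTP release):

2> uri_string:parse(<<"/abc/def">>).
#{path => <<"/abc/def">>}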