The current service implementation has several flaws in its handling of URL-encoded characters:
- Some reserved characters such as `/` and `:` are not allowed in namespace or object names even when safely encoded by the client. This seems unfriendly, as clients should be able to expect URL-encoding to work consistently to protect any UTF-8 input they wish to embed in client-generated names.
- Other reserved characters such as `&` are allowed but are conflated with their encoded forms. For example, the two object names `test&name.txt` and `test%26name.txt` are not considered distinct by hatrac, when according to the relevant RFCs they should be.
- Some illegal characters are passed through Apache HTTPD and accepted by hatrac when they should be rejected, e.g. `<` is neither reserved nor unreserved and should never appear in a valid URL.
- Degenerate cases like `%00` are rejected, but with unhelpful error messages.
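The conflation problem can be demonstrated with Python's standard library: a service that compares names only after percent-decoding cannot tell the two object names apart, even though RFC 3986 treats `%26` as data that merely encodes an ampersand.

```python
from urllib.parse import unquote

# Per RFC 3986 these are distinct names: "&" is a reserved delimiter,
# while "%26" is percent-encoded data representing an ampersand.
raw = "test&name.txt"
encoded = "test%26name.txt"

# Comparing decoded forms conflates the two names:
print(unquote(raw) == unquote(encoded))  # True -> the names collide
# Comparing the raw (still-encoded) forms keeps them distinct:
print(raw == encoded)                    # False
```

This is why the fix below keeps encoded forms intact rather than comparing decoded strings.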
To improve this, it seems we should make several related changes:
- [ ] Raise 400 responses for URLs with bad characters, i.e. any except:
  - Unreserved characters `A-Z`, `a-z`, `0-9`, `-`, `_`, `.`, and `~`
  - Reserved characters `/`, `:`, `;`, `?`, `=`, and `&`, only when used as hatrac URL meta-syntax
  - Encoding units `%hh`
- [ ] Stop using decoded forms of URL elements in the hatrac DB and backing store
  - This will require a one-time upgrade procedure to re-encode data in existing deployments!
- [ ] Reject `%00` with a 400 error
And optionally:
- [ ] Enforce that each URL element, once decoded, is valid UTF-8.
This last test would be unnecessary for correct hatrac service function or safety, but might improve client safety in situations where clients use decoded URL elements in contexts that are not well guarded against unusual values.
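The character-set, `%00`, and UTF-8 checks above could be combined into a single validation pass. A minimal sketch, assuming Python: `check_url` and its error messages are illustrative names, not hatrac's actual API, and the regex reflects the allowed-character list proposed above.

```python
import re
from urllib.parse import unquote

# Allowed: unreserved characters, the reserved characters used as
# hatrac URL meta-syntax, and %hh encoding units. Anything else
# (e.g. "<") should produce a 400 response.
_VALID = re.compile(r"(?:[A-Za-z0-9\-._~/:;?=&]|%[0-9A-Fa-f]{2})*")

def check_url(url: str) -> None:
    """Raise ValueError (to be mapped to an HTTP 400) on disallowed input."""
    if not _VALID.fullmatch(url):
        raise ValueError("URL contains characters outside the allowed set")
    if "%00" in url:
        raise ValueError("NUL (%00) is not permitted in URLs")
    try:
        # Optional stricter check: the decoded form must be valid UTF-8.
        unquote(url, errors="strict")
    except UnicodeDecodeError:
        raise ValueError("percent-encoded bytes do not decode as valid UTF-8") from None
```

Note that the explicit `%00` test runs after the character-set match (since `%00` is a syntactically well-formed encoding unit), so it can return the specific error message the current implementation lacks.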