The current service implementation has several flaws in its handling of URL-encoded characters:
- Some reserved characters such as `/` and `:` are not allowed in namespace or object names even when safely encoded by the client. This seems unfriendly, as clients should be able to expect URL-encoding to work consistently to protect any UTF-8 input they wish to embed in client-generated names.
- Other reserved characters such as `&` are allowed but are conflated with their encoded forms. For example, the two object names `test&name.txt` and `test%26name.txt` are not considered distinct by hatrac, when according to the relevant RFCs they should be.
- Some illegal characters are passed through Apache HTTPD and accepted by hatrac when they should be rejected, e.g. `<` is neither reserved nor unreserved and should never appear in a valid URL.
- Degenerate cases like `%00` are rejected, but with unhelpful error messages.
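The conflation problem can be demonstrated with Python's standard library: a service that compares names only after percent-decoding cannot tell the two object names apart, even though RFC 3986 treats `%26` as data that merely encodes an ampersand.

```python
from urllib.parse import unquote

# Per RFC 3986 these are distinct names: "&" is a reserved delimiter,
# while "%26" is percent-encoded data representing an ampersand.
raw = "test&name.txt"
encoded = "test%26name.txt"

# Comparing decoded forms conflates the two names:
print(unquote(raw) == unquote(encoded))  # True -> the names collide
# Comparing the raw (still-encoded) forms keeps them distinct:
print(raw == encoded)                    # False
```

This is why the fix below keeps encoded forms intact rather than comparing decoded strings.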
To improve this, it seems we should make several related changes:
- [ ] Raise 400 responses for URLs with bad characters, i.e. any except:
  - Unreserved characters `A-Z`, `a-z`, `0-9`, `-`, `_`, `.`, and `~`
  - Reserved characters `/`, `:`, `;`, `?`, `=`, and `&`, only when used as hatrac URL meta-syntax
  - Encoding units `%hh`
- [ ] Stop using decoded forms of URL elements in the hatrac DB and backing store
  - This will require a one-time upgrade procedure to re-encode data in existing deployments!
- [ ] Reject `%00` with a 400 error
And optionally:
- [ ] Enforce that each URL element, once decoded, is valid UTF-8.
This last test would be unnecessary for correct hatrac service function or safety, but might improve client safety in situations where clients use decoded URL elements in contexts that are not well guarded against unusual values.
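The character-set, `%00`, and UTF-8 checks above could be combined into a single validation pass. A minimal sketch, assuming Python: `check_url` and its error messages are illustrative names, not hatrac's actual API, and the regex reflects the allowed-character list proposed above.

```python
import re
from urllib.parse import unquote

# Allowed: unreserved characters, the reserved characters used as
# hatrac URL meta-syntax, and %hh encoding units. Anything else
# (e.g. "<") should produce a 400 response.
_VALID = re.compile(r"(?:[A-Za-z0-9\-._~/:;?=&]|%[0-9A-Fa-f]{2})*")

def check_url(url: str) -> None:
    """Raise ValueError (to be mapped to an HTTP 400) on disallowed input."""
    if not _VALID.fullmatch(url):
        raise ValueError("URL contains characters outside the allowed set")
    if "%00" in url:
        raise ValueError("NUL (%00) is not permitted in URLs")
    try:
        # Optional stricter check: the decoded form must be valid UTF-8.
        unquote(url, errors="strict")
    except UnicodeDecodeError:
        raise ValueError("percent-encoded bytes do not decode as valid UTF-8") from None
```

Note that the explicit `%00` test runs after the character-set match (since `%00` is a syntactically well-formed encoding unit), so it can return the specific error message the current implementation lacks.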