Closed simonw closed 2 years ago
The problem with the dash encoding mechanism is that it turns out dashes are used in a LOT of existing Datasette instances - much of https://fivethirtyeight.datasettes.com/fivethirtyeight for example, and even https://datasette.io/ itself: https://datasette.io/dogsheep-index
It's pretty ugly to force all of those to change to their dash-encoded equivalent - and in fact it broke https://datasette.io/ in a subtle way:
I'm going to try using ~ instead and see if that works as well and causes less breakage to existing sites.
Asked about this on Twitter:
Anyone ever seen a proxy or other URL handling system do anything surprising with the tilde "~" character?
I'm considering it as an escaping character, in place of "-" as described in
Replies so far seem like it should be OK - Apache has supported this for home directories for a couple of decades now without any problems.
Relevant: https://datatracker.ietf.org/doc/html/rfc3986#section-2.1
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
Notably ~ is not in either of those lists.
And in https://datatracker.ietf.org/doc/html/rfc3986#section-2.3 "Unreserved Characters":
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
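As a rough illustration of why the unreserved status matters, here is a small sketch using Python's standard urllib.parse (Python 3.7 or later, where quote() follows RFC 3986). It checks that a tilde-escaped string such as ~2Ffoo~2Fbar passes through percent-encoding and percent-decoding unchanged, while a string using % as the escape character does not. It says nothing about how any particular proxy behaves.

```python
from urllib.parse import quote, unquote

# A tilde-escaped path component, as in the test cases in this thread.
encoded = "~2Ffoo~2Fbar"

# "~" is unreserved in RFC 3986, so percent-encoding leaves it alone...
assert quote(encoded, safe="") == encoded
# ...and percent-decoding has nothing to decode.
assert unquote(encoded) == encoded

# By contrast, a value that uses "%" as its escape character is
# rewritten by both operations.
assert quote("%2Ffoo", safe="") == "%252Ffoo"
assert unquote("%2Ffoo") == "/foo"
```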
Updated test:
import pytest

from datasette import utils


@pytest.mark.parametrize(
    "original,expected",
    (
        ("abc", "abc"),
        ("/foo/bar", "~2Ffoo~2Fbar"),
        ("/-/bar", "~2F-~2Fbar"),
        ("-/db-/table.csv", "-~2Fdb-~2Ftable~2Ecsv"),
        (r"%~-/", "~25~7E-~2F"),
        ("~25~7E~2D~2F", "~7E25~7E7E~7E2D~7E2F"),
    ),
)
def test_tilde_encoding(original, expected):
    actual = utils.tilde_encode(original)
    assert actual == expected
    # And test round-trip
    assert original == utils.tilde_decode(actual)
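For reference, here is a minimal sketch of an encoder and decoder that would satisfy the test cases above. It is not necessarily how Datasette implements utils.tilde_encode and utils.tilde_decode. The safe character set (ASCII letters, digits, "_" and "-") is an assumption inferred from the expected values in the test, and the helper names _SAFE and _ESCAPE_RE are made up for this example.

```python
import re

# Assumption: characters left untouched, inferred from the test cases.
# Note that "." and "~" are deliberately NOT in this set.
_SAFE = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789_-"
)

# Matches one escape sequence: "~" followed by two hex digits.
_ESCAPE_RE = re.compile(r"~([0-9A-Fa-f]{2})")


def tilde_encode(s: str) -> str:
    """Replace every UTF-8 byte outside _SAFE with ~XX (uppercase hex)."""
    out = []
    for byte in s.encode("utf-8"):
        ch = chr(byte)
        out.append(ch if ch in _SAFE else "~{:02X}".format(byte))
    return "".join(out)


def tilde_decode(s: str) -> str:
    """Reverse tilde_encode by turning each ~XX back into its byte."""
    # Each ~XX becomes a single character in the range 0-255, so the
    # intermediate string can be mapped to bytes via latin-1 and then
    # decoded as UTF-8. Input is assumed to be tilde_encode output.
    raw = _ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), s)
    return raw.encode("latin-1").decode("utf-8")
```

With this sketch, a name containing dashes such as dogsheep-index encodes to itself, so existing URLs of that shape would not need to change, while "~" itself becomes ~7E so that the encoding stays reversible.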
I've made a real mess of this. I'm going to revert Datasette main back to the last commit that passed the tests and try this again in a branch.
The state I had got to prior to that revert is in https://github.com/simonw/datasette/tree/issue-1657-wip
The thing that broke everything was this change:
I'm going to bring back the horrible get_format() method for the moment, with its weird mutations of the args object, then try and get rid of it again later.
Moving this to a PR.
Refs #1439