Closed machawk1 closed 3 months ago
@ibnesayeed I removed the extraneous spacing.
Regarding testing URLs with spaces in them, the browser converts the space to %20 before submission and curl rejects the URL with a space, i.e.,
$ curl -i "http://localhost:1208/timemap/link/http%3A%2F%2Fexample.org/space test.html"
curl: (3) URL rejected: Malformed input to a URL function
$
Test with %20 in the path and query parameter using curl and check that nothing is dropped from the URL in the TimeMap. The purpose of this test is to ensure that when %20 is converted to plain whitespace in the function in question, it is eventually escaped back to %20 further down the process. If that is not the case, then we will have to explicitly replace whitespace with %20 soon after unescaping the URL.
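The fallback described above (unescape, then put %20 back for any space) can be sketched as follows. This is only an illustration using Python's standard library, not MemGator's actual code; the function name unescape_keep_spaces is hypothetical.

```python
from urllib.parse import unquote


def unescape_keep_spaces(raw: str) -> str:
    """Percent-decode `raw`, then re-escape any spaces as %20.

    Hypothetical helper for illustration only; it is not taken from MemGator.
    """
    decoded = unquote(raw)  # %3A -> ":", %2F -> "/", %20 -> " "
    return decoded.replace(" ", "%20")  # restore the escaping for spaces


print(unescape_keep_spaces("http%3A%2F%2Fexample.org/space%20test.html"))
# prints: http://example.org/space%20test.html
```

With a step like this, the scheme and slashes are decoded but a space never reaches the upstream archive request unescaped.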
Moreover, I would also test + in the query parameter to make sure that it is not broken either.
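The reason + deserves its own test is that some decoders treat it as a space and others leave it alone. The sketch below shows that difference with Python's standard library; which behavior the function in question has is something the curl test would need to confirm, so this is an analogy rather than a statement about MemGator.

```python
from urllib.parse import unquote, unquote_plus

raw = "http%3A%2F%2Fexample.org/test.html?sawood+alam=ibnesayeed"

# Plain percent-decoding leaves "+" untouched.
path_style = unquote(raw)

# Form-style decoding additionally turns "+" into a space.
form_style = unquote_plus(raw)

print(path_style)  # http://example.org/test.html?sawood+alam=ibnesayeed
print(form_style)  # http://example.org/test.html?sawood alam=ibnesayeed
```

If the decoder in use behaves like the second call, a + in the original URI would silently become a space and change the lookup key.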
curl -i "http://localhost:1208/timemap/link/http%3A%2F%2Fexample.org/space%20test.html"
...results in https://www.webarchive.org.uk/wayback/archive/timemap/link/http://example.org/space%20test.html from the MemGator log.
curl -i "http://localhost:1208/timemap/link/http%3A%2F%2Fexample.org/test.html?sawood%20alam=ibnesayeed"
...produces https://www.webarchive.org.uk/wayback/archive/timemap/link/http://example.org/test.html?sawood alam=ibnesayeed" from the MemGator log.
curl -i "http://localhost:1208/timemap/link/http%3A%2F%2Fexample.org/test.html?sawood+alam=ibnesayeed"
...produces https://www.webarchive.org.uk/wayback/archive/timemap/link/http://example.org/test.html?sawood+alam=ibnesayeed from the MemGator log.
Are the results of any of these variations incorrect to you, @ibnesayeed?
Please try:
$ curl -i "http://localhost:1208/timemap/link/https://google.com/?q=united%20states"
to see if you get https://web.archive.org/web/20131212220548/https://google.com/?q=united%20states in the produced TimeMap. Looking in the logs might be inadequate. Also, the logs contain spaces (as reported above) when %20 is used, which is undesired behavior.
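Checking the TimeMap body rather than the logs could be automated along these lines. This is a rough sketch that assumes the link-format TimeMap wraps each URI in angle brackets; the helper names and the sample bodies are made up for illustration and are not real MemGator output.

```python
import re


def timemap_uris(body: str) -> list[str]:
    """Return every URI that appears between < and > in a link-format body."""
    return re.findall(r"<([^>]*)>", body)


def has_unescaped_space(body: str) -> bool:
    """True if any URI in the TimeMap contains a literal space."""
    return any(" " in uri for uri in timemap_uris(body))


# Hypothetical sample bodies, written by hand for this example.
good = '<https://web.archive.org/web/20131212220548/https://google.com/?q=united%20states>; rel="memento"'
bad = '<https://web.archive.org/web/20131212220548/https://google.com/?q=united states>; rel="memento"'

print(has_unescaped_space(good))  # False
print(has_unescaped_space(bad))   # True
```

The body returned by curl could be piped into a check like this, so the test fails if a space leaks into any URI.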
curl -i "http://localhost:1208/timemap/link/https://google.com/?q=united%20states" returns a 404 from my local MemGator instance.
Closes #110
I also tested URIs that would get decoded as spaces, e.g.,
curl -i "http://localhost:1208/timemap/link/http%3A%2F%2Fexample.org%2F%20index.html"
The %20 remains in the URIR after being decoded.