oduwsdl / MemGator

A Memento Aggregator CLI and Server in Go
https://memgator.cs.odu.edu/api.html
MIT License

Handle escaped URI-Rs #146

Closed machawk1 closed 3 months ago

machawk1 commented 3 months ago

Closes #110

I also tested URIs that would get decoded as spaces, e.g.,

curl -i "http://localhost:1208/timemap/link/http%3A%2F%2Fexample.org%2F%20index.html"

The %20 remains in the URI-R after decoding.

machawk1 commented 3 months ago

@ibnesayeed I removed the extraneous spacing.

Regarding testing URLs with spaces in them, the browser converts the space to %20 before submission and curl rejects the URL with a space, i.e.,

$ curl -i "http://localhost:1208/timemap/link/http%3A%2F%2Fexample.org/space test.html"
curl: (3) URL rejected: Malformed input to a URL function

ibnesayeed commented 3 months ago

Test with %20 in both the path and a query parameter using curl, and check that nothing is dropped from the URL in the TimeMap. The purpose of this test is to ensure that when %20 is converted to a plain space in the function in question, it is eventually escaped back to %20 further down the process. If that is not the case, we will have to explicitly replace spaces with %20 right after unescaping the URL.

Moreover, I would also test a + in a query parameter to make sure that is not broken either.
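
The unescape-then-re-escape idea described above could be sketched roughly as follows. This is a minimal illustration, not MemGator's actual code: `normalizeURIR` is a hypothetical helper name, and the sketch assumes the URI-R is decoded once with `url.PathUnescape`, which turns %20 into a space but leaves + untouched (unlike `url.QueryUnescape`, which would turn + into a space).

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeURIR is a hypothetical helper (not taken from MemGator).
// It percent-decodes an escaped URI-R once, then restores any literal
// spaces to %20 so the result can be passed on as a usable URI.
func normalizeURIR(raw string) (string, error) {
	// PathUnescape decodes %XX sequences but does not treat "+" as a space.
	decoded, err := url.PathUnescape(raw)
	if err != nil {
		return "", err
	}
	return strings.ReplaceAll(decoded, " ", "%20"), nil
}

func main() {
	// Inputs modeled on the curl requests in this thread.
	inputs := []string{
		"http%3A%2F%2Fexample.org%2F%20index.html",
		"http%3A%2F%2Fexample.org/test.html?sawood%20alam=ibnesayeed",
		"http%3A%2F%2Fexample.org/test.html?sawood+alam=ibnesayeed",
	}
	for _, in := range inputs {
		out, err := normalizeURIR(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(out)
	}
}
```

One caveat with this approach: only spaces are re-escaped, so other decoded characters (for example a %3F that becomes a literal ?) would change the meaning of the URI-R and may need separate handling.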

machawk1 commented 3 months ago

curl -i "http://localhost:1208/timemap/link/http%3A%2F%2Fexample.org/space%20test.html"

...results in https://www.webarchive.org.uk/wayback/archive/timemap/link/http://example.org/space%20test.html from the MemGator log.

curl -i "http://localhost:1208/timemap/link/http%3A%2F%2Fexample.org/test.html?sawood%20alam=ibnesayeed"

...produces https://www.webarchive.org.uk/wayback/archive/timemap/link/http://example.org/test.html?sawood alam=ibnesayeed from the MemGator log.

curl -i "http://localhost:1208/timemap/link/http%3A%2F%2Fexample.org/test.html?sawood+alam=ibnesayeed"

...produces https://www.webarchive.org.uk/wayback/archive/timemap/link/http://example.org/test.html?sawood+alam=ibnesayeed from the MemGator log.

Are the results of any of these variations incorrect to you, @ibnesayeed?

ibnesayeed commented 3 months ago

Please try:

$ curl -i "http://localhost:1208/timemap/link/https://google.com/?q=united%20states"

Check whether you get https://web.archive.org/web/20131212220548/https://google.com/?q=united%20states in the resulting TimeMap. Looking in the logs alone might be inadequate. Also, the logs contain spaces (as reported above) when %20 is used, which is undesired behavior.
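
A check against the TimeMap body itself, rather than the logs, might look like the sketch below. It assumes a MemGator instance listening on localhost:1208 as in the curl commands above; `hasEscapedURI` is a hypothetical helper that only looks for the expected URI-R as a substring, so it does not validate the link format.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// hasEscapedURI is a hypothetical helper. It reports whether the TimeMap
// body contains the expected URI verbatim, i.e. with %20 still encoded.
func hasEscapedURI(timemap, want string) bool {
	return strings.Contains(timemap, want)
}

func main() {
	// Assumed local endpoint, matching the curl request in this thread.
	const endpoint = "http://localhost:1208/timemap/link/https://google.com/?q=united%20states"
	const want = "https://google.com/?q=united%20states"

	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(endpoint)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	// A non-200 status means there is no TimeMap body to inspect.
	if resp.StatusCode != http.StatusOK {
		fmt.Println("unexpected status:", resp.Status)
		return
	}

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("reading body failed:", err)
		return
	}
	fmt.Println("escaped URI present:", hasEscapedURI(string(body), want))
}
```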

machawk1 commented 3 months ago

curl -i "http://localhost:1208/timemap/link/https://google.com/?q=united%20states" returns a 404 from my local MemGator instance.