mementoweb / py-memento-client

A Memento Client Library in Python

`Expected URI`s in the test data don't match the real ones #14

Closed. refeed closed this issue 7 years ago

refeed commented 7 years ago

When I run the tests on Travis CI and on my own computer, some of them fail because the expected result is not the same as the actual one.

One of them is https://travis-ci.org/refeed/py-memento-client/jobs/253494646#L391

[gw23] linux -- Python 3.5.3 /home/travis/virtualenv/python3.5.3/bin/python
input_uri_r = 'http://www.cnn.com'
input_datetime = datetime.datetime(1998, 7, 28, 17, 18, 49)
input_timegate = 'http://web.archive.org/web/'
expected_uri_m = 'http://web.archive.org/web/20000815052826/http://www.cnn.com/'
    @pytest.mark.parametrize("input_uri_r,input_datetime,input_timegate,expected_uri_m", specified_timegate_testdata)
    def test_get_memento_uri_specified_timegate(input_uri_r, input_datetime, input_timegate, expected_uri_m):

        mc = MementoClient(timegate_uri=input_timegate, check_native_timegate=False)

        actual_uri_m = mc.get_memento_info(input_uri_r, input_datetime).get("mementos").get("closest").get("uri")[0]

>       assert expected_uri_m == actual_uri_m
E       AssertionError: assert 'http://web.a.../www.cnn.com/' == 'http://web.ar...w.cnn.com:80/'
E         Skipping 50 identical leading characters in diff, use -v to show
E         - ww.cnn.com/
E         + ww.cnn.com:80/
E         ?           +++
test/test_memento_client.py:93: AssertionError

The AssertionError above happens because the actual URI includes the port (:80), while the expected URI does not. The port is added by the TimeGate server itself, which is web.archive.org.

This is the result of running curl against http://web.archive.org/web/20000815052826/http://www.cnn.com/ :

$ curl --head "http://web.archive.org/web/20000815052826/http://www.cnn.com/"
HTTP/1.1 200 OK
Server: Tengine/2.1.0
Date: Fri, 14 Jul 2017 12:06:49 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 79991
Connection: keep-alive
X-Archive-Orig-date: Tue, 15 Aug 2000 05:28:24 GMT
X-Archive-Orig-set-cookie: CNNid=cf19470d-11709-966317304-1; expires=Wednesday, 30-Dec-2037 16:00:00 GMT; path=/; domain=.cnn.com
X-Archive-Orig-last-modified: Tue, 15 Aug 2000 05:28:24 GMT
X-Archive-Orig-server: Netscape-Enterprise/2.01
X-Archive-Guessed-Charset: utf-8
Memento-Datetime: Tue, 15 Aug 2000 05:28:26 GMT
Link: <http://www.cnn.com:80/>; rel="original", <http://web.archive.org/web/timemap/link/http://www.cnn.com:80/>; rel="timemap"; type="application/link-format", <http://web.archive.org/web/http://www.cnn.com:80/>; rel="timegate", <http://web.archive.org/web/20000620180259/http://cnn.com:80/>; rel="first memento"; datetime="Tue, 20 Jun 2000 18:02:59 GMT", <http://web.archive.org/web/20000804165812/http://cnn.com:80/>; rel="prev memento"; datetime="Fri, 04 Aug 2000 16:58:12 GMT", <http://web.archive.org/web/20000815052826/http://www.cnn.com:80/>; rel="memento"; datetime="Tue, 15 Aug 2000 05:28:26 GMT", <http://web.archive.org/web/20000817204102/http://www2.cnn.com:80/>; rel="next memento"; datetime="Thu, 17 Aug 2000 20:41:02 GMT", <http://web.archive.org/web/20000620180259/http://cnn.com:80/>; rel="last memento"; datetime="Tue, 20 Jun 2000 18:02:59 GMT"
X-App-Server: wwwb-app19
X-ts: ----
X-Archive-Playback: 0
X-location: All
X-Page-Cache: MISS

As we can see above, web.archive.org appends the port number to the host of the original URL in the Link header, which differs from the expected URI.
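To see exactly which URI the archive advertises as the memento, the Link header can be parsed rather than read by eye. The following is only a minimal sketch: `parse_link_header` and `uris_with_rel` are hypothetical helpers written for this comment, not part of py-memento-client, and the regular expressions assume every parameter value is double-quoted, as in the curl output above. The sample header is a shortened copy of that output.

```python
import re

# One link: <uri> followed by zero or more ;name="value" parameters.
# Splitting on commas would break, because datetime values contain commas.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_link_header(value):
    """Return a list of (uri, params) tuples from a Link header value."""
    links = []
    for uri, raw_params in LINK_RE.findall(value):
        links.append((uri, dict(PARAM_RE.findall(raw_params))))
    return links


def uris_with_rel(value, rel):
    """Return the URIs whose rel parameter is exactly `rel`."""
    return [uri for uri, params in parse_link_header(value)
            if params.get("rel") == rel]


# Shortened copy of the Link header from the curl output.
sample_header = (
    '<http://www.cnn.com:80/>; rel="original", '
    '<http://web.archive.org/web/20000620180259/http://cnn.com:80/>; '
    'rel="first memento"; datetime="Tue, 20 Jun 2000 18:02:59 GMT", '
    '<http://web.archive.org/web/20000815052826/http://www.cnn.com:80/>; '
    'rel="memento"; datetime="Tue, 15 Aug 2000 05:28:26 GMT"'
)

print(uris_with_rel(sample_header, "memento"))
```

For the sample header, the only link with rel="memento" is the one carrying :80, which is the value the failing test receives.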

So I think web.archive.org has changed the format of its header content. There are still many other AssertionErrors similar to this case, for example http:// being changed to https://.
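One possible way to make such comparisons less brittle is to normalize both URIs before asserting on them. This is a sketch under stated assumptions, not how the project's tests work: `strip_default_port` and `normalize_uri_m` are hypothetical names, the pattern assumes Wayback-style memento URIs of the form `/web/<14 digits>/<original URI>`, and scheme differences (http vs https) are deliberately left alone.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed shape of a Wayback-style URI-M: archive prefix + embedded original URI.
WAYBACK_URI_M = re.compile(r"^(?P<prefix>https?://[^/]+/web/\d{14}/)(?P<original>.+)$")

DEFAULT_PORTS = {"http": 80, "https": 443}


def strip_default_port(uri):
    """Drop an explicit port when it is the default for the scheme.

    Note: userinfo (user:password@) is ignored and the host is lowercased.
    """
    parts = urlsplit(uri)
    host = parts.hostname or ""
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def normalize_uri_m(uri_m):
    """Normalize the original URI embedded in a memento URI."""
    match = WAYBACK_URI_M.match(uri_m)
    if match is None:
        # Not in the assumed shape: normalize it as an ordinary URI.
        return strip_default_port(uri_m)
    return match.group("prefix") + strip_default_port(match.group("original"))


expected = "http://web.archive.org/web/20000815052826/http://www.cnn.com/"
actual = "http://web.archive.org/web/20000815052826/http://www.cnn.com:80/"
assert normalize_uri_m(expected) == normalize_uri_m(actual)
```

This only hides differences that do not change which resource is identified. A changed timestamp in the URI would still fail the comparison, which is probably what a test should do.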

hariharshankar commented 7 years ago

I am working on writing proper unit tests and getting rid of all the tests that depend on any external network, such as the archives. Once that is implemented, we hopefully won't need to maintain these URLs anymore.
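As an illustration of what a network-free test could look like, here is a rough sketch using `unittest.mock`. It does not reflect the actual tests or internals of py-memento-client: `closest_memento_uri` is a hypothetical helper, the session object is a stand-in for an HTTP client, and the sketch assumes the TimeGate answers with a redirect whose Location header carries the memento URI. It also uses a timezone-aware datetime, which is an assumption not taken from the test data.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from unittest import mock


def closest_memento_uri(session, timegate, uri_r, when):
    """Hypothetical helper: ask a TimeGate for the memento closest to `when`."""
    headers = {"Accept-Datetime": format_datetime(when, usegmt=True)}
    response = session.head(timegate + uri_r, headers=headers, allow_redirects=False)
    return response.headers["Location"]


def test_closest_memento_uri_offline():
    canned_uri_m = "http://web.archive.org/web/20000815052826/http://www.cnn.com:80/"

    # The fake session returns a canned response, so no request leaves the machine.
    session = mock.Mock()
    session.head.return_value = mock.Mock(headers={"Location": canned_uri_m})

    when = datetime(1998, 7, 28, 17, 18, 49, tzinfo=timezone.utc)
    uri_m = closest_memento_uri(
        session, "http://web.archive.org/web/", "http://www.cnn.com", when
    )

    # The result depends only on the canned response, not on the live archive.
    assert uri_m == canned_uri_m
    # The request itself is checked too: target URI and Accept-Datetime header.
    session.head.assert_called_once_with(
        "http://web.archive.org/web/http://www.cnn.com",
        headers={"Accept-Datetime": "Tue, 28 Jul 1998 17:18:49 GMT"},
        allow_redirects=False,
    )
```

Because the expected value is fixed inside the test, a later change in how the archive formats its URIs would no longer break the suite.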

refeed commented 7 years ago

Ok, that's good

hariharshankar commented 7 years ago

b133b44 & release v0.6.0 fixes this issue.