mementoweb / py-memento-client

A Memento Client Library in Python

Use HTTPS #10

Status: Closed (framawiki closed this 7 years ago)

framawiki commented 7 years ago

Hello! I'm a developer of pywikibot, a Python library that allows bots to interact with websites like Wikipedia. One of our shared scripts uses your library. We discovered that memento-client does not return an https link for archive.org, and probably for other links as well. Is it possible to fix this? I suppose the problem comes directly from your server, not from this library. Thanks!

Our bug tracker: https://phabricator.wikimedia.org/T167463

hariharshankar commented 7 years ago

Thanks for reporting this. Yes, this does seem to be coming from our server's API. We will have a look at what's going on and let you know.

framawiki commented 7 years ago

Thank you, @hariharshankar!

hariharshankar commented 7 years ago

We have fixed our APIs to use the https endpoint for archive.org. Mementos are cached in a database to provide faster responses, so you may still see http URIs for cached mementos. They should be replaced with https URIs once the cache expires.
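Until the cached entries expire, a caller could upgrade the scheme on its own side. The following is only a minimal sketch of that workaround, not part of memento-client: the helper name `upgrade_archive_uris` and its `host` parameter are hypothetical, and it assumes the `mementos` dictionary has the shape shown in the session later in this thread (`first`, `last`, `closest`, each with a `uri` list).

```python
def upgrade_archive_uris(mementos, host="web.archive.org"):
    """Return a copy of `mementos` with http archive URIs switched to https.

    Hypothetical helper: only URIs that start with http://<host>/ are
    rewritten, and only their leading scheme changes. The original URL
    embedded in the archive path is left untouched.
    """
    prefix = "http://" + host + "/"
    upgraded = {}
    for rel, entry in mementos.items():
        new_entry = dict(entry)  # shallow copy so the input is not mutated
        new_entry["uri"] = [
            "https://" + uri[len("http://"):] if uri.startswith(prefix) else uri
            for uri in entry.get("uri", [])
        ]
        upgraded[rel] = new_entry
    return upgraded
```

Calling it on the value of `memento_info.get('mementos')` would leave URIs from other archives as they are, since only the given host is matched.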

framawiki commented 7 years ago

Hello @hariharshankar, I still get http links today. Is it still a cache problem?

Python 2.7.13 (default, Jan 19 2017, 14:48:08) 
[GCC 6.3.1 20170118]
Type "help", "copyright", "credits" or "license" for more information.
>>> import datetime
>>> import memento_client
>>> mc = memento_client.MementoClient()
>>> when = datetime.datetime.now()
>>> url = 'http://www.fallingrain.com/world/YI/2/Dunisice.html'
>>> memento_info = mc.get_memento_info(url, when)
>>> memento_info.get('mementos')
{'last': {'uri': ['http://web.archive.org/web/20110228040245/http://www.fallingrain.com:80/world/YI/2/Dunisice.html'], 'datetime': datetime.datetime(2011, 2, 28, 4, 2, 45)}, 'closest': {'datetime': datetime.datetime(2011, 2, 28, 4, 2, 45), 'uri': [u'http://web.archive.org/web/20110228040245/http://www.fallingrain.com:80/world/YI/2/Dunisice.html'], 'http_status_code': 404}, 'first': {'uri': ['http://web.archive.org/web/20071001061940/http://www.fallingrain.com/world/YI/2/Dunisice.html'], 'datetime': datetime.datetime(2007, 10, 1, 6, 19, 40)}}
>>>

hariharshankar commented 7 years ago

Hi @framawiki, yes, there was an issue with the API's cache and we have fixed it now. Please give it another try.

The API supports cache-control headers, so you can also send Cache-Control: no-cache if you see old http links again. But please use that header sparingly, as it triggers a distributed search across multiple archives for every request.
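For anyone talking to the API directly, a request carrying that header might be built roughly as below. This is a sketch under assumptions: the TimeGate address is assumed to be the public aggregator endpoint (it is not stated in this thread), `build_timegate_request` is a hypothetical helper, and whether memento-client itself exposes a way to pass extra headers is not confirmed here. `Accept-Datetime` is the datetime negotiation header from the Memento protocol (RFC 7089).

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Assumption: public aggregator TimeGate; replace with the endpoint you use.
TIMEGATE = "http://timetravel.mementoweb.org/timegate/"


def build_timegate_request(url, when, bypass_cache=False):
    """Build (but do not send) a TimeGate request for `url` near `when`.

    `when` must be a timezone-aware datetime. With bypass_cache=True the
    request asks the server to skip its cache, which is expensive on the
    server side, so it should stay off by default.
    """
    headers = {
        # HTTP-date in GMT, as required for Accept-Datetime
        "Accept-Datetime": format_datetime(
            when.astimezone(timezone.utc), usegmt=True
        ),
    }
    if bypass_cache:
        headers["Cache-Control"] = "no-cache"
    return Request(TIMEGATE + url, headers=headers)


req = build_timegate_request(
    "http://www.fallingrain.com/world/YI/2/Dunisice.html",
    datetime(2017, 6, 10, 12, 0, 0, tzinfo=timezone.utc),
    bypass_cache=True,
)
```

Passing `req` to `urllib.request.urlopen` would send it. Keeping `bypass_cache` as an explicit opt-in matches the request above to use the header sparingly.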

framawiki commented 7 years ago

Yeah!! Thanks, @hariharshankar!

>>> memento_info.get('mementos')
{'last': {'uri': ['https://web.archive.org/web/20110228040245/http://www.fallingrain.com:80/world/YI/2/Dunisice.html'], 'datetime': datetime.datetime(2011, 2, 28, 4, 2, 45)}, 'closest': {'datetime': datetime.datetime(2011, 2, 28, 4, 2, 45), 'uri': [u'https://web.archive.org/web/20110228040245/http://www.fallingrain.com:80/world/YI/2/Dunisice.html'], 'http_status_code': 404}, 'first': {'uri': ['https://web.archive.org/web/20071001061940/http://www.fallingrain.com/world/YI/2/Dunisice.html'], 'datetime': datetime.datetime(2007, 10, 1, 6, 19, 40)}}

kizule commented 7 years ago

Thank you very much!