Closed StevenMaude closed 8 years ago
This seems to be caused by _url_in_cache, which looks the URL up in the cache db, and it only shows up when using params.
>>> dshelpers.download_url(url, params=PARAMS)
URL in cache: False
>>> dshelpers.download_url(url, params=PARAMS)
URL in cache: False
>>> dshelpers.download_url(url)
URL in cache: False
>>> dshelpers.download_url(url)
URL in cache: True
This is quite annoying. Because the cache check fails, it waits until two seconds have elapsed since the previous request before making the second request, even though that second request then returns near instantly as the response is already cached.
@scraperdragon suggested that this is because _url_in_cache isn't checking the correct URL; it's checking the base URL, not the URL with query string values added in.
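To illustrate the suspected mechanism, here is a minimal sketch of a cache lookup that ignores the query string. It is not the dshelpers implementation: the dict-based cache and every function name below are hypothetical, and the real cause may be different.

```python
from urllib.parse import urlencode

# Hypothetical in-memory stand-in for the cache db.
cache = {}


def full_url(url, params=None):
    """Append params as a query string (simplified: assumes url has no query yet)."""
    if not params:
        return url
    return url + "?" + urlencode(params)


def store(url, params=None):
    # The response is stored under the URL that was actually requested.
    cache[full_url(url, params)] = "response body"


def in_cache_base_only(url, params=None):
    # Suspected behaviour: params are ignored, so only the base URL is checked.
    return url in cache


def in_cache_full(url, params=None):
    # Lookup that uses the same URL the response was stored under.
    return full_url(url, params) in cache


store("http://example.com/search", {"q": "aaib"})
print(in_cache_base_only("http://example.com/search", {"q": "aaib"}))
print(in_cache_full("http://example.com/search", {"q": "aaib"}))
```

If the lookup and the store disagree on the key like this, a request made with params would never be reported as cached, which matches the transcript above.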
Thought calling _url_in_cache with a URL derived from requests.Request(method, **kwargs).prepare().url might fix it, but it doesn't seem to :( It seems to construct the URL correctly, but there's something else going on.
I was also seeing that when download_url('http://httpbin.org', params={'foo': 'bar'}) was run twice in succession, it correctly stated that the URL was in the cache. But when download_url('http://actual.interesting.site', params={...}) ran twice, it still wouldn't realise it was cached.
Horrible workaround: construct the URL with its query string directly using requests.Request(method, **kwargs).prepare().url and then call download_url with this URL. Not an ideal solution when this should just work.
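A rough sketch of that workaround, assuming a GET request. build_url is a hypothetical helper name, and the download_url call is left commented out so the snippet runs without dshelpers installed.

```python
import requests


def build_url(url, params):
    """Return url with params encoded into the query string, as requests would send it."""
    return requests.Request("GET", url, params=params).prepare().url


url = build_url("http://httpbin.org/get", {"foo": "bar"})
print(url)

# Then pass the full URL rather than using params=:
# dshelpers.download_url(url)
```

Preparing the request only builds the URL; nothing is sent over the network at this point.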
Needs investigating. Spotted by @scraperdragon and myself while working on GDS AAIB stuff. Switched to requests.get and it worked.