sensiblecodeio / data-services-helpers

Python module containing classes and functions that The Sensible Code Company's Data Services team often uses
https://sensiblecode.io/
BSD 2-Clause "Simplified" License

URLs with params not being checked correctly in _url_in_cache #19

Closed: StevenMaude closed this issue 8 years ago

StevenMaude commented 10 years ago

Needs investigating. Spotted by @scraperdragon and myself while working on GDS AAIB stuff. Switched to requests.get and it worked.

StevenMaude commented 10 years ago

This is something to do with _url_in_cache, which looks in the cache db, and it is related to using params.

>>> dshelpers.download_url(url, params=PARAMS)
URL in cache: False
>>> dshelpers.download_url(url, params=PARAMS)
URL in cache: False
>>> dshelpers.download_url(url)
URL in cache: False
>>> dshelpers.download_url(url)
URL in cache: True

This is quite annoying. Because the URL is reported as not cached, two seconds have to elapse after the previous request before the second request is made, even though that second request then completes near instantly as it's already cached.

StevenMaude commented 10 years ago

@scraperdragon suggested that this is because _url_in_cache isn't checking the correct URL; it's checking the base URL, not the URL with query string values added in.
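To make that hypothesis concrete, here is a minimal sketch of how such a mismatch could arise. It is not the dshelpers implementation: the set standing in for the cache db and the names full_url, fetch, url_in_cache_buggy and url_in_cache_fixed are all hypothetical, and it assumes responses are stored under the full URL including the query string.

```python
from urllib.parse import urlencode


def full_url(url, params=None):
    """Append an encoded query string (assumes url has no query string yet)."""
    if not params:
        return url
    return url + "?" + urlencode(params)


# Hypothetical stand-in for the cache db: a set of URLs already fetched.
cache = set()


def fetch(url, params=None):
    # Assumption: the response is stored under the full URL, query included.
    cache.add(full_url(url, params))


def url_in_cache_buggy(url, params=None):
    # Checks only the base URL, so a request made with params never matches.
    return url in cache


def url_in_cache_fixed(url, params=None):
    # Checks the same full URL that fetch() stored.
    return full_url(url, params) in cache


fetch("http://example.com/search", {"q": "aaib"})
print(url_in_cache_buggy("http://example.com/search", {"q": "aaib"}))  # False
print(url_in_cache_fixed("http://example.com/search", {"q": "aaib"}))  # True
```

This only illustrates the suspected cause; as the rest of this comment notes, matching on the full URL alone did not seem to be the whole story.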

Thought calling _url_in_cache with a url derived from requests.Request(method, **kwargs).prepare().url might fix it, but doesn't seem to :( It seems to construct the URL correctly, but there's something else going on.

I was also seeing that when download_url('http://httpbin.org', params={'foo': 'bar'}) was run twice successively, it correctly stated it was in the cache. But when download_url('http://actual.interesting.site', params={...}) ran twice, it still wouldn't realise it was cached.

Horrible workaround: construct the full URL, query string included, using requests.Request(method, **kwargs).prepare().url, and then call download_url with that URL and no params. Not an ideal solution when this should just work.
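A rough sketch of that workaround, assuming the requests library is installed. The helper name build_full_url is hypothetical; preparing a request does not touch the network, it only builds the URL requests would send.

```python
import requests


def build_full_url(url, params=None, method="GET"):
    """Return the URL requests would actually send, query string included."""
    return requests.Request(method, url, params=params).prepare().url


full = build_full_url("http://httpbin.org/get", params={"foo": "bar"})
print(full)  # http://httpbin.org/get?foo=bar

# The workaround then passes this single string on, with no params argument:
# dshelpers.download_url(full)
```

The download_url call is left commented out so the snippet runs without dshelpers and without making a request.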

djui commented 8 years ago

This should be resolved by https://github.com/scraperwiki/data-services-helpers/commit/fd65b551ddb1ae5ebdc434d12c7b75bb193166e7 and https://github.com/scraperwiki/data-services-helpers/commit/0b37fbcc3c5d0b3d200acce262d5b26b9d77c00f.