mocnik-science / osm-python-tools

A library to access OpenStreetMap related services
GNU General Public License v3.0
441 stars 48 forks source link

Errors in cached results #42

Closed stefanct closed 3 years ago

stefanct commented 3 years ago

If a query results in an error (e.g., a timeout) even re-running the query with changed parameters (e.g., a different timeout value) immediately bails out. I have not checked if timeouts are the only problem but in general the cache should be invalidated in those cases.

(I am building the query strings manually; using 0.3.0 via pip)

mocnik-science commented 3 years ago

Dear @stefanct,

You are raising a good point. The overpassQueryBuilder does not add a timeout on intention, and when you build the query manually, a timeout should not be added manually. Instead of adding the timeout to your query directly, you can (and should) add it as a parameter to the query function:

overpass.query('...', timeout=25)

The following query, for instance, fails:

zandvoort = nominatim.query('Zandvoort')
query = overpassQueryBuilder(area=zandvoort, elementType='way', selector='"highway"', includeCenter=True)
roads = overpass.query(query, timeout=.1)

Rerunning the query then works as expected, i.e., the data is is downloaded, properly cached, and saved in the variable roads:

zandvoort = nominatim.query('Zandvoort')
query = overpassQueryBuilder(area=zandvoort, elementType='way', selector='"highway"', includeCenter=True)
roads = overpass.query(query, timeout=30)

Can you confirm this behaviour, or is this broken on your machine?

Thanks for checking.

stefanct commented 3 years ago

I am building the queries myself. However, they don't include the timeout implicitly but I set it with the parameter like you show it. See my code here (the request comes from a function on the top): https://github.com/stefanct/osm_refhistorymeta/blob/main/ref_contributors.py#L79

Your MWE do not work with the version I have. Even without the includeCenter it doesn't due to "Bad Request" errors from the server (no matter the timeout).

from OSMPythonTools.overpass import Overpass, overpassQueryBuilder
from OSMPythonTools.nominatim import Nominatim

nominatim = Nominatim()
overpass = Overpass()
zandvoort = nominatim.query('Zandvoort')
query = overpassQueryBuilder(area=zandvoort, elementType='way', selector='"highway"')
roads = overpass.query(query, timeout=100)

results in:

[overpass] downloading data: [timeout:100][out:json];area(<OSMPythonTools.nominatim.NominatimResult object at 0x7f5998d7cb00>)->.searchArea;(way["highway"](area.searchArea);); out body;
The requested data could not be downloaded. HTTP Error 400: Bad Request
Traceback (most recent call last):
  File "/home/stefanct/.local/lib/python3.7/site-packages/OSMPythonTools/internal/cacheObject.py", line 83, in __query
    response = urllib.request.urlopen(request)
  File "/usr/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
Traceback (most recent call last):
  File "/home/stefanct/.local/lib/python3.7/site-packages/OSMPythonTools/internal/cacheObject.py", line 83, in __query
    response = urllib.request.urlopen(request)
  File "/usr/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "foo.py", line 8, in <module>
    roads = overpass.query(query, timeout=100)
  File "/home/stefanct/.local/lib/python3.7/site-packages/OSMPythonTools/internal/cacheObject.py", line 37, in query
    data = self.__query(queryString, params)
  File "/home/stefanct/.local/lib/python3.7/site-packages/OSMPythonTools/internal/cacheObject.py", line 87, in __query
    raise Exception(msg)
Exception: The requested data could not be downloaded. HTTP Error 400: Bad Request

Best would probably to update but... what's the easiest way to do that?

mocnik-science commented 3 years ago

This is the expected behaviour if you did not update. If you provide area=zandvoort.areaId(), it should work.

However, I recommend to update, making this change unnecessary:

pip3 install OSMPythonTools --upgrade
stefanct commented 3 years ago

OK, well, I see the same behavior as in my application: once there is a timeout error it does not try to re-download but instantly fails. Also, any timeouts <1.0 produce Bad Requests - no idea why that would work for you (it's not due to locales). I tried multiple things (1.0, 0.9, 0.1, .1 and even strings) to no avail.

mocnik-science commented 3 years ago

I have investigated this further. I will provide a fix soon. Stay tuned.

mocnik-science commented 3 years ago

Ok, I think I have fixed it now. Please update again to the newest version. Then, either run my example to test, or run your example, it does not matter.

Btw: My example was not correct. The parameter timeout seems to accept only integer values (at least in version of Overpass that runs on the official server). So it should be timeout=1 to provoke a timeout, at least sometimes.

I would be happy to read if it now works as expected.

mocnik-science commented 3 years ago

Additional remark: If you like the package, feel welcome to "star" it. ;) (If you don't, be honest ;)

stefanct commented 3 years ago

It's not just floats - it works fine with float values >=1.0. No idea what's going on there but I guess it's a server issue/feature. In any case, your fix regarding the errors in cached files works for me too, thanks!

mocnik-science commented 3 years ago

Perfect, this is nice to hear! Have fun using it.