Open jose1711 opened 8 years ago
@jose1711 Could you provide an example Way ID that causes this error to occur? I'd be interested in looking into this more if @metaodi thinks we should support this feature.
Well... it's really just a looooong list of ways to download that triggers the error, like wayid1, wayid2, ..., wayid999999.
@austinhartzheim feel free to look into that. I think we need a way to stop at some point to avoid an endless loop. Maybe we can use this issue to discuss possible solutions. Do you already have an idea?
In general I think it's good to provide this kind of abstraction, so that a consumer of osmapi doesn't have to care about URL length limits. Something like a generator might come in handy here. I've seen something similar already in the OAI-PMH client implementation of pyoai. Let me know if you want to discuss this in more detail.
Excellent. I'm busy with final projects/exams at my university right now but I should have time in late December. If someone else is interested in working on this issue before then, feel free to take it.
I've been looking into this issue and it seems that the URI length limit is not defined in the API server software. Rather, I believe that the limit is imposed by the Apache web server itself. It seems that the length of the HTTP request line is the limiting factor, and Apache limits it to 8190 bytes by default.
This is the default value, which has not been set specifically on the servers. (If we wanted to pursue having the value set explicitly on the servers rather than relying on the default, I believe this Chef file would be the location to do it).
The following code shows that a request line of 8190 bytes gives the expected result whereas a request line of 8195 bytes causes the 414 error we are addressing in this issue:
len('GET /api/0.6/waysways=') + len(','.join([str(x) for x in range(1, 1854)])) + len(' HTTP/1.1\r\n') # 8190
len('GET /api/0.6/waysways=') + len(','.join([str(x) for x in range(1, 1855)])) + len(' HTTP/1.1\r\n') # 8195
api.WaysGet(range(1,1854)) # 404 error - expected
api.WaysGet(range(1,1855)) # 414 error - not expected
Here are some of the most likely solutions.
NodesGet()/WaysGet()/RelationsGet(). A potential issue is support for 64-bit IDs, which would limit us to ~380 IDs under the current length limits. Furthermore, we lose efficiency by creating extra API calls in situations where we may not need to.
I'm personally leaning towards hardcoding a URI length limit constant, with or without trying to standardize the limit. I believe that the efficiency gains of this approach may be significant. Furthermore, I do not think it is likely that the length limit will be decreased in the future.
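To make the length-based option concrete, here is a minimal sketch of splitting an ID list by a request-line byte budget rather than by a fixed ID count. The function name `chunk_ids`, its parameters, and the way the prefix and suffix are counted are assumptions for illustration; none of this is existing osmapi code.

```python
def chunk_ids(ids, max_request_line=8190,
              prefix="GET /api/0.6/ways?ways=", suffix=" HTTP/1.1\r\n"):
    """Yield lists of IDs whose comma-joined form fits the byte budget.

    Hypothetical helper: the budget is whatever remains of
    max_request_line after the assumed prefix and suffix are subtracted.
    """
    budget = max_request_line - len(prefix) - len(suffix)
    chunk, used = [], 0
    for item in ids:
        text = str(item)
        if len(text) > budget:
            raise ValueError("a single ID does not fit in the budget")
        # Every ID after the first in a chunk also costs one comma.
        cost = len(text) + (1 if chunk else 0)
        if chunk and used + cost > budget:
            yield chunk
            chunk, used = [], 0
            cost = len(text)
        chunk.append(item)
        used += cost
    if chunk:
        yield chunk


if __name__ == "__main__":
    for part in chunk_ids(range(1, 5000)):
        print(len(part), "IDs,", len(",".join(map(str, part))), "characters")
```

Because the budget is measured in characters, short IDs pack densely and 64-bit IDs simply produce more chunks, so no per-request ID count needs to be hardcoded.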
I'm interested in hearing your thoughts or alternate solutions.
@austinhartzheim thank you very much for this very thorough analysis of the problem at hand.
I have a few things to add:
osmapi against their very own custom version of the API, running on their server or customized code, not running on Apache
All these points lead me to the conclusion that I'd prefer a limit with a good default value that a user of osmapi can override (e.g. in the constructor). If the limit is reached, another request is sent to the OSM API with the remaining items; the results are then put back together and returned to the consumer as "one", so that this whole process is transparent from a user's perspective (i.e. they don't notice it).
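A rough sketch of that idea, assuming the per-request call returns a dict keyed by element ID. The class `ChunkedGetter`, the `fetch` callable, and the `max_ids_chars` parameter are all hypothetical names, not part of osmapi.

```python
class ChunkedGetter:
    """Split a long ID list into several requests and merge the results."""

    def __init__(self, fetch, max_ids_chars=8000):
        # fetch: hypothetical callable taking a list of IDs and returning
        # a dict keyed by ID. max_ids_chars: overridable default budget
        # for the comma-separated ID string of one request.
        self._fetch = fetch
        self._max_ids_chars = max_ids_chars

    def _split(self, ids):
        chunk, used = [], 0
        for item in ids:
            cost = len(str(item)) + 1  # digits plus one separator
            if chunk and used + cost > self._max_ids_chars:
                yield chunk
                chunk, used = [], 0
            chunk.append(item)
            used += cost
        if chunk:
            yield chunk

    def get(self, ids):
        merged = {}
        for chunk in self._split(ids):
            # The caller only ever sees the merged dict.
            merged.update(self._fetch(chunk))
        return merged


if __name__ == "__main__":
    def fake_fetch(chunk):
        # Stand-in for a real API call.
        return {i: {"id": i} for i in chunk}

    getter = ChunkedGetter(fake_fetch, max_ids_chars=10)
    print(sorted(getter.get([1, 2, 3, 4, 5, 6, 7])))
```

The limit is a constructor argument so that users talking to a differently configured server can lower or raise it without touching the calling code.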
I like the idea of retrying the request if we see a 414 error.
I think a good strategy would be to start at ~8000 bytes. Upon encountering a 414 error, we divide that number in half and retry. And if we encounter another 414 error, we divide it in half again to ~2000 bytes. After that, we raise an exception if the request is not successful.
The reason for starting at 8000 is that RFC 7230 recommends that servers support request lines of at least 8000 bytes.
The reason for ending at 2000 is that this is what browsers support, so almost every server (unless configured otherwise) is likely to support it.
Also, we can add a configuration option to override the default settings. I'm considering setting the number of retries to zero when the user overrides the default (or perhaps we can make that configurable as well).
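The halving strategy could look roughly like this. It is only a sketch: `UriTooLong` stands in for however a 414 response would be surfaced, and `fetch_all` is a hypothetical callable that performs the (chunked) request with a given request-line limit.

```python
class UriTooLong(Exception):
    """Hypothetical stand-in for an HTTP 414 response."""


def get_with_halving(fetch_all, ids, start_limit=8000, max_halvings=2):
    """Try start_limit, then halve on each 414, up to max_halvings times.

    With the defaults the limits tried are 8000, 4000 and 2000 bytes.
    If the smallest limit still fails, the exception is re-raised.
    """
    limit = start_limit
    for attempt in range(max_halvings + 1):
        try:
            return fetch_all(ids, limit)
        except UriTooLong:
            if attempt == max_halvings:
                raise
            limit //= 2


if __name__ == "__main__":
    def fake_fetch_all(ids, limit):
        # Pretend the server rejects request lines above 3000 bytes.
        if limit > 3000:
            raise UriTooLong()
        return {i: None for i in ids}

    print(get_with_halving(fake_fetch_all, [1, 2, 3]))
```

Setting `max_halvings=0` gives the "no retries" behaviour discussed for user-supplied limits. Note that this sketch restarts the whole fetch on each retry; a real implementation might keep the chunks that already succeeded.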
Also, you mentioned using a generator. Do you want the methods to return a generator instead or should we collect all the results and return them as a list?
I stumbled upon this library for retrying, which might be handy for this use case. About the generators: I quite like the idea of returning generators when we return multiple items.
Attempting to download a large number of elements (say, using the WaysGet method) ends in 'Request-URI Too Long'. It would be nice if osmapi could handle this by completing the request in chunks.