ropensci / osmdata

R package for downloading OpenStreetMap data
https://docs.ropensci.org/osmdata

Error in overpass_query #335

Closed stalkerGH closed 9 months ago

stalkerGH commented 9 months ago

Hi everyone reading this.

I'm playing with the rcityviews package (https://github.com/koenderks/rcityviews). Because I wanted to download OSM data faster and not overload the Kumi Systems Overpass API server, I set up my own Overpass API server with the wiktorn Docker image (https://github.com/wiktorn/Overpass-API). The server is running and minute diffs are being downloaded, so everything works. I changed the default Overpass API URL in osmdata using this set of commands:

library(osmdata) # load osmdata package
get_overpass_url() # check current server
new_url <- ("http://localhost:8888/api/interpreter") # set new URL
set_overpass_url(new_url) # confirm new URL
get_overpass_url() # check current server again

Then I execute this command:

p <- cityview(name = "Zakopane", zoom = 1) # download OSM city data

When I use the default Kumi Systems server, the data are downloaded as expected but, of course, not very fast. When I use my local server, the progress bar advances but I finally get this error message: Error in overpass_query(query = obj$overpass_call, quiet = quiet, encoding = encoding) : object 'doc' not found (I translated the last part myself because I don't know how to change the message language in R).

Is this an osmdata message? Which logs should I check?

Is someone able to help me?

jmaspons commented 9 months ago

Hi. You could try a query with osmdata against your server to take rcityviews out of the equation.

library(osmdata)

get_overpass_url() # check current server
new_url <- ("http://localhost:8888/api/interpreter") # set new URL
set_overpass_url(new_url) # confirm new URL
get_overpass_url() # check current server again

x <- opq (bbox = c (-0.27, 51.47, -0.20, 51.50)) %>% # Chiswick Eyot in London, U.K.
    add_osm_feature (key = "name", value = "Thames", value_exact = FALSE) %>%
    osmdata_sf ()
x

Does it work?

stalkerGH commented 9 months ago

@jmaspons Thank you for the debugging hint. Yes, it works if I use the remote (Kumi) server, because my own server only has data for my country (Poland). Here is the response:

Object of class 'osmdata' with:
                 $bbox : 51.47,-0.27,51.5,-0.2
        $overpass_call : The call submitted to the overpass API
                 $meta : metadata including timestamp and version numbers
           $osm_points : 'sf' Simple Features Collection with 26277 points
            $osm_lines : 'sf' Simple Features Collection with 2862 linestrings
         $osm_polygons : 'sf' Simple Features Collection with 4 polygons
       $osm_multilines : 'sf' Simple Features Collection with 6 multilinestrings
    $osm_multipolygons : 'sf' Simple Features Collection with 2 multipolygons

Still using Kumi, I ran the same query as above but for a city in Poland:

 x <- opq (bbox = c (16.45537, 50.80802, 16.52968, 50.87321)) %>% 
      add_osm_feature (key = "name", value = "Bystrzyca", value_exact = FALSE) %>%
      osmdata_sf ()

And got:

Object of class 'osmdata' with:
                 $bbox : 50.80802,16.45537,50.87321,16.52968
        $overpass_call : The call submitted to the overpass API
                 $meta : metadata including timestamp and version numbers
           $osm_points : 'sf' Simple Features Collection with 3196 points
            $osm_lines : 'sf' Simple Features Collection with 107 linestrings
         $osm_polygons : 'sf' Simple Features Collection with 0 polygons
       $osm_multilines : 'sf' Simple Features Collection with 2 multilinestrings
    $osm_multipolygons : 'sf' Simple Features Collection with 1 multipolygons

Now I change the Overpass API URL from Kumi to my local server and ask again:

Object of class 'osmdata' with:
                 $bbox : 50.80802,16.45537,50.87321,16.52968
        $overpass_call : The call submitted to the overpass API
                 $meta : metadata including timestamp and version numbers
           $osm_points : 'sf' Simple Features Collection with 3196 points
            $osm_lines : 'sf' Simple Features Collection with 107 linestrings
         $osm_polygons : 'sf' Simple Features Collection with 0 polygons
       $osm_multilines : 'sf' Simple Features Collection with 2 multilinestrings
    $osm_multipolygons : 'sf' Simple Features Collection with 1 multipolygons

What should I check next?
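A comparison like the one above can be scripted so that the layer counts from both servers sit side by side. This is only a sketch: `query_server()` and `feature_counts()` are hypothetical helper names (not part of osmdata), and it assumes the osmdata package is installed and both endpoints are reachable.

```r
# Hypothetical helper: run the same name query against a given Overpass server.
# Assumes the osmdata package is installed; nothing is called until you invoke it.
query_server <- function(server_url, bbox, name) {
    osmdata::set_overpass_url(server_url)
    q <- osmdata::opq(bbox = bbox)
    q <- osmdata::add_osm_feature(q, key = "name", value = name, value_exact = FALSE)
    osmdata::osmdata_sf(q)
}

# Hypothetical helper: number of features in each layer of an osmdata result.
# Layers that are missing or NULL are counted as 0.
feature_counts <- function(x) {
    layers <- c("osm_points", "osm_lines", "osm_polygons",
                "osm_multilines", "osm_multipolygons")
    vapply(layers, function(l) {
        if (is.null(x[[l]])) 0L else nrow(x[[l]])
    }, integer(1))
}

# Usage (needs network access, so left commented out):
# bbox  <- c(16.45537, 50.80802, 16.52968, 50.87321)
# kumi  <- query_server("https://overpass.kumi.systems/api/interpreter", bbox, "Bystrzyca")
# local <- query_server("http://localhost:8888/api/interpreter", bbox, "Bystrzyca")
# rbind(kumi = feature_counts(kumi), local = feature_counts(local))
```

If the two rows of the resulting matrix match, the servers returned the same number of features for this query, which suggests any remaining difference lies elsewhere.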

jmaspons commented 9 months ago

It seems that your local server works as expected and that osmdata retrieves the same data as with the Kumi server. I would trace the error in the rcityviews code.


stalkerGH commented 9 months ago

Thank you.

stalkerGH commented 9 months ago

I'm back. After discussing this issue with the rcityviews author, he pointed me to overpass_query.R from osmdata. Please take a look at our conversation: https://github.com/koenderks/rcityviews/issues/17

I ran a few small tests, querying the remote and local Overpass API servers with curl, using this query:

[out:xml][timeout:25];
area[name="Świdnica"];
way(area)[name="Parkowa"]->.my_street;
(
        nw[amenity](around.my_street:25);
        nwr[building](around.my_street:25);
        wr[highway](around.my_street:25);
);
(._;>;);
out;

curl commands for both servers:

curl -i -X POST --output r1.xml --data @q1 https://overpass-api.de/api/interpreter

and

curl -i -X POST --output r2.xml --data @q1 http://localhost:8888/api/interpreter

Both outputs, r1.xml and r2.xml, are equal except for the HTTP headers (unnecessary parts cut out):

overpass-api.de

HTTP/1.1 200 OK
Server: Apache/2.4.56 (Debian)
Transfer-Encoding: chunked
Content-Type: application/osm3s+xml

localhost

HTTP/1.1 200 OK
Server: nginx/1.21.6
Transfer-Encoding: chunked
Content-Type: application/osm3s+xml

If I change [out:xml] in the query to [out:json], I get results with Content-Type: application/json.

In overpass_query.R I see code that handles the application/osm3s+xml and text/csv Content-Types, but not application/json.

But when my local Overpass API server is running, I see messages like these when sending a query and receiving the response (the "asking" curl version is 7.81.0; the "responding" curl, from the Overpass API Docker container, is 7.74.0):
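The header check described above can also be scripted. The sketch below reads the first lines of a file saved with `curl -i --output` (which writes the response headers ahead of the body) and reports the Content-Type. `response_content_type()` and `is_osm_xml()` are hypothetical helper names, and the 50-line limit is an arbitrary assumption about how long the header block can be.

```r
# Hypothetical helper: extract the Content-Type from a response saved with
# `curl -i --output <file>`, where the HTTP headers precede the body.
# Only the first `n` lines are scanned (an assumed upper bound for the headers).
response_content_type <- function(path, n = 50L) {
    head_lines <- readLines(path, n = n, warn = FALSE)
    hit <- grep("^content-type:", head_lines,
                ignore.case = TRUE, value = TRUE, useBytes = TRUE)
    if (length(hit) == 0L) {
        return(NA_character_)
    }
    value <- trimws(sub("^[^:]*:", "", hit[1]))
    # Drop parameters such as "; charset=utf-8"
    trimws(strsplit(value, ";", fixed = TRUE)[[1]][1])
}

# Hypothetical helper: TRUE only for the XML type mentioned in this thread.
is_osm_xml <- function(path) {
    identical(tolower(response_content_type(path)), "application/osm3s+xml")
}

# Usage with the files from the curl commands above (commented out):
# response_content_type("r1.xml")
# is_osm_xml("r2.xml")
```

Running both saved responses through `is_osm_xml()` gives a quick yes/no on whether each server answered with the XML type.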

172.17.0.1 - - [13/Dec/2023:10:23:18 +0000] "POST /api/interpreter HTTP/1.1" 200 308 "-" "curl/7.81.0"
127.0.0.1 - - [13/Dec/2023:10:23:26 +0000] "GET /api/interpreter?data=[out:json];node(1);out; HTTP/1.1" 200 658 "-" "curl/7.74.0"

Maybe the problem is here, in the output format of my local Overpass API server? In that case I see two solutions:

  1. change default output format of my Overpass API server running with nginx (where and how?)
  2. add something to the overpass_query.R code (what and where?)

Could you please help me?

stalkerGH commented 9 months ago

In the meantime I set quiet = FALSE in osmdata and got a little more feedback:

(cut several progress bars growing from zero)

Issuing query to Overpass API ...
Announced endpoint: none
Query complete!
converting OSM data to sf format
  \ [======================================>------------------------------------------]  48% | Time remaining:  8s
Issuing query to Overpass API ...
Announced endpoint: none
Query complete!

Command error 'cityview(name = "Lubin")': Error in overpass_query(query = obj$overpass_call, quiet = quiet, encoding = encoding) : object 'doc' not found

While the query goes from 0% to 48%, I can see that the Overpass API is working by visiting http://localhost:8888/api/status. Then it stops with the messages above.

My limited knowledge ends here. I can't understand why it works with the remote Overpass API server but not with my local server. Maybe the remote server has some special configuration? Maybe my configuration lacks something? Maybe I ran out of memory? I have 16 GB of RAM and everything else works normally: Docker, RStudio, the browser and plenty of other things.

Devs, could you help me in any way? I've already lost three days trying to understand what is going on :(

mpadge commented 9 months ago

It's very hard to know precisely, because so many factors depend on how you've configured your self-hosted server. One thing from the previous dumps that strikes me is that overpass, and therefore the osmdata package, expects all queries to return strictly application/osm3s+xml and definitely never application/json. If anything in your server permits the latter, I would expect that to go wrong.

More broadly, setting up your own server seems like a lot of work to go to. In case you're not aware, there are really useful tools for working directly with planet or local dumps (like from geofabrik.de), cutting or filtering them however you like, and importing directly into this package. Docs to get started are at https://osmcode.org/osmium-tool/manual.html; at the end you just need to convert to an .osm/.xml output format and submit that as the doc parameter to any osmdata query. That's generally a lot easier than wrestling with the configuration of your own server.
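As a rough illustration of the doc route, the sketch below reads an already-extracted .osm file straight into osmdata without contacting any Overpass server. `read_local_osm()` and the file name are hypothetical; the sketch assumes the osmdata package is installed and that the file was produced beforehand (for example with osmium-tool) in .osm/.xml format.

```r
# Hypothetical helper: load a local .osm/.xml extract via the `doc` argument
# of osmdata_sf(), so no Overpass server is queried.
# Assumes the osmdata package is installed.
read_local_osm <- function(osm_file) {
    if (!file.exists(osm_file)) {
        stop("OSM file not found: ", osm_file)
    }
    osmdata::osmdata_sf(doc = osm_file)
}

# Usage (file name is a placeholder, so left commented out):
# x <- read_local_osm("lubin.osm")
# x$osm_lines
```

Because no query object is passed here, the result carries the data but not the Overpass call that would have produced it.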

stalkerGH commented 9 months ago

@mpadge Thank you for the feedback. I don't know what further information I can provide. As I wrote in my first post, my testbed includes the Overpass API running in a Docker container. All the tests I have made indicate that the installation was successful: the Overpass API starts, waits for queries and sends meaningful replies, and diffs are downloaded and applied. So this part of my setup gives me reason to assume that the Overpass API itself is OK. To be clear, I'm not saying that osmdata has a bug preventing queries to a local Overpass API server from running.

My way of thinking: if the rcityviews package I tested, which uses osmdata, works with the default Overpass server (e.g. Kumi), then both R packages are fine. If they work with my local Overpass server, but only up to a certain point, then there is probably some slight difference between the two Overpass servers' configurations. I can see my server's logs, but I can't see the remote server's logs, so I have no comparison. What I would like to check and set first is the option that tells the Overpass server which default format it should use for query responses. Maybe that's the key.

Previously, I also asked whether it is possible to tell osmdata that the incoming data is in JSON format and not OSM3S+XML. Maybe this can be easily implemented. But first, I would like to check the Overpass configuration. For now, I'm just asking for help in this area.

mpadge commented 9 months ago

There is no such thing as OSM data in JSON format. As said, if there are any aspects of your config that permit that, those should be expected to cause problems. We can only really help with issues directly related to the functionality of this package, which will require a reproducible example using the Kumi server.

stalkerGH commented 9 months ago

But a fully reproducible example requires the rcityviews package. Are you willing to install it?

mpadge commented 9 months ago

Sure, that's no problem

stalkerGH commented 9 months ago

Because I couldn't get on with the reprex package you pointed me to in the other thread, I'll do it "my way".

First, Kumi server:

library(osmdata)
library(rcityviews)
get_overpass_url()
new_url <- "https://overpass.kumi.systems/api/interpreter"
set_overpass_url(new_url)
get_overpass_url()
cityview(name = "Lubin")

Because I have set quiet = FALSE, I get plenty of messages and progress bars. I cut most of them and show only the last ones:

Issuing query to Overpass API ...
Announced endpoint: none
Query complete!
converting OSM data to sf format
  \ [=================================================================================] 100% | Time remaining: 0s

The command cityview(name = "Lubin") ends successfully.

Now my local server:

library(osmdata)
library(rcityviews)
get_overpass_url()
new_url <- "http://localhost:8888/api/interpreter"
set_overpass_url(new_url)
get_overpass_url()
cityview(name = "Lubin")

This ends with:

Announced endpoint: none
Query complete!
converting OSM data to sf format
  / [========================>--------------------------------------------------------] 31% | Time remaining: 9s
Issuing query to Overpass API ...
Announced endpoint: none
Query complete!
converting OSM data to sf format
Error in command 'cityview(name = "Lubin")': Error in rcpp_osmdata_sf(paste0(doc)) : 'names' attribute [249] must be the same length as the vector [69]

(The error messages are translated from Polish.)

Sometimes (more often) there is a different error:

Error in command 'cityview(name = "Lubin")': Error in overpass_query(query = obj$overpass_call, quiet = quiet, encoding = encoding) : object 'doc' not found

BTW: I realised that the messages quoted in one of my previous posts:

172.17.0.1 - - [13/Dec/2023:10:23:18 +0000] "POST /api/interpreter HTTP/1.1" 200 308 "-" "curl/7.81.0"
127.0.0.1 - - [13/Dec/2023:10:23:26 +0000] "GET /api/interpreter?data=[out:json];node(1);out; HTTP/1.1" 200 658 "-" "curl/7.74.0"

come from the healthcheck function of the Overpass container. This is explained in the documentation of the Docker image (https://github.com/wiktorn/Overpass-API#healthcheck-checking-that-instance-is-up-to-date), so I'm no longer worried about the JSON in the logs.

mpadge commented 9 months ago

Given the parallel comments in the rcityviews report, I really think this is just a problem with your server configuration. I don't have any further time at the moment to look into this, so I can only suggest what might help you.

I think the best way forward might be to do a trial extraction and use the osmdata_xml function to directly download the extracted data without converting to any format. That is a direct ".osm" file produced by overpass itself. Then use some kind of file diff tool to find any differences between those two files. My guess is there must be, in which case your job will be to tweak your server until they are identical. I guess also that the differences should be apparent in the file headers, and that should tell you more explicitly what the differences are.

Hope that helps!
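The download-and-diff suggestion could be sketched roughly as follows. `fetch_raw()` and `diff_lines()` are hypothetical helper names, the file names are placeholders, and the sketch assumes the osmdata package is installed and both servers are reachable. The comparison is a crude set difference of lines that ignores ordering and repeated lines, so a dedicated diff tool remains the better choice for a close inspection.

```r
# Hypothetical helper: save the raw Overpass XML for a query from one server.
# `query` is an overpass_query built with osmdata::opq() and friends.
# Assumes the osmdata package is installed.
fetch_raw <- function(server_url, query, filename) {
    osmdata::set_overpass_url(server_url)
    osmdata::osmdata_xml(query, filename = filename)
    invisible(filename)
}

# Hypothetical helper: lines that appear in only one of the two files.
# Set-based, so line order and duplicate lines are ignored.
diff_lines <- function(file_a, file_b) {
    a <- readLines(file_a, warn = FALSE)
    b <- readLines(file_b, warn = FALSE)
    list(only_in_a = setdiff(a, b), only_in_b = setdiff(b, a))
}

# Usage (needs network access, so left commented out):
# q <- osmdata::opq(bbox = c(16.45537, 50.80802, 16.52968, 50.87321))
# q <- osmdata::add_osm_feature(q, key = "name", value = "Bystrzyca", value_exact = FALSE)
# fetch_raw("https://overpass.kumi.systems/api/interpreter", q, "kumi.osm")
# fetch_raw("http://localhost:8888/api/interpreter", q, "local.osm")
# diff_lines("kumi.osm", "local.osm")
```

If the only differing lines are in the header of each file (generator or timestamp metadata), the data returned by the two servers is effectively the same.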

stalkerGH commented 9 months ago

Probably self-resolved. Thank you for the hint. The datasets (query results) from the remote and local Overpass API servers look almost identical, apart from the date. I couldn't find any special "magic" setting for the Overpass server, so I dug deeper into the rcityviews code and probably found a bug. Thanks for the help!

mpadge commented 9 months ago

Good to hear, thanks for closing issue here