ropensci / osmdata

R package for downloading OpenStreetMap data
https://docs.ropensci.org/osmdata

Error when using doc = large .osm file #75

Closed Robinlovelace closed 6 years ago

Robinlovelace commented 7 years ago

The esoteric example strikes again:

u = "http://download.geofabrik.de/europe/albania.osh.pbf"
download.file(u, "albania.osh.pbf", mode = "wb")  # binary mode, so the PBF is not corrupted on Windows
msg = "osmconvert albania.osh.pbf >albania.osm"
system(msg)
osmdata_xml(input_file = "albania.osh.pbf", filename = "albania.osm")
q = add_feature(opq("Albania"), "highway")
albanian_roads = osmdata_sf(q, doc = "albania.osm")
## Error in strsplit(as.character(tstmp), " ")[[1]] : 
##   subscript out of bounds

mpadge commented 7 years ago

You're lucky - it just kills my computer. But i'm not sure that osmconvert produces exactly the same XML format as overpass delivers. Any divergence will be hugely problematic. Can you try the convert on a smaller file to see whether that works?

But this nevertheless opens a bigger issue that there are several ways to extract osm files, and i guess the package should ultimately expand to enable as many as possible, and not be restricted merely to overpass?

Robinlovelace commented 7 years ago

It's standard OSM XML format and displays fine in QGIS - not sure about smaller files - used Albania as a deliberately small country. Could try Wales or maybe even Liechtenstein!

Robinlovelace commented 7 years ago

Interesting: just failed in QGIS too.

[image]

Robinlovelace commented 7 years ago

But there clearly is data there, even if it won't import into an SQLite db: [image]

mpadge commented 7 years ago

crap-internet-at-useR - it means I should be doing more sociable things i'd suspect. I'll get back onto this asap

Robinlovelace commented 7 years ago

:+1: :beers:

Robinlovelace commented 6 years ago

What's the verdict?

mpadge commented 6 years ago

Yeah, no verdict. I was just hoping to quietly close this one to clean things up a bit. The bigger issue is the need to process large files in memory-sized chunks. I thought about renaming and re-opening the issue, but would prefer to leave it closed, because that ability can't be implemented in any sf form and will have to wait for osmdata_sc() via silicate::SC() forms. There's a lot of work to do before then. cc @mdsumner
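To make "memory-sized chunks" concrete, here is a minimal base-R sketch that streams an OSM XML file a fixed number of lines at a time and counts opening tags. It is only an illustration of the chunking idea, not how osmdata or silicate process files; `count_tag_in_chunks` and `chunk_lines` are hypothetical names, and the tiny input file is made up for the example.

```r
# Hypothetical helper (not part of osmdata): count opening tags of one kind
# in an OSM XML file without ever holding the whole file in memory.
count_tag_in_chunks <- function(path, tag = "node", chunk_lines = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # Match "<tag" followed by a space, ">" or "/", so "<nd" does not match "<node"
  pattern <- paste0("<", tag, "[ >/]")
  total <- 0L
  repeat {
    lines <- readLines(con, n = chunk_lines, warn = FALSE)
    if (length(lines) == 0) break  # end of file
    # gregexpr() returns every match per line (or -1 when there is none)
    hits <- gregexpr(pattern, lines)
    total <- total + sum(vapply(hits, function(h) sum(h > 0), integer(1)))
  }
  total
}

# Made-up miniature OSM file, just to exercise the function
tmp <- tempfile(fileext = ".osm")
writeLines(c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<osm version="0.6">',
  '  <node id="1" lat="41.3" lon="19.8"/>',
  '  <node id="2" lat="41.4" lon="19.9"/>',
  '  <way id="10"><nd ref="1"/><nd ref="2"/></way>',
  '</osm>'
), tmp)

n_nodes <- count_tag_in_chunks(tmp, "node", chunk_lines = 2L)
n_refs  <- count_tag_in_chunks(tmp, "nd", chunk_lines = 2L)
```

Counting is the easy case because each chunk can be handled independently. Building geometries is harder: ways refer to nodes that may sit in an earlier chunk, so a real implementation would need some index of node coordinates that survives across chunks.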

mdsumner commented 6 years ago

Off topic somewhat, while I catch up ... Any luck using sf or GDAL directly? I get all empty geoms, is this a dead end?

td <- tempdir()
u = "http://download.geofabrik.de/europe/albania.osh.pbf"
f <- file.path(td, "albania.osh.pbf")
download.file(u, f, mode = "wb")  # binary mode, so the PBF is not corrupted on Windows
read_pbf <- function(file) {
  layers <- sf::st_layers(file)$name
  setNames(purrr::map(layers, 
             ~sf::read_sf(file, .x)), layers)
}
x <- read_pbf(f)
purrr::map(x, nrow)  ## all zero

mdsumner commented 6 years ago

(and, what are the "osh" files, versus "osm"? - "shell"?)

fwiw, this works well - I'll use it to explore if maybe vapour/silicate can be used to filter through the padding - (and don't worry, I understand the direct PBF approach will be the best!)

td <- tempdir()
u = "http://download.geofabrik.de/europe/albania-latest.osm.pbf"
f <- file.path(td, basename(u))
download.file(u, f, mode = "wb")
read_pbf <- function(file) {
  layers <- sf::st_layers(file)$name
  setNames(purrr::map(layers, 
             ~sf::read_sf(file, .x)), layers)
}
x <- read_pbf(f)
purrr::map_int(x, nrow)
#          points            lines multilinestrings    multipolygons  other_relations 
#           56605            83145              123           112826              692 
purrr::map_int(x, ncol)
#          points            lines multilinestrings    multipolygons  other_relations 
#              11               10                5               26                5 

Robinlovelace commented 6 years ago

Awesome work - looks promising!

mpadge commented 6 years ago

okay @Robinlovelace, I just managed to reproduce your original subscript out of bounds error. I'll let you know when I've solved that one...

mpadge commented 6 years ago

That commit fixes your original problem, plus another one that that fix revealed. The problem remains that these geofabrik data include full changelogs and osmdata does not yet handle those correctly. We're going to have to address that one down the line somewhere.

Robinlovelace commented 6 years ago

Yes. Not least because changelogs are useful for active transport research! Many thanks for the fix and glad the esoteric example yielded fruit, albeit of an unanticipated kind.

Robinlovelace commented 6 years ago

Many thanks for the fix btw - will take a look at how you did that.

mpadge commented 6 years ago

the original problem was a stupid one on my part. geofabrik files do not have a timestamp, which i simply did not anticipate.
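For anyone hitting the same message: the error in the original report is what base R gives when the value being split is missing, because `strsplit()` on a zero-length input returns an empty list and `[[1]]` then has nothing to index. The sketch below reproduces that and shows one way to guard against it. `format_timestamp` is a hypothetical helper written for this illustration, not the code from the actual fix.

```r
# The unguarded pattern from the error message, applied to a missing value:
# as.character(NULL) is character(0), strsplit() returns list(), and
# list()[[1]] fails.
unguarded <- tryCatch(
  strsplit(as.character(NULL), " ")[[1]],
  error = function(e) conditionMessage(e)
)

# Hypothetical guarded version (not osmdata's actual fix): return NA when
# there is no timestamp, otherwise the first space-separated token.
format_timestamp <- function(tstmp) {
  tstmp <- as.character(tstmp)
  if (length(tstmp) == 0 || is.na(tstmp[1]) || !nzchar(tstmp[1])) {
    return(NA_character_)
  }
  strsplit(tstmp[1], " ")[[1]][1]
}

with_stamp    <- format_timestamp("2017-07-05 12:00:00")
without_stamp <- format_timestamp(NULL)
```

Returning `NA` rather than stopping lets the rest of the import carry on for files that carry no timestamp at all.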