r-spatial / discuss

a discussion repository: raise issues, or contribute!

MongoDB to sf #19

Closed SymbolixAU closed 5 years ago

SymbolixAU commented 6 years ago

I'm planning on submitting a proposal to RConsortium to create a MongoDB-to-sf library.

What's r-spatial's appetite for this, is it already being done, and is it a good idea?


Example

MongoDB can store geospatial data. I have a collection (saved locally) which contains over 72,000 LINESTRINGs (roads in Victoria, Australia).

Getting this data into R as an sf object takes over 30 seconds on my machine: about 32 seconds to fetch the documents, plus about 5 seconds to convert them.

library(mongolite)
library(sf)

m <- mongo(db = "roads", collection = "roads")

system.time({
  m_roads <- m$find()
})
##   user  system elapsed 
## 29.288   0.845  32.112 

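system.time({
  # Geometry column of the query result
  geom <- m_roads$geometry
  geom$id <- seq_len(nrow(geom))

  # Build one LINESTRING per coordinate matrix, then combine into an sfc
  sfc <- lapply(geom$coordinates, sf::st_linestring)
  sfc <- sf::st_sfc(sfc)
})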

##   user  system elapsed 
##  4.553   0.036   5.027 

str(sfc)
## sfc_LINESTRING of length 72943; first list element:  XY [1:2, 1:2] 145 145 -38 -38

I have created a prototype package which returns the same sfc object in approximately 1.5 seconds:

library(mongoGeo)

system.time({
  con <- mg_connect(db = "roads", collection = "roads")
  sfc <- mg_find_sfc(con)
})

##  user  system elapsed 
## 1.215   0.119   1.546

str(sfc)
## sfc_LINESTRING of length 72943; first list element:  XY [1:2, 1:2] 145 145 -38 -38

edzer commented 6 years ago

In principle yes, but it will be easier to give comments if you share your draft proposal; you'll find draft proposals of (successful) proposals for sf and stars in their respective repositories.

SymbolixAU commented 6 years ago

I'll share it in a couple of days when it's written; I've only just had the idea :)

tim-salabim commented 6 years ago

It seems a little too specific (MongoDB only?) for RConsortium. Would it be extensible to other dbs as well?

SymbolixAU commented 6 years ago

The logic that does the conversion would be extensible because it's all about parsing GeoJSON.
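A minimal sketch of that driver-independent step (not the mongoGeo implementation): assuming each geometry arrives as a GeoJSON LineString string, jsonlite can parse it and sf can build the geometry. The sample coordinates and the helper name `linestring_from_geojson` are hypothetical, used only for illustration.

```r
library(jsonlite)
library(sf)

# Two hypothetical GeoJSON LineString documents (illustrative coordinates only)
docs <- c(
  '{"type":"LineString","coordinates":[[145.0,-38.0],[145.1,-38.1]]}',
  '{"type":"LineString","coordinates":[[144.9,-37.8],[145.0,-37.9],[145.1,-37.9]]}'
)

# Hypothetical helper: parse one GeoJSON string and build an sf LINESTRING.
# fromJSON() simplifies the nested coordinate array to a numeric matrix
# with one row per point, which is the input st_linestring() expects.
linestring_from_geojson <- function(x) {
  parsed <- jsonlite::fromJSON(x)
  stopifnot(identical(parsed$type, "LineString"))
  sf::st_linestring(parsed$coordinates)
}

# GeoJSON coordinates are longitude/latitude in WGS84, hence EPSG:4326
sfc <- sf::st_sfc(lapply(docs, linestring_from_geojson), crs = 4326)
```

Nothing here depends on how the strings were fetched, so the same function would apply to output from any database that can hand back GeoJSON text.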

However, communication with NoSQL databases is reliant on the specific drivers for those databases and on the underlying data representation. For example, MongoDB stores its data as BSON, and so provides the C/C++ API to communicate with it.

DynamoDB would have a different set of drivers, and FireGeo (or whatever they're going to name it) will have a different set again.

edzer commented 6 years ago

Doesn't MongoDB provide a WKB interface?

SymbolixAU commented 6 years ago

I've not heard of such an interface.

I had also forgotten about the GeoMongo package. Maybe it would be worth hooking into that in some way instead?

@mlampros - Do you have any thoughts on converting the output from GeoMongo directly to sf objects?

mlampros commented 6 years ago

@SymbolixAU would you mind sharing a minimal reproducible example so that I'm able to compare my approach (GeoMongo) with yours (mongoGeo)?

SymbolixAU commented 6 years ago

I've decided not to submit a proposal for this particular project, but I think it's worth pursuing.

I'll share the mongo work soon, just fixing a few things first.

SymbolixAU commented 6 years ago

I took tim's "too specific" comment on board and thought about ways to make this more generic.

Ultimately it comes down to parsing GeoJSON, so I wanted to see if I could write a "fast" parser to add to what can already be done in sf e.g. here and in geojsonio.

So I've been playing with geojsonsf. The README gives some examples and benchmarks.
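For a rough idea of the two routes being compared, here is a small sketch, assuming the geojsonsf and sf packages are installed. The GeoJSON string is made up for illustration, and no timings are implied.

```r
library(geojsonsf)
library(sf)

# Illustrative GeoJSON string (coordinates are made up)
geojson <- '{"type":"LineString","coordinates":[[145.0,-38.0],[145.1,-38.1]]}'

# geojsonsf: parse the string straight to an sfc geometry column
sfc_fast <- geojsonsf::geojson_sfc(geojson)

# sf: read the same string through GDAL's GeoJSON driver
sf_gdal <- sf::st_read(geojson, quiet = TRUE)
```

`geojson_sfc()` returns only the geometries, while `st_read()` returns a full sf data frame, so the two results are not directly interchangeable without extracting the geometry column from the latter.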

I'm now going to see if I can make this even more generic, or at least handle the BSON objects returned by MongoDB.

tim-salabim commented 5 years ago

Closing; feel free to re-open if necessary.