jeroen / mongolite

Fast and Simple MongoDB Client for R
https://jeroen.github.io/mongolite/
287 stars 65 forks source link

GeoSpatial use cases #7

Open jeroen opened 9 years ago

jeroen commented 9 years ago

MongoDB has some nice geospacial operators, for example $geoWithin queries for all points that lie within a certain area (polygon). This might make an interesting use case where we query data from a certain region or location. @sckott do you have some example data for this?

sckott commented 9 years ago

Most likely. When I get to my laptop I'll get something.

sckott commented 9 years ago

There's US states in geojson format https://github.com/glynnbird/usstatesgeojson

sckott commented 9 years ago

congressional district boundaries https://github.com/JeffreyBLewis/congressional-district-boundaries

jeroen commented 9 years ago

Thanks. Do you also have an example data with lat/long coordinates in the US? They idea is to put these in the db and use a geojson to query records that appear within a certain geojson region.

sckott commented 9 years ago

us cities in geojson https://gist.github.com/sckott/97c28209169d64938714

jeroen commented 7 years ago

Maybe should do a separate chapter on this in the book.

mlampros commented 7 years ago

First of all, thank you for the useful mongolite package.

I try to reproduce the code of a blog post. The author populates and queries a Mongo-database using GeoJson data. Assuming I have the following .geojson files (geo1.geojson, geo2.geojson, geo3.geojson, geo4.geojson) saved in a folder (GEO_DATA)


# geo1.geojson

{
    "name" : "Squaw Valley",
    "location" : {
        "type" : "Point",
        "coordinates" : [
            -120.24,
            39.21
        ]
    }
}

# geo2.geojson

{
    "name" : "Mammoth Lakes",
    "location" : {
        "type" : "Point",
        "coordinates" : [
            -118.9,
            37.61
        ]
    }
}

# geo3.geojson

{
    "name" : "Aspen",
    "location" : {
        "type" : "Point",
        "coordinates" : [
            -106.82,
            39.18
        ]
    }
}

# geo4.geojson

{
    "name" : "Whistler",
    "location" : {
        "type" : "Point",
        "coordinates" : [
            -122.95,
            50.12
        ]
    }
}


First, I read the files and I populate the Mongo-database using the mongolite package,



library(mongolite)

init_quer = mongo("GeoJson_query")

geo_json_files = list.files('/GEO_DATA', full.names = T)   

for (i in 1:length(geo_json_files)) {

  dat_geom = geojsonR::shiny_from_JSON(geo_json_files[i])           # read data using 'shiny_from_JSON' 

  # dat_geom = jsonlite::read_json(geo_json_files[i])               # OR using 'jsonlite'

  init_quer$insert(dat_geom)
}


I don't face any problems when I do simple queries,



# simple query

subs = init_quer$find(query = '{"name" : "Whistler"}',

                       fields = '{ "location.coordinates" : true, "name" : true, "_id" : false}'
)

subs

     coordinates     name
1 -122.95, 50.12 Whistler


However, when I attempt to use the $geoIntersects or the $geoWithin operator I do get an empty data frame as an output,



# 'geoIntersects' from mongodb

subs_geointersect = init_quer$find(query =

    '{"location": {
       "$geoIntersects": {
        "$geometry": {
          "type": "Polygon",
            "coordinates": [[
              [-109, 41],
              [-102, 41],
              [-102, 37],
              [-109, 37],
              [-109, 41]
              ]]
          }
        }
      }
    }',

    fields = '{ "location.coordinates" : true, "name" : true, "_id" : false}'
)

subs_geointersect

data frame with 0 columns and 0 rows


Am I doing something wrong?


mlampros commented 7 years ago

I know it's a while since I asked, however I came to a solution, which is somehow involved (just for reference in case anyone is interested to use the MongoDB geospatial features).

This week I came across a blog post that uses MongoDB and mongolite for geospatial queries/analysis. The fact that the author used the MongoDB-Compass tool helped a lot to find out what exactly led to the previously mentioned empty data frame.

First I modified the data insertion code chunk in the following way:


library(mongolite)

init_quer = mongo(collection = "GeoJson_query", db = "GeoJson_db", 

                  url = "mongodb://localhost", verbose = T)

geo_json_files = list.files('/GEO_DATA', full.names = T)   

for (i in 1:length(geo_json_files)) {

  dat_geom = jsonlite::read_json(geo_json_files[i], simplifyVector = T)

  init_quer$insert(dat_geom)
}

Then I opened an ubuntu console and typed


sudo mongod --dbpath /var/lib/mongodb

to start the mongodb service (defining the path where the database is saved).

After installing the MongoDB-Compass I followed the instructions in the blog post to create a (geospatial) index for the GeoJson_query collection of the GeoJson_db database. However I got an error,

( Can't extract geo keys: { _id: ObjectId('5968cf7942b25619d34254c1'), name: [ "Squaw Valley" ], location: { type: [ "Point" ], coordinates: [ -120.24, 39.21 ] } } unknown GeoJSON type: { type: [ "Point" ], coordinates: [ -120.24, 39.21 ] }

That because the type field of each geospatial file was saved initially as an array ( ["Point"] ) rather than as a string ( I don't know if that has to do with the fact that the insert() method of the mongolite package accepts a data-frame, named list or a character vector as input).

Thus I had to open a new MongoDB session and modify the saved data,


mongo

use GeoJson_db                 # switch to the relevant database

and then I used the following query to modify the type field and save the updated data to a new collection GeoJson_query_updated ( probably there's a better mongodb-query to modify the data),


db.GeoJson_query.aggregate([{ 

  "$project": { 

    "name": 1, 

    "location": {

      "type": { "$arrayElemAt": [ "$location.type", 0 ] } , 

      "coordinates": 1}}}, 

  {$out : "GeoJson_query_updated"}

  ]
)

After that I reopened the MongoDB-Compass tool, I navigated to the GeoJson_query_updated collection and I created the (geospatial) index successfully.

Finally, by opening a new R-session I was able to get the correct output,


modified_db = mongo(collection = "GeoJson_query_updated", db = "GeoJson_db", 
                        url = "mongodb://localhost", verbose = T)

subs_geointersect = modified_db$find(query =

                                     '{"location": {
       "$geoIntersects": {
        "$geometry": {
          "type": "Polygon",
            "coordinates": [[
              [-109, 41],
              [-102, 41],
              [-102, 37],
              [-109, 37],
              [-109, 41]
              ]]
          }
        }
      }
    }', fields = '{ "location.coordinates" : true, "name" : true, "_id" : false}'
)

subs_geointersect
   name    coordinates
1 Aspen -106.82, 39.18

I'd like to know if there's a simpler solution to this issue from within an R-session.

SymbolixAU commented 7 years ago

@mlampros - I may have missed the point of your issue, but I was able to create a geospatial index using

m$index((add = '{"geometry" : "2dsphere"}'))
## where m is my 'mongolite' connection object 
mlampros commented 7 years ago

@SymbolixAU, thanks for making me aware of the index operator. In my initial example, although I create the geospatial index as you pointed out, I continue to receive an empty data frame,


> init_quer$index((add = '{"geometry" : "2dsphere"}'))
  v key._id key.geometry              name                       ns 2dsphereIndexVersion
1 1       1         <NA>              _id_ GeoJson_db.GeoJson_query                   NA
2 1      NA     2dsphere geometry_2dsphere GeoJson_db.GeoJson_query                    3

> subs_geointersect = init_quer$find(query =
+                                      
+                                      '{"location": {
+        "$geoIntersects": {
+         "$geometry": {
+           "type": "Polygon",
+             "coordinates": [[
+               [-109, 41],
+               [-102, 41],
+               [-102, 37],
+               [-109, 37],
+               [-109, 41]
+               ]]
+           }
+         }
+       }
+     }',
+                                    
+     fields = '{ "location.coordinates" : true, "name" : true, "_id" : false}'
+ )
 Imported 0 records. Simplifying into dataframe...

> subs_geointersect
data frame with 0 columns and 0 rows

However, you should know since my second post in this issue (July) I invested some time and I created the GeoMongo package, which performs geospatial queries using the reticulate package. I added also details of the package in a blog post.

SymbolixAU commented 7 years ago

@mlampros I think your issue might be that you've specified the index on the geometry column, yet your first line in the query is querying against { "location" : {. I think this should be { "geometry" : {

mlampros commented 7 years ago

@SymbolixAU, would you mind sharing an R script solution which results in the same output as is the case for the "$geoIntersects" operator in the mentioned blog post.

SymbolixAU commented 7 years ago

@mlampros - I decided to write my own blog post as the example data was already available and I could read it directly into R. This shows how I used $geoIntersects. Hopefully it's reproducible.

mlampros commented 7 years ago

@SymbolixAU, thanks for sharing (blog post)

SymbolixAU commented 6 years ago

referencing my post on issue #109