Open jeroen opened 9 years ago
Most likely. When I get to my laptop I'll get something.
There's US states in geojson format https://github.com/glynnbird/usstatesgeojson
congressional district boundaries https://github.com/JeffreyBLewis/congressional-district-boundaries
Thanks. Do you also have an example data with lat/long coordinates in the US? They idea is to put these in the db and use a geojson to query records that appear within a certain geojson region.
us cities in geojson https://gist.github.com/sckott/97c28209169d64938714
First of all, thank you for the useful mongolite package.
I try to reproduce the code of a blog post. The author populates and queries a Mongo-database using GeoJson data. Assuming I have the following .geojson files (geo1.geojson, geo2.geojson, geo3.geojson, geo4.geojson) saved in a folder (GEO_DATA)
# geo1.geojson
{
"name" : "Squaw Valley",
"location" : {
"type" : "Point",
"coordinates" : [
-120.24,
39.21
]
}
}
# geo2.geojson
{
"name" : "Mammoth Lakes",
"location" : {
"type" : "Point",
"coordinates" : [
-118.9,
37.61
]
}
}
# geo3.geojson
{
"name" : "Aspen",
"location" : {
"type" : "Point",
"coordinates" : [
-106.82,
39.18
]
}
}
# geo4.geojson
{
"name" : "Whistler",
"location" : {
"type" : "Point",
"coordinates" : [
-122.95,
50.12
]
}
}
First, I read the files and I populate the Mongo-database using the mongolite package,
library(mongolite)
init_quer = mongo("GeoJson_query")
geo_json_files = list.files('/GEO_DATA', full.names = T)
for (i in 1:length(geo_json_files)) {
dat_geom = geojsonR::shiny_from_JSON(geo_json_files[i]) # read data using 'shiny_from_JSON'
# dat_geom = jsonlite::read_json(geo_json_files[i]) # OR using 'jsonlite'
init_quer$insert(dat_geom)
}
I don't face any problems when I do simple queries,
# simple query
subs = init_quer$find(query = '{"name" : "Whistler"}',
fields = '{ "location.coordinates" : true, "name" : true, "_id" : false}'
)
subs
coordinates name
1 -122.95, 50.12 Whistler
However, when I attempt to use the $geoIntersects or the $geoWithin operator I do get an empty data frame as an output,
# 'geoIntersects' from mongodb
subs_geointersect = init_quer$find(query =
'{"location": {
"$geoIntersects": {
"$geometry": {
"type": "Polygon",
"coordinates": [[
[-109, 41],
[-102, 41],
[-102, 37],
[-109, 37],
[-109, 41]
]]
}
}
}
}',
fields = '{ "location.coordinates" : true, "name" : true, "_id" : false}'
)
subs_geointersect
data frame with 0 columns and 0 rows
Am I doing something wrong?
I know it's a while since I asked, however I came to a solution, which is somehow involved (just for reference in case anyone is interested to use the MongoDB geospatial features).
This week I came across a blog post that uses MongoDB and mongolite for geospatial queries/analysis. The fact that the author used the MongoDB-Compass tool helped a lot to find out what exactly led to the previously mentioned empty data frame.
First I modified the data insertion code chunk in the following way:
library(mongolite)
init_quer = mongo(collection = "GeoJson_query", db = "GeoJson_db",
url = "mongodb://localhost", verbose = T)
geo_json_files = list.files('/GEO_DATA', full.names = T)
for (i in 1:length(geo_json_files)) {
dat_geom = jsonlite::read_json(geo_json_files[i], simplifyVector = T)
init_quer$insert(dat_geom)
}
Then I opened an ubuntu console and typed
sudo mongod --dbpath /var/lib/mongodb
to start the mongodb service (defining the path where the database is saved).
After installing the MongoDB-Compass I followed the instructions in the blog post to create a (geospatial) index for the GeoJson_query collection of the GeoJson_db database. However I got an error,
( Can't extract geo keys: { _id: ObjectId('5968cf7942b25619d34254c1'), name: [ "Squaw Valley" ], location: { type: [ "Point" ], coordinates: [ -120.24, 39.21 ] } } unknown GeoJSON type: { type: [ "Point" ], coordinates: [ -120.24, 39.21 ] }
That because the type field of each geospatial file was saved initially as an array ( ["Point"] ) rather than as a string ( I don't know if that has to do with the fact that the insert() method of the mongolite package accepts a data-frame, named list or a character vector as input).
Thus I had to open a new MongoDB session and modify the saved data,
mongo
use GeoJson_db # switch to the relevant database
and then I used the following query to modify the type field and save the updated data to a new collection GeoJson_query_updated ( probably there's a better mongodb-query to modify the data),
db.GeoJson_query.aggregate([{
"$project": {
"name": 1,
"location": {
"type": { "$arrayElemAt": [ "$location.type", 0 ] } ,
"coordinates": 1}}},
{$out : "GeoJson_query_updated"}
]
)
After that I reopened the MongoDB-Compass tool, I navigated to the GeoJson_query_updated collection and I created the (geospatial) index successfully.
Finally, by opening a new R-session I was able to get the correct output,
modified_db = mongo(collection = "GeoJson_query_updated", db = "GeoJson_db",
url = "mongodb://localhost", verbose = T)
subs_geointersect = modified_db$find(query =
'{"location": {
"$geoIntersects": {
"$geometry": {
"type": "Polygon",
"coordinates": [[
[-109, 41],
[-102, 41],
[-102, 37],
[-109, 37],
[-109, 41]
]]
}
}
}
}', fields = '{ "location.coordinates" : true, "name" : true, "_id" : false}'
)
subs_geointersect
name coordinates
1 Aspen -106.82, 39.18
I'd like to know if there's a simpler solution to this issue from within an R-session.
@mlampros - I may have missed the point of your issue, but I was able to create a geospatial index using
m$index((add = '{"geometry" : "2dsphere"}'))
## where m is my 'mongolite' connection object
@SymbolixAU, thanks for making me aware of the index operator. In my initial example, although I create the geospatial index as you pointed out, I continue to receive an empty data frame,
> init_quer$index((add = '{"geometry" : "2dsphere"}'))
v key._id key.geometry name ns 2dsphereIndexVersion
1 1 1 <NA> _id_ GeoJson_db.GeoJson_query NA
2 1 NA 2dsphere geometry_2dsphere GeoJson_db.GeoJson_query 3
> subs_geointersect = init_quer$find(query =
+
+ '{"location": {
+ "$geoIntersects": {
+ "$geometry": {
+ "type": "Polygon",
+ "coordinates": [[
+ [-109, 41],
+ [-102, 41],
+ [-102, 37],
+ [-109, 37],
+ [-109, 41]
+ ]]
+ }
+ }
+ }
+ }',
+
+ fields = '{ "location.coordinates" : true, "name" : true, "_id" : false}'
+ )
Imported 0 records. Simplifying into dataframe...
> subs_geointersect
data frame with 0 columns and 0 rows
However, you should know since my second post in this issue (July) I invested some time and I created the GeoMongo package, which performs geospatial queries using the reticulate package. I added also details of the package in a blog post.
@mlampros I think your issue might be that you've specified the index on the geometry
column, yet your first line in the query is querying against { "location" : {
. I think this should be { "geometry" : {
@SymbolixAU, would you mind sharing an R script solution which results in the same output as is the case for the "$geoIntersects" operator in the mentioned blog post.
@mlampros - I decided to write my own blog post as the example data was already available and I could read it directly into R. This shows how I used $geoIntersects
. Hopefully it's reproducible.
@SymbolixAU, thanks for sharing (blog post)
referencing my post on issue #109
MongoDB has some nice geospacial operators, for example
$geoWithin
queries for all points that lie within a certain area (polygon). This might make an interesting use case where we query data from a certain region or location. @sckott do you have some example data for this?