ropensci / elastic

R client for the Elasticsearch HTTP API
https://docs.ropensci.org/elastic

Use jsonlite sf support #296

Closed cphaarmeyer closed 1 year ago

cphaarmeyer commented 1 year ago

Description

With this change, 'sf' objects are converted to GeoJSON when using docs_bulk_*(), which has been possible since jsonlite 1.7.0. As far as I can tell, nothing changes for any other kind of input. I need this feature for my work, to store geodata in Elasticsearch and run geo queries on it later. Hope other people find this useful too!

Example

With a mapping, you can use geo queries:

library(elastic)
library(sf)

conn <- connect()
index <- "test"

sf <- read_sf(system.file("gpkg/nc.gpkg", package = "sf"))

index_create(conn, index)
# Elastic Version 8.4.1
mapping_create(conn, index, body = list(
  properties = list(geometry = list(type = "geo_shape"))
))

docs_bulk_index(conn, sf, index)

Sys.sleep(1)

Search(conn, index, body = list(
  query = list(
    bool = list(
      must = list(
        match_all = c()
      ),
      filter = list(
        geo_shape = list(
          geometry = list(
            relation = "intersects",
            shape = list(
              type = "point",
              coordinates = c(-79.40065, 35.55937)
            )
          )
        )
      )
    )
  )
))
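
The nested query body above may be easier to reuse when a small helper builds it. The following is a minimal sketch in base R. `geo_point_filter()` is a hypothetical name, not part of elastic; it only assembles the list that would then be passed as `body`.

```r
# Hypothetical helper (not part of elastic): build a bool query that keeps
# every document whose `field` shape relates to a single lon/lat point.
geo_point_filter <- function(field, lon, lat, relation = "intersects") {
  shape_clause <- list(
    relation = relation,
    shape = list(type = "point", coordinates = c(lon, lat))
  )
  # setNames() lets the geo_shape field name be chosen at run time
  geo_shape <- setNames(list(shape_clause), field)
  list(
    query = list(
      bool = list(
        must = list(match_all = c()),
        filter = list(geo_shape = geo_shape)
      )
    )
  )
}

body <- geo_point_filter("geometry", lon = -79.40065, lat = 35.55937)
str(body$query$bool$filter)
```

Assuming the `conn` and `index` objects from the example, the result could be used as `Search(conn, index, body = body)`.
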
sckott commented 1 year ago

Thanks @cphaarmeyer - looks reasonable. Running that example, I get no results. Any ideas?

Are there any drawbacks or gotchas to adding sf = "features" to the toJSON call?

cphaarmeyer commented 1 year ago

When running the example again in one go, I needed to add Sys.sleep(1) to get results. Is that the problem @sckott?
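
Elasticsearch makes newly indexed documents searchable only after an index refresh, which by default happens about once per second, so a fixed Sys.sleep(1) can occasionally be too short. One alternative is to poll until the documents show up. The sketch below is base R only; `wait_until()` is a hypothetical helper, and the search call in the comment assumes the `conn` and `index` objects from the example.

```r
# Hypothetical helper: call `check()` repeatedly until it returns TRUE
# or `timeout` seconds have passed. Returns TRUE on success, FALSE on timeout.
wait_until <- function(check, timeout = 10, interval = 0.25) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(check())) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Self-contained demonstration with a stand-in check that succeeds on the
# third call. With a live cluster the check would run a search instead, e.g.
#   function() Search(conn, index, size = 0)$hits$total$value > 0
calls <- 0
ok <- wait_until(function() {
  calls <<- calls + 1
  calls >= 3
}, timeout = 5, interval = 0.01)
ok
```
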

cphaarmeyer commented 1 year ago

The change is basically this:

library(jsonlite)
library(sf)
#> Linking to GEOS 3.9.1, GDAL 3.4.3, PROJ 7.2.1; sf_use_s2() is TRUE

g <- st_sfc(st_point(1:2), st_point(3:4))
s <- st_sf(a = 3:4, g)

toJSON(s, collapse = FALSE, pretty = TRUE)
#> {
#>     "a": 3,
#>     "g": {
#>       "type": "Point",
#>       "coordinates": [1, 2]
#>     }
#>   } {
#>     "a": 4,
#>     "g": {
#>       "type": "Point",
#>       "coordinates": [3, 4]
#>     }
#>   }

toJSON(s, collapse = FALSE, sf = "feature", pretty = TRUE)
#> {
#>     "type": "Feature",
#>     "properties": {
#>       "a": 3
#>     },
#>     "geometry": {
#>       "type": "Point",
#>       "coordinates": [1, 2]
#>     }
#>   } {
#>     "type": "Feature",
#>     "properties": {
#>       "a": 4
#>     },
#>     "geometry": {
#>       "type": "Point",
#>       "coordinates": [3, 4]
#>     }
#>   }

Created on 2022-09-19 with reprex v2.0.2

The default does not work with the geo query above, but the latter does.

cphaarmeyer commented 1 year ago

Actually, it already works. The example above was just inconsistent about the geometry column name: the data calls it geom, while the mapping and the query use geometry.
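
A mismatch like this can be caught before indexing. sf records the name of the active geometry column in the `sf_column` attribute, so it can be compared with the field name used in the mapping. The sketch below is an illustration only: `geometry_field_matches()` is a hypothetical helper, and a plain data frame with that attribute stands in for a real sf object so the code runs without sf installed.

```r
# Hypothetical helper: compare the active geometry column of an sf object
# with the field name used in the Elasticsearch mapping.
geometry_field_matches <- function(x, field = "geometry") {
  geom_col <- attr(x, "sf_column")  # sf records the geometry column name here
  if (is.null(geom_col)) {
    stop("`x` has no 'sf_column' attribute; is it an sf object?")
  }
  identical(geom_col, field)
}

# Stand-in for an sf object: a plain data frame carrying only the
# attribute the helper inspects.
fake_nc <- structure(data.frame(NAME = "Chatham"), sf_column = "geom")
geometry_field_matches(fake_nc)                  # FALSE: the data uses "geom"
geometry_field_matches(fake_nc, field = "geom")  # TRUE
```
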

library(elastic)
library(sf)
#> Linking to GEOS 3.9.1, GDAL 3.4.3, PROJ 7.2.1; sf_use_s2() is TRUE

conn <- connect("10.30.80.169")
index <- "test"

sf <- read_sf(system.file("gpkg/nc.gpkg", package = "sf")) |>
  dplyr::rename(geometry = geom)

index_create(conn, index)
#> $acknowledged
#> [1] TRUE
#> 
#> $shards_acknowledged
#> [1] TRUE
#> 
#> $index
#> [1] "test"

conn$es_ver()
#> [1] 841

mapping_create(conn, index, body = list(
  properties = list(geometry = list(type = "geo_shape"))
))
#> $acknowledged
#> [1] TRUE

invisible(docs_bulk_index(conn, sf, index, quiet = TRUE))
#> Warning in sprintf(metadata_fmt, action, index, counter): one argument not used
#> by format '{"%s":{"_index":"%s"}}'

Sys.sleep(1)

str(Search(conn, index, body = list(
  query = list(
    bool = list(
      must = list(
        match_all = c()
      ),
      filter = list(
        geo_shape = list(
          geometry = list(
            relation = "intersects",
            shape = list(
              type = "point",
              coordinates = c(-79.40065, 35.55937)
            )
          )
        )
      )
    )
  )
)), 5)
#> List of 4
#>  $ took     : int 0
#>  $ timed_out: logi FALSE
#>  $ _shards  :List of 4
#>   ..$ total     : int 1
#>   ..$ successful: int 1
#>   ..$ skipped   : int 0
#>   ..$ failed    : int 0
#>  $ hits     :List of 3
#>   ..$ total    :List of 2
#>   .. ..$ value   : int 1
#>   .. ..$ relation: chr "eq"
#>   ..$ max_score: num 1
#>   ..$ hits     :List of 1
#>   .. ..$ :List of 4
#>   .. .. ..$ _index : chr "test"
#>   .. .. ..$ _id    : chr "WoTkhIMBC0-7Tc4vvmpR"
#>   .. .. ..$ _score : num 1
#>   .. .. ..$ _source:List of 15
#>   .. .. .. ..$ AREA     : num 0.18
#>   .. .. .. ..$ PERIMETER: num 2.14
#>   .. .. .. ..$ CNTY_    : int 1973
#>   .. .. .. ..$ CNTY_ID  : int 1973
#>   .. .. .. ..$ NAME     : chr "Chatham"
#>   .. .. .. ..$ FIPS     : chr "37037"
#>   .. .. .. ..$ FIPSNO   : int 37037
#>   .. .. .. ..$ CRESS_ID : int 19
#>   .. .. .. ..$ BIR74    : int 1646
#>   .. .. .. ..$ SID74    : int 2
#>   .. .. .. ..$ NWBIR74  : int 591
#>   .. .. .. ..$ BIR79    : int 2398
#>   .. .. .. ..$ SID79    : int 3
#>   .. .. .. ..$ NWBIR79  : int 687
#>   .. .. .. ..$ geometry :List of 2

Created on 2022-09-28 with reprex v2.0.2
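
The warning printed by docs_bulk_index() in the reprex above is what sprintf() reports when it receives more values than the format string has placeholders. The sketch below reproduces only that mechanism with the format string quoted in the warning; how elastic assembles its bulk metadata internally is not shown in this thread, so the argument values here are assumptions.

```r
metadata_fmt <- '{"%s":{"_index":"%s"}}'

# Two placeholders, three values: the extra value is ignored, and recent R
# versions warn "one argument not used by format" (suppressed here).
with_extra <- suppressWarnings(sprintf(metadata_fmt, "index", "test", 1L))

# Passing only the values the format consumes gives the same line, no warning.
exact <- sprintf(metadata_fmt, "index", "test")

identical(with_extra, exact)
exact
```

The resulting line is the same either way, which is consistent with the documents being indexed correctly despite the warning.
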

sckott commented 1 year ago

Thanks. I think there was just a delay on the Elasticsearch end the first time I tried it.

sckott commented 1 year ago

I'll ask someone who has access to merge this.

cphaarmeyer commented 1 year ago

I suggest an approach like https://github.com/ropensci/elastic/commit/c7fcb0e1207310e219442381d5d5a1e10a6076ec so that existing behaviour does not change. Sorry, I did not see your response before I pushed. 😅

sckott commented 1 year ago

Thanks, that change looks good. Merged now. Thanks for merging, @maelle!

maelle commented 1 year ago

@cphaarmeyer would you be interested in maintaining this package? #292 😸