ropensci / elastic

R client for the Elasticsearch HTTP API
https://docs.ropensci.org/elastic

Handling dots and slashes #225

Closed bornakke closed 5 years ago

bornakke commented 5 years ago

Hi

I'm running a number of searches that include dots and slashes in the query string, e.g. A.M.B.A or G/S. I can escape these when using the Search function's 'q' parameter, e.g.:

json <- elastic::Search(index = "cvr-permanent", size = 3000, q = "A\\.M\\.B\\.A", df = "Vrvirksomhed.virksomhedMetadata.nyesteNavn.navn", default_operator = "AND")

However, when I try to build a more complicated request by sending JSON through the body parameter, I get the following error:

Error in check_inputs(body) : lexical error: invalid char in json text.
                                       A\.M\.B\.A
                     (right here) ------^

Query:

query_keywords_active <- '{
  "query" : {
    "bool" : {
     "filter" : {
        "bool" : {
         "should" : [
           {"term" : { "Vrvirksomhed.virksomhedMetadata.sammensatStatus" : "normal" }},
           {"term" : { "Vrvirksomhed.virksomhedMetadata.sammensatStatus" : "aktiv" }}
         ]
       }
     },
     "must" : {
        "query_string":{
          "fuzziness": "0",
          "default_field":"Vrvirksomhed.virksomhedMetadata.nyesteNavn.navn",
          "query": "A\\.M\\.B\\.A"
        }
     }
    }
  }
}'

elastic version 0.8.4, Elasticsearch 6.2

Hope there are some smart people out there who can help with a solution or just a workaround for this problem :)

Session Info

```r
Session info ------------------------------------------------------------------
 setting  value
 version  R version 3.4.4 (2018-03-15)
 system   x86_64, linux-gnu
 ui       RStudio (1.1.447)
 language (EN)
 collate  en_GB.UTF-8
 tz       Europe/Copenhagen
 date     2018-07-11

Packages ----------------------------------------------------------------------
 package      * version date       source
 assertthat     0.2.0   2017-04-11 CRAN (R 3.4.0)
 base         * 3.4.4   2018-03-16 local
 bindr          0.1.1   2018-03-13 CRAN (R 3.4.3)
 bindrcpp     * 0.2.2   2018-03-29 CRAN (R 3.4.3)
 broom          0.4.5   2018-07-03 CRAN (R 3.4.4)
 cellranger     1.1.0   2016-07-27 CRAN (R 3.4.0)
 cli            1.0.0   2017-11-05 CRAN (R 3.4.2)
 colorspace     1.3-2   2016-12-14 cran (@1.3-2)
 compiler       3.4.4   2018-03-16 local
 crayon         1.3.4   2017-09-16 CRAN (R 3.4.1)
 curl           3.2     2018-03-28 CRAN (R 3.4.3)
 datasets     * 3.4.4   2018-03-16 local
 devtools       1.13.6  2018-06-27 CRAN (R 3.4.4)
 digest         0.6.15  2018-01-28 CRAN (R 3.4.3)
 dplyr        * 0.7.6   2018-06-29 CRAN (R 3.4.4)
 elastic      * 0.8.4   2018-06-26 CRAN (R 3.4.4)
 forcats      * 0.3.0   2018-02-19 CRAN (R 3.4.3)
 foreign        0.8-70  2018-04-23 CRAN (R 3.4.4)
 ggplot2      * 3.0.0   2018-07-03 CRAN (R 3.4.4)
 glue           1.2.0   2017-10-29 CRAN (R 3.4.2)
 googlesheets * 0.3.0   2018-06-29 CRAN (R 3.4.4)
 graphics     * 3.4.4   2018-03-16 local
 grDevices    * 3.4.4   2018-03-16 local
 grid           3.4.4   2018-03-16 local
 gtable         0.2.0   2016-02-26 CRAN (R 3.3.0)
 haven          1.1.2   2018-06-27 CRAN (R 3.4.4)
 hms            0.4.2   2018-03-10 CRAN (R 3.4.3)
 httr           1.3.1   2017-08-20 CRAN (R 3.4.0)
 jsonlite       1.5     2017-06-01 CRAN (R 3.4.0)
 lattice        0.20-35 2017-03-25 CRAN (R 3.3.3)
 lazyeval       0.2.1   2017-10-29 CRAN (R 3.4.2)
 lubridate      1.7.4   2018-04-11 CRAN (R 3.4.4)
 magrittr       1.5     2014-11-22 CRAN (R 3.3.0)
 memoise        1.1.0   2017-04-21 CRAN (R 3.4.0)
 methods      * 3.4.4   2018-03-16 local
 mnormt         1.5-5   2016-10-15 CRAN (R 3.4.0)
 modelr         0.1.2   2018-05-11 CRAN (R 3.4.4)
 munsell        0.5.0   2018-06-12 CRAN (R 3.4.4)
 nlme           3.1-137 2018-04-07 CRAN (R 3.4.4)
 openxlsx     * 4.1.0   2018-05-26 CRAN (R 3.4.4)
 parallel       3.4.4   2018-03-16 local
 pillar         1.2.3   2018-05-25 CRAN (R 3.4.4)
 pkgconfig      2.0.1   2017-03-21 CRAN (R 3.4.0)
 plyr           1.8.4   2016-06-08 CRAN (R 3.4.4)
 psych          1.8.4   2018-05-06 CRAN (R 3.4.4)
 purrr        * 0.2.5   2018-05-29 CRAN (R 3.4.4)
 R6             2.2.2   2017-06-17 cran (@2.2.2)
 Rcpp           0.12.17 2018-05-18 cran (@0.12.17)
 readr        * 1.1.1   2017-05-16 CRAN (R 3.4.0)
 readxl         1.1.0   2018-04-20 CRAN (R 3.4.4)
 reshape2       1.4.3   2017-12-11 cran (@1.4.3)
 rlang          0.2.1   2018-05-30 cran (@0.2.1)
 rstudioapi     0.7     2017-09-07 CRAN (R 3.4.2)
 rvest          0.3.2   2016-06-17 CRAN (R 3.3.2)
 scales         0.5.0   2017-08-24 CRAN (R 3.4.1)
 stats        * 3.4.4   2018-03-16 local
 stringi        1.2.3   2018-06-12 CRAN (R 3.4.4)
 stringr      * 1.3.1   2018-05-10 CRAN (R 3.4.4)
 tibble       * 1.4.2   2018-01-22 CRAN (R 3.4.3)
 tidyr        * 0.8.1   2018-05-18 CRAN (R 3.4.4)
 tidyselect     0.2.4   2018-02-26 CRAN (R 3.4.3)
 tidyverse    * 1.2.1   2017-11-14 CRAN (R 3.4.2)
 tools          3.4.4   2018-03-16 local
 utils        * 3.4.4   2018-03-16 local
 withr          2.1.2   2018-03-15 CRAN (R 3.4.3)
 xml2         * 1.2.0   2018-01-24 CRAN (R 3.4.3)
 yaml           2.1.19  2018-05-01 CRAN (R 3.4.4)
 zip            1.0.0   2017-04-25 CRAN (R 3.4.0)
```

sckott commented 5 years ago

thx for your question @bornakke

first thing I always say when you hit errors is to set connect(errors="complete") to get the stack trace from Elasticsearch

2nd, do you need to escape the periods for Elasticsearch, or for R? If you use 4 backslashes (\\\\.) in the R string it seems to work (internally we use jsonlite to check that the JSON is valid, so you can check it yourself like this, where x is the query string):

jsonlite::fromJSON(x)

$query
$query$bool
$query$bool$filter
$query$bool$filter$bool
$query$bool$filter$bool$should
  Vrvirksomhed.virksomhedMetadata.sammensatStatus
1                                          normal
2                                           aktiv

$query$bool$must
$query$bool$must$query_string
$query$bool$must$query_string$fuzziness
[1] "0"

$query$bool$must$query_string$default_field
[1] "Vrvirksomhed.virksomhedMetadata.nyesteNavn.navn"

$query$bool$must$query_string$query
[1] "A\\.M\\.B\\.A"

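That parses fine and leaves you with \\. for each period in R's printed output, i.e. a single literal backslash before each period in the actual string. Is that what you want?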
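
For anyone landing here later, a minimal sketch of why two backslashes fail and four work. There are two escaping layers: R's string parser consumes one level, and the JSON parser consumes another. The variable names (`two`, `four`, `parsed`) are made up for illustration, and the JSON is reduced to a single field; only `jsonlite::validate()` and `jsonlite::fromJSON()` are used.

```r
library(jsonlite)

# Two backslashes in R source: R turns each \\ into ONE backslash,
# so the JSON text contains \. which is not a legal JSON escape.
two <- '{"query": "A\\.M\\.B\\.A"}'
cat(two, "\n")                      # the text the JSON parser actually sees
is_valid_two <- as.logical(validate(two))

# Four backslashes in R source: R turns them into TWO backslashes,
# and \\ is the legal JSON escape for one literal backslash.
four <- '{"query": "A\\\\.M\\\\.B\\\\.A"}'
cat(four, "\n")
is_valid_four <- as.logical(validate(four))

# After JSON parsing, one backslash remains before each period.
# print() would show it doubled; cat() shows the real characters.
parsed <- fromJSON(four)$query
cat(parsed, "\n")
```

Using `cat()` rather than `print()` at each step helps, since `print()` re-escapes backslashes and makes it hard to tell how many are really in the string.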

bornakke commented 5 years ago

That did the trick. I had tried one, two and even three backslashes, but I didn't have the imagination to try four ;)
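
A possible way to avoid counting backslashes at all, sketched under the assumption that the query is built as an R list and serialised with `jsonlite::toJSON()`, which adds the JSON-level escaping itself. The names `name_query`, `status_field`, `body` and `body_json` are hypothetical; the field names and values are the ones from the query above.

```r
library(jsonlite)

# Only R-level escaping is needed here: this string holds A\.M\.B\.A
name_query <- "A\\.M\\.B\\.A"
status_field <- "Vrvirksomhed.virksomhedMetadata.sammensatStatus"

# Same structure as the hand-written JSON query, expressed as nested lists.
body <- list(query = list(bool = list(
  filter = list(bool = list(should = list(
    list(term = setNames(list("normal"), status_field)),
    list(term = setNames(list("aktiv"), status_field))
  ))),
  must = list(query_string = list(
    fuzziness = "0",
    default_field = "Vrvirksomhed.virksomhedMetadata.nyesteNavn.navn",
    query = name_query
  ))
)))

# toJSON() doubles each backslash so the resulting text is valid JSON.
body_json <- as.character(toJSON(body, auto_unbox = TRUE))
cat(body_json, "\n")

# Round trip: parsing the JSON gives back the original query string.
round_trip <- fromJSON(body_json, simplifyVector = FALSE)
```

The elastic documentation describes `body` as accepting either a list or JSON, so passing `body` or `body_json` to `Search()` should both be options, though neither was run against the Elasticsearch 6.2 setup in this thread.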

Thank you so much for taking the time to help @sckott!

sckott commented 5 years ago

glad it worked!