ropensci / elastic

R client for the Elasticsearch HTTP API
https://docs.ropensci.org/elastic

Can't bulk update documents with boolean field type #239

Closed dpmccabe closed 5 years ago

dpmccabe commented 5 years ago
> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14.1

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] dplyr_0.7.6       elastic_0.8.4     rlist_0.4.6.1     bindrcpp_0.2.2   
 [5] doMC_1.3.5        iterators_1.0.10  foreach_1.4.4     glue_1.3.0       
 [9] stringr_1.3.1     hdf5r_1.0.1       tidyr_0.8.1       aws.s3_0.3.12    
[13] readr_1.1.1       RPostgreSQL_0.6-2 DBI_0.8           dbplyr_1.2.1     

I'm using elasticsearch 6.2.3.

This doesn't work:

library(elastic)
library(tibble)
library(dplyr)  # provides %>% and select(), used below

d <- tibble(id = 1:3, x = c(TRUE, FALSE, TRUE), y = c("a", "b", "c"))

d_index <- '
{
  "mappings": {
    "d": {
      "properties": {
        "x": { "type": "boolean" },
        "y": { "type": "keyword" }
      }
    }
  }
}
'

if (index_exists("d")) index_delete("d")
index_create(index = "d", body = d_index)

res <- docs_bulk_update(
  d %>% select(-id),
  index = "d",
  type = "d",
  es_ids = FALSE,
  doc_ids = d$id
)

print(res)

Output:

[[1]]
[[1]]$took
[1] 1

[[1]]$errors
[1] TRUE

[[1]]$items
[[1]]$items[[1]]
[[1]]$items[[1]]$update
[[1]]$items[[1]]$update$`_index`
[1] "d"

[[1]]$items[[1]]$update$`_type`
[1] "d"

[[1]]$items[[1]]$update$`_id`
[1] "1"

[[1]]$items[[1]]$update$status
[1] 400

[[1]]$items[[1]]$update$error
[[1]]$items[[1]]$update$error$type
[1] "mapper_parsing_exception"

[[1]]$items[[1]]$update$error$reason
[1] "failed to parse [x]"

[[1]]$items[[1]]$update$error$caused_by
[[1]]$items[[1]]$update$error$caused_by$type
[1] "illegal_argument_exception"

[[1]]$items[[1]]$update$error$caused_by$reason
[1] "Failed to parse value [ TRUE] as only [true] or [false] are allowed."

...

When initially uploading I can use docs_bulk instead of docs_bulk_update and this works. However, my use case requires updating.

What's really strange is that if I remove the keyword field so that my only field is the boolean one x, docs_bulk_update works again:

d2_index <- '
{
  "mappings": {
    "d2": {
      "properties": {
        "x": { "type": "boolean" }
      }
    }
  }
}
'

if (index_exists("d2")) index_delete("d2")
index_create(index = "d2", body = d2_index)

d2 <- tibble(id = 1:3, x = c(TRUE, FALSE, TRUE))

res <- docs_bulk_update(
  d2 %>% select(-id),
  index = "d2",
  type = "d2",
  es_ids = FALSE,
  doc_ids = d2$id
)

print(res)

Output:

[[1]]
[[1]]$took
[1] 5

[[1]]$errors
[1] FALSE

[[1]]$items
[[1]]$items[[1]]
[[1]]$items[[1]]$update
[[1]]$items[[1]]$update$`_index`
[1] "d2"

[[1]]$items[[1]]$update$`_type`
[1] "d2"

[[1]]$items[[1]]$update$`_id`
[1] "1"

[[1]]$items[[1]]$update$`_version`
[1] 1

[[1]]$items[[1]]$update$result
[1] "created"

[[1]]$items[[1]]$update$`_shards`
[[1]]$items[[1]]$update$`_shards`$total
[1] 2

[[1]]$items[[1]]$update$`_shards`$successful
[1] 2

[[1]]$items[[1]]$update$`_shards`$failed
[1] 0

[[1]]$items[[1]]$update$`_seq_no`
[1] 0

[[1]]$items[[1]]$update$`_primary_term`
[1] 1

[[1]]$items[[1]]$update$status
[1] 201

...

The only workaround I can come up with is to store "true" and "false" as strings, but this would require updating a lot of application code.
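
A possible explanation for the padded value [ TRUE] in the error, and for why the boolean-only index works: when a data frame mixes logical and character columns, base R's as.matrix() (which apply() calls on a data frame) returns a character matrix and passes the non-character columns through format(), which pads values to a common width. With only logical columns the matrix stays logical. The sketch below shows the base R behavior only; whether docs_bulk_update goes through this conversion internally is an assumption, and the names mixed, only_bool and rows are made up for illustration.

```r
# Same shapes as d and d2 above, minus the id column.
mixed <- data.frame(
  x = c(TRUE, FALSE, TRUE),
  y = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
only_bool <- data.frame(x = c(TRUE, FALSE, TRUE))

# Mixed column types: the result is a character matrix, and the
# logical column is run through format(), which pads "TRUE" to " TRUE".
m_mixed <- as.matrix(mixed)
typeof(m_mixed)          # "character"
unname(m_mixed[, "x"])   # " TRUE" "FALSE" " TRUE"

# Logical columns only: the matrix keeps its logical type.
m_bool <- as.matrix(only_bool)
typeof(m_bool)           # "logical"

# Building one list per row avoids the matrix conversion,
# so each value keeps its original type.
rows <- lapply(seq_len(nrow(mixed)), function(i) {
  as.list(mixed[i, , drop = FALSE])
})
is.logical(rows[[1]]$x)  # TRUE
```

If this is the cause, serializing each row from a typed list rather than from a row of a character matrix would send true/false instead of " TRUE"/"FALSE".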

sckott commented 5 years ago

Thanks for the detailed report, @dpmccabe. I can confirm this happens for me as well; looking at the PR.

dpmccabe commented 5 years ago

No problem. Note: I didn't add a unit test because I don't have ES installed locally at the moment.