Closed: regisoc closed this issue 4 years ago
For completeness, here is the docker-compose.yml I use to switch between the Elasticsearch versions.
    version: '3.7'
    services:
      elasticsearch:
        container_name: elasticsearch
        # choose one
        # image: docker.elastic.co/elasticsearch/elasticsearch:5.6.16
        image: docker.elastic.co/elasticsearch/elasticsearch:6.8.3
        # image: docker.elastic.co/elasticsearch/elasticsearch:7.4.0
        environment:
          - discovery.type=single-node
          - bootstrap.memory_lock=true
          - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
        ulimits:
          memlock:
            soft: -1
            hard: -1
        ports:
          - 9200:9200
          - 9300:9300
        volumes:
          - type: bind
            source: ./elasticsearch.yml
            target: /usr/share/elasticsearch/config/elasticsearch.yml
            read_only: true
        networks:
          - esr
      rstudio:
        container_name: rstudio
        image: roncar/rstudio-elastic:1.0.0
        environment:
          - PASSWORD=rstudiopwd
          - USERID=1000
        ports:
          - 8787:8787
        networks:
          - esr
    networks:
      esr:
Thanks for opening the issue, @regisoc.
For the first issue, about `index_create`/`index_exists`: I could not replicate the problem with a local version of Elasticsearch running on my Mac, but I could replicate it using Docker with a compose file similar to yours. It looks like it's coming from the underlying HTTP client, `crul`: the HEAD request in `index_exists` isn't passing along the credentials. Fixing that now ...
I will address the second one after the first one is fixed.
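To illustrate the kind of check involved, here is a minimal sketch of an index-existence test done with an authenticated HEAD request through `crul`. The function names `status_to_exists()` and `index_exists_head()` are hypothetical and are not the package's implementation. The sketch assumes HTTP basic auth, and assumes that a rejected request (for example a 401) should raise an error instead of being reported as "index missing".

```r
# Map the HTTP status of `HEAD /<index>` to an existence flag.
status_to_exists <- function(status) {
  if (status == 200L) return(TRUE)
  if (status == 404L) return(FALSE)
  # e.g. 401/403: the request was rejected, which says nothing about the index
  stop("unexpected HTTP status: ", status, call. = FALSE)
}

# Send the HEAD request with basic-auth credentials attached (needs crul).
index_exists_head <- function(base_url, index, user, pwd) {
  cli <- crul::HttpClient$new(
    url = base_url,
    auth = crul::auth(user = user, pwd = pwd)
  )
  res <- cli$head(path = index)
  status_to_exists(res$status_code)
}

# Usage (needs a running cluster; host and credentials are placeholders):
# index_exists_head("http://localhost:9200", "wow", "elastic", "changeme")
```

Separating the status handling from the request makes it easy to see whether a `FALSE` really means "404, no such index" or hides an authentication failure.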
For the second problem, with `docs_bulk`, I was able to replicate the problem, both with local ES and in Docker, and with the same versions that you had a problem with.
However, I was also able to replicate the problem using curl on the command line, completely outside of R. So it's probably not a problem with this package, but more likely an issue with the older versions of Elasticsearch, or possibly a problem with the way we're constructing the nd-json files.
Unfortunately, Elasticsearch doesn't give us the failed lines of the nd-json that didn't get created, so it's hard to track down why this is happening.
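One thing that may help narrow it down: although the original nd-json lines are not echoed back, the `_bulk` response normally carries a status (and an `error` object on failure) for each item. The sketch below pulls those failures out of one parsed response. `bulk_failures()` is a hypothetical helper, the `mock` response is made up for illustration, and the code assumes the response was parsed into a plain nested list (as with `out[[1]]` from `docs_bulk`, if it is not simplified).

```r
# Collect the failed items from one parsed `_bulk` response.
bulk_failures <- function(resp) {
  empty <- data.frame(id = character(), status = integer(),
                      reason = character(), stringsAsFactors = FALSE)
  if (!isTRUE(resp$errors)) return(empty)
  rows <- lapply(resp$items, function(item) {
    action <- item[[1]]  # single entry keyed by the action ("index", "create", ...)
    if (is.null(action$error)) return(NULL)
    id <- action[["_id"]]
    reason <- action$error$reason
    data.frame(
      id = if (is.null(id)) NA_character_ else as.character(id),
      status = as.integer(action$status),
      reason = if (is.null(reason)) NA_character_ else as.character(reason),
      stringsAsFactors = FALSE
    )
  })
  rows <- Filter(Negate(is.null), rows)
  if (length(rows) == 0) return(empty)
  do.call(rbind, rows)
}

# Made-up response with one success and one failure, for illustration only.
mock <- list(
  took = 3, errors = TRUE,
  items = list(
    list(index = list(`_id` = "1", status = 201L)),
    list(index = list(`_id` = "2", status = 400L,
                      error = list(type = "mapper_parsing_exception",
                                   reason = "failed to parse")))
  )
)
failed <- bulk_failures(mock)
```

With a real response, the `reason` column should at least say which field or type caused the rejection.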
The auth problem with `index_exists` has been fixed. Install the dev version with `remotes::install_github("ropensci/elastic")`, which should also install the dev version of `crul` with the fix.
The other problem, with `docs_bulk`: I think it's down to mappings. If you don't set a mapping for your index, ES tries to guess the field types, and sometimes a later document has a value that conflicts with the initial type that ES chose, and then it fails. I think the fix when this happens is to set the mapping. I think you only have to do so for the problematic fields, but you could do it for all of them anyway, e.g.:
    library(elastic)
    library(ggplot2)  # provides the mpg dataset
    zz <- connect(user = "elastic", pwd = "changeme", errors = "complete")
    # typed mapping (ES 5.x/6.x): "mpg" under "mappings" is the document type
    body <- '{
      "mappings": {
        "mpg": {
          "properties": {
            "displ": {"type": "float"}
          }
        }
      }
    }'
    index_create(zz, index = "mpg", body = body)
    out <- docs_bulk(zz, mpg, index = "mpg")
    out[[1]]$errors
    Sys.sleep(1)
    elastic::count(zz, "mpg")
    # clean up so the test can be repeated
    index_delete(zz, "mpg", verbose = FALSE)
The above works over and over again, so I think setting the index mapping is the fix for ES versions where you get intermittent failures with `docs_bulk`.
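If you would rather set the mapping for all fields, the body can be generated from the data frame instead of written by hand. This is only a sketch: `es_type()` and `mapping_body()` are hypothetical helpers, the class-to-type lookup (`float`, `integer`, `boolean`, `keyword`) is an assumption you may want to adjust, and column names are assumed to need no JSON escaping.

```r
# Hypothetical lookup from an R column to an Elasticsearch field type.
es_type <- function(x) {
  if (is.logical(x)) "boolean"
  else if (is.integer(x)) "integer"
  else if (is.numeric(x)) "float"
  else "keyword"
}

# Build a JSON body for index_create(). `type_name` adds the typed level
# used by ES 5.x/6.x; leave it NULL for a typeless mapping.
mapping_body <- function(df, type_name = NULL) {
  fields <- vapply(names(df), function(nm) {
    sprintf('"%s": {"type": "%s"}', nm, es_type(df[[nm]]))
  }, character(1))
  props <- sprintf('{"properties": {%s}}', paste(fields, collapse = ", "))
  if (!is.null(type_name)) props <- sprintf('{"%s": %s}', type_name, props)
  sprintf('{"mappings": %s}', props)
}

# Small stand-in data frame with the same kinds of columns as mpg.
demo <- data.frame(displ = c(1.8, 2), year = c(1999L, 2008L),
                   model = c("a4", "a4"), stringsAsFactors = FALSE)
body_typed <- mapping_body(demo, type_name = "mpg")
body_plain <- mapping_body(demo)
```

It would then be passed along the lines of `index_create(zz, index = "mpg", body = mapping_body(mpg, "mpg"))`.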
Ok, thanks. I will update and try it soon.
assuming this is fixed
Hi,
Before starting, you should know that I'm new to R and its environment. Still, I would rather take the risk of saying something stupid. Correct me if I'm wrong.
I am not sure that elastic is backward compatible as you claim. I have built 2 examples to show my point.
The first case is direct: I don't get the same results between ES versions when following the (reference).
Here, ES 5.6.16 is wrong:
```r
> connexion <- connect(host = "elasticsearch", user = "elastic", pwd = "changeme")
> elastic::index_get(connexion)$version$number
[1] "5.6.16"
> elastic::index_create(connexion, index = "wow")
$acknowledged
[1] TRUE

$shards_acknowledged
[1] TRUE

$index
[1] "wow"

> elastic::index_exists(connexion, index = "wow")
[1] FALSE # <------- ????
> elastic::index_delete(connexion, index = "wow")
http://elasticsearch:9200/wow
$acknowledged
[1] TRUE
```
ES 6.8.3 seems right:
```r
> connexion <- connect(host = "elasticsearch", user = "elastic", pwd = "changeme")
> elastic::index_get(connexion)$version$number
[1] "6.8.3"
> elastic::index_create(connexion, index = "wow")
$acknowledged
[1] TRUE

$shards_acknowledged
[1] TRUE

$index
[1] "wow"

> elastic::index_exists(connexion, index = "wow")
[1] TRUE # <------- ok
> elastic::index_delete(connexion, index = "wow")
http://elasticsearch:9200/wow
$acknowledged
[1] TRUE
```
The latest version (ES 7.4.0) also seems right:
```r
> connexion <- connect(host = "elasticsearch", user = "elastic", pwd = "changeme")
> elastic::index_get(connexion)$version$number
[1] "7.4.0"
> elastic::index_create(connexion, index = "wow")
$acknowledged
[1] TRUE

$shards_acknowledged
[1] TRUE

$index
[1] "wow"

> elastic::index_exists(connexion, index = "wow")
[1] TRUE # <------- ok
> elastic::index_delete(connexion, index = "wow")
http://elasticsearch:9200/wow
$acknowledged
[1] TRUE
```
I also double-checked with `curl`.

The second case is a bit more problematic.
Again, it seems to work with the latest version (here, ES 7.4.0), but I have bigger issues with ES 5.6.16 and ES 6.8.3: not all the data were indexed by the `docs_bulk` method, meaning some data were lost. I used the following script to test that.
Test with the `mpg` dataset (included in the `tidyverse` library; static, 234 rows, 11 columns):
```r
library(tidyverse)
library(elastic)

# init
connexion <- connect(host = "elasticsearch", user = "elastic", pwd = "changeme")
print(elastic::index_get(connexion)$version$number)
index_name = "mpg"
data <- mpg
max_test <- 20
res <- list()

# nb of records/observations given to ES that we want to retrieve
expected_obs <- dim(data)[1]

# progress bar
pb <- txtProgressBar(min = 0, max = max_test, initial = 0, style = 3)
pbsum <- 0

# delete
# elastic::index_delete(connexion, index_name, verbose = F, wait_for_completion = T)

# push method
exe <- function(){
  # init
  elastic::index_delete(connexion, index_name, verbose = F, wait_for_completion = T)
  # push
  invisible(elastic::docs_bulk(connexion, data, index = index_name, quiet = T, wait_for_completion = T))
  # **Near** real time
  Sys.sleep(1)
  # get count
  # elastic::Search(connexion, index = index_name, size = 1)$hits$total
  elastic::Search(connexion, index = index_name, size = 1)$hits$total == expected_obs
}

# test
for(i in 1:max_test){
  res[i] <- exe()
  pbsum <- pbsum + 1
  setTxtProgressBar(pb, pbsum)
}
close(pb)
```
Sending the same data (`mpg`) over and over again, we should get the same result every time (here, `res` should end up holding 20 `TRUE` values).
For ES 7.4.0, the result is as expected:
For ES 6.8.3, the result is not as expected:
For ES 5.6.16, the result is not as expected:
Can you reproduce? Did I miss something?
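To rule out refresh timing as the cause, the fixed `Sys.sleep(1)` in the script above could be swapped for a poll that waits until the count matches or a timeout expires. This is a sketch only: `wait_for_count()` is a hypothetical helper, and the timeout and interval values are arbitrary assumptions.

```r
# Poll `get_count()` until it returns `expected` or `timeout` seconds pass.
# Returns TRUE if the expected count was reached, FALSE otherwise.
wait_for_count <- function(get_count, expected, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    n <- get_count()
    if (identical(as.numeric(n), as.numeric(expected))) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Usage against a running cluster, with the names from the script above:
# wait_for_count(function() elastic::count(connexion, index_name), expected_obs)
```

If this still returns `FALSE` after a generous timeout, the documents are missing from the index and not merely waiting for a refresh.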
Limitation: I tested `index_exists()` and `docs_bulk()`, not other functions.

Session Info
```r
> devtools::session_info()
─ Session info ──────────────────────────────────────────────────
 setting  value
 version  R version 3.6.1 (2019-07-05)
 os       Debian GNU/Linux 9 (stretch)
 system   x86_64, linux-gnu
 ui       RStudio
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Etc/UTC
 date     2019-11-04

─ Packages ──────────────────────────────────────────────────────
 package     * version date       lib source
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.1)
 backports     1.1.4   2019-04-10 [1] CRAN (R 3.6.1)
 broom         0.5.2   2019-04-07 [1] CRAN (R 3.6.1)
 callr         3.3.2   2019-09-22 [1] CRAN (R 3.6.1)
 cellranger    1.1.0   2016-07-27 [1] CRAN (R 3.6.1)
 cli           1.1.0   2019-03-19 [1] CRAN (R 3.6.1)
 colorspace    1.4-1   2019-03-18 [1] CRAN (R 3.6.1)
 crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.1)
 crul          0.8.4   2019-08-02 [1] CRAN (R 3.6.1)
 curl          4.2     2019-09-24 [1] CRAN (R 3.6.1)
 desc          1.2.0   2018-05-01 [1] CRAN (R 3.6.1)
 devtools      2.2.1   2019-09-24 [1] CRAN (R 3.6.1)
 digest        0.6.21  2019-09-20 [1] CRAN (R 3.6.1)
 dplyr       * 0.8.3   2019-07-04 [1] CRAN (R 3.6.1)
 elastic     * 1.0.0   2019-04-11 [1] CRAN (R 3.6.1)
 ellipsis      0.3.0   2019-09-20 [1] CRAN (R 3.6.1)
 forcats     * 0.4.0   2019-02-17 [1] CRAN (R 3.6.1)
 fs            1.3.1   2019-05-06 [1] CRAN (R 3.6.1)
 generics      0.0.2   2018-11-29 [1] CRAN (R 3.6.1)
 ggplot2     * 3.2.1   2019-08-10 [1] CRAN (R 3.6.1)
 glue          1.3.1   2019-03-12 [1] CRAN (R 3.6.1)
 gtable        0.3.0   2019-03-25 [1] CRAN (R 3.6.1)
 haven         2.1.1   2019-07-04 [1] CRAN (R 3.6.1)
 hms           0.5.1   2019-08-23 [1] CRAN (R 3.6.1)
 httpcode      0.2.0   2016-11-14 [1] CRAN (R 3.6.1)
 httr          1.4.1   2019-08-05 [1] CRAN (R 3.6.1)
 jsonlite      1.6     2018-12-07 [1] CRAN (R 3.6.1)
 lattice       0.20-38 2018-11-04 [2] CRAN (R 3.6.1)
 lazyeval      0.2.2   2019-03-15 [1] CRAN (R 3.6.1)
 lifecycle     0.1.0   2019-08-01 [1] CRAN (R 3.6.1)
 lubridate     1.7.4   2018-04-11 [1] CRAN (R 3.6.1)
 magrittr      1.5     2014-11-22 [1] CRAN (R 3.6.1)
 memoise       1.1.0   2017-04-21 [1] CRAN (R 3.6.1)
 modelr        0.1.5   2019-08-08 [1] CRAN (R 3.6.1)
 munsell       0.5.0   2018-06-12 [1] CRAN (R 3.6.1)
 nlme          3.1-140 2019-05-12 [2] CRAN (R 3.6.1)
 pillar        1.4.2   2019-06-29 [1] CRAN (R 3.6.1)
 pkgbuild      1.0.5   2019-08-26 [1] CRAN (R 3.6.1)
 pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 3.6.1)
 pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.6.1)
 prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.6.1)
 processx      3.4.1   2019-07-18 [1] CRAN (R 3.6.1)
 ps            1.3.0   2018-12-21 [1] CRAN (R 3.6.1)
 purrr       * 0.3.2   2019-03-15 [1] CRAN (R 3.6.1)
 R6            2.4.0   2019-02-14 [1] CRAN (R 3.6.1)
 Rcpp          1.0.2   2019-07-25 [1] CRAN (R 3.6.1)
 readr       * 1.3.1   2018-12-21 [1] CRAN (R 3.6.1)
 readxl        1.3.1   2019-03-13 [1] CRAN (R 3.6.1)
 remotes       2.1.0   2019-06-24 [1] CRAN (R 3.6.1)
 rlang         0.4.0   2019-06-25 [1] CRAN (R 3.6.1)
 rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.6.1)
 rstudioapi    0.10    2019-03-19 [1] CRAN (R 3.6.1)
 rvest         0.3.4   2019-05-15 [1] CRAN (R 3.6.1)
 scales        1.0.0   2018-08-09 [1] CRAN (R 3.6.1)
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.1)
 stringi       1.4.3   2019-03-12 [1] CRAN (R 3.6.1)
 stringr     * 1.4.0   2019-02-10 [1] CRAN (R 3.6.1)
 testthat      2.2.1   2019-07-25 [1] CRAN (R 3.6.1)
 tibble      * 2.1.3   2019-06-06 [1] CRAN (R 3.6.1)
 tidyr       * 1.0.0   2019-09-11 [1] CRAN (R 3.6.1)
 tidyselect    0.2.5   2018-10-11 [1] CRAN (R 3.6.1)
 tidyverse   * 1.2.1   2017-11-14 [1] CRAN (R 3.6.1)
 triebeard     0.3.0   2016-08-04 [1] CRAN (R 3.6.1)
 urltools      1.7.3   2019-04-14 [1] CRAN (R 3.6.1)
 usethis       1.5.1   2019-07-04 [1] CRAN (R 3.6.1)
 vctrs         0.2.0   2019-07-05 [1] CRAN (R 3.6.1)
 withr         2.1.2   2018-03-15 [1] CRAN (R 3.6.1)
 xml2          1.2.2   2019-08-09 [1] CRAN (R 3.6.1)
 zeallot       0.1.0   2018-01-28 [1] CRAN (R 3.6.1)

[1] /usr/local/lib/R/site-library
[2] /usr/local/lib/R/library
```