regisoc closed this issue 4 years ago
Thanks @regisoc for the issue.
`create` is not an exported or even internal function; I assume you mean `index_create`?
Part of the discrepancy may be due to not waiting until the data are "available".
```r
library(elastic)
library(tidyverse)  # provides the storms dataset (via dplyr)
x <- connect()
s <- "storms"
```
For example, compare these two:
```r
# Search immediately after the bulk load
index_create(x, s)
invisible(elastic::docs_bulk_index(x, storms, s, s))
r <- elastic::Search(x, s)
r$hits$total == nrow(storms)
```
vs.
```r
# Same load, but pause before searching so the documents can become available
index_recreate(x, s)
invisible(elastic::docs_bulk_index(x, storms, s, s))
Sys.sleep(2)
rr <- elastic::Search(x, s)
rr$hits$total == nrow(storms)
```
But looking at the logs, I see that the indexing is running into errors, e.g., `mapper [hu_diameter] cannot be changed from type [long] to [float]`. I imagine setting the mapping up front when you create the index will fix that.
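A minimal sketch of what setting the mapping up front could look like. It assumes Elasticsearch 6.x, where mappings are still keyed by document type (here `storms`, to match the type passed to `docs_bulk_index`), and it assumes `index_create()` accepts a JSON string as `body`. Declaring `hu_diameter` and `ts_diameter` as `float` is a guess based on the logged error; other columns may need explicit types too. `storms_mapping` and `create_storms_index()` are hypothetical names, not part of elastic.

```r
# JSON body declaring the diameter columns as float before any data arrives,
# so dynamic mapping cannot first guess "long" from a whole-number value.
# Assumption: ES 6.x mapping layout with a "storms" document type.
storms_mapping <- '{
  "mappings": {
    "storms": {
      "properties": {
        "hu_diameter": { "type": "float" },
        "ts_diameter": { "type": "float" }
      }
    }
  }
}'

# Hypothetical helper: drop any existing index, then create it with the mapping.
create_storms_index <- function(conn, index = "storms", body = storms_mapping) {
  if (elastic::index_exists(conn, index)) elastic::index_delete(conn, index)
  elastic::index_create(conn, index, body = body)
}

# Usage (needs a running Elasticsearch instance):
# x <- elastic::connect()
# create_storms_index(x)
# invisible(elastic::docs_bulk_index(x, dplyr::storms, "storms", "storms"))
```

On Elasticsearch 7.x the type level is gone, so the `"storms"` key under `"mappings"` would have to be dropped there.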
FWIW, I'm not having this problem in Elasticsearch 7.5.1.
Thanks for your quick comment.
Yes, it was `index_create`, and yes, I am getting the same results in the logs. I was hoping ES would deduce the right mapping on its own, but it is having some difficulty doing so.
Pushing an explicit mapping resolves this. BTW, when automating things, it seems we do need to `Sys.sleep()` for at least one second between each step (create the index, give it a mapping, push the data).
Resolved.
It's possible there are some configuration options in your Elasticsearch instance for how soon data become available; I don't know.
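One plausible explanation for the delay: Elasticsearch only makes newly indexed documents visible to search after an index refresh, which by default runs about once per second (the `index.refresh_interval` setting). Rather than a fixed `Sys.sleep()`, an automated script could poll until the expected count shows up. The sketch below is base R only; `wait_for_count()` is a hypothetical helper, not an elastic function, and the timeout and interval values are arbitrary.

```r
# Hypothetical helper: call `count_fn()` repeatedly until it returns at least
# `expected`, or until `timeout` seconds have passed. Returns the last count.
wait_for_count <- function(count_fn, expected, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    n <- count_fn()
    if (n >= expected || Sys.time() >= deadline) return(n)
    Sys.sleep(interval)
  }
}

# Usage (needs a running Elasticsearch instance; on ES 7.x the total is
# typically reported as hits$total$value rather than hits$total):
# n <- wait_for_count(
#   function() elastic::Search(x, "storms", size = 0)$hits$total,
#   expected = nrow(storms)
# )
# n == nrow(storms)
```

If the returned count is still below the expected one after the timeout, that points to documents being rejected (as with the mapping error) rather than merely not yet visible.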
Hi,
I am using ES version 6.8.3, via docker (docker.elastic.co/elasticsearch/elasticsearch:6.8.3)
I tried to push tidyverse datasets (`storms` here) to test several cases, and this happened:
If the index is deleted and then reconstructed (with `index_delete -> index_create` or `index_recreate`), the number of records registered in ES (`r$hits$total`) is not stable and I never get the full 10010 records registered.
But I think the `mapping update` is involved in some way during the `docs_bulk` (logs hereafter), because when I do not recreate the index, there is no missing data.
Can you reproduce?
Session Info
```r
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] jsonlite_1.6      elastic_1.0.0     data.table_1.12.6 forcats_0.4.0     stringr_1.4.0
 [6] dplyr_0.8.3       purrr_0.3.3       readr_1.3.1       tidyr_1.0.0       tibble_2.1.3
[11] ggplot2_3.2.1     tidyverse_1.2.1   R6_2.4.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2       cellranger_1.1.0 pillar_1.4.2     compiler_3.6.1   tools_3.6.1      zeallot_0.1.0
 [7] lubridate_1.7.4  lifecycle_0.1.0  nlme_3.1-140     gtable_0.3.0     lattice_0.20-38  pkgconfig_2.0.3
[13] rlang_0.4.1      cli_1.1.0        rstudioapi_0.10  curl_4.2         crul_0.8.4       haven_2.1.1
[19] withr_2.1.2      xml2_1.2.2       httr_1.4.1       generics_0.0.2   vctrs_0.2.0      hms_0.5.2
[25] triebeard_0.3.0  grid_3.6.1       tidyselect_0.2.5 httpcode_0.2.0   glue_1.3.1       readxl_1.3.1
[31] modelr_0.1.5     magrittr_1.5     urltools_1.7.3   backports_1.1.5  scales_1.0.0     rvest_0.3.4
[37] assertthat_0.2.1 colorspace_1.4-1 stringi_1.4.3    lazyeval_0.2.2   munsell_0.5.0    broom_0.5.2
[43] crayon_1.3.4
```