ropensci / elastic

R client for the Elasticsearch HTTP API
https://docs.ropensci.org/elastic

docs_bulk results in Error in curl::curl_fetch_memory(x$url$url, handle = x$url$handle) : Send failure: Broken pipe #278

Closed Jensxy closed 3 years ago

Jensxy commented 3 years ago

Hello everyone,

I updated the elastic package from 0.8.7 to 1.1.0 and now get an error even though I didn't change anything in my code. The affected function is docs_bulk. I read that the size of the uploaded data can cause this, but size is not the issue here: the error also occurs with a data frame of only one row. If I downgrade the R package, everything works fine. However, I want to use the new version. What can I do?

Session Info

    R version 3.6.3 (2020-02-29)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 10 x64 (build 18363)

    Matrix products: default

    locale:
    [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
    [5] LC_TIME=German_Germany.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] elastic_1.1.0

Elasticsearch

> curl -XGET 'http://localhost:9200'
> {
>   "name" : "XYZ",
>   "cluster_name" : "XYZ",
>   "cluster_uuid" : "0wNT8jgiTryABUoz8hLAfw",
>   "version" : {
>     "number" : "7.2.0",
>     "build_flavor" : "default",
>     "build_type" : "rpm",
>     "build_hash" : "508c38a",
>     "build_date" : "2019-06-20T15:54:18.811730Z",
>     "build_snapshot" : false,
>     "lucene_version" : "8.0.0",
>     "minimum_wire_compatibility_version" : "6.8.0",
>     "minimum_index_compatibility_version" : "6.0.0-beta1"
>   },
>   "tagline" : "You Know, for Search"
> }

R Code

```
es_conn <- elastic::connect(host = es_host, port = es_port, errors = "complete")
elastic::docs_bulk(conn = es_conn, x = df_com, index = es_idx)
```

This results in the following error:

> Error in curl::curl_fetch_memory(x$url$url, handle = x$url$handle) :
> Send failure: Broken pipe

Elastic-Log shows the following error:

> Rejecting mapping update to [indexname] as the final mapping would have more than 1 type: [_doc, indexname]

sckott commented 3 years ago

It probably has more to do with the Elasticsearch version as more recent versions have been moving towards allowing only 1 type per index. Are you sure you haven't updated your Elasticsearch version too since it worked last?
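For context, here is a minimal sketch of what that restriction looks like at the bulk API level, assuming the index was created without an explicit type (so Elasticsearch 7 uses `_doc`). `bulk_action_line` is a hypothetical helper written for illustration only; it is not a function of the elastic package, and whether a given package version actually sends `_type` is not confirmed here.

```r
# Build the action/metadata line for one document in a bulk request.
# Hypothetical helper for illustration; no JSON escaping is done, so it
# only suits simple index names and ids.
bulk_action_line <- function(index, id, type = NULL) {
  # c() drops NULL entries, so `_type` is left out when type is NULL
  fields <- c(`_index` = index, `_type` = type, `_id` = id)
  pairs <- sprintf('"%s":"%s"', names(fields), fields)
  sprintf('{"index":{%s}}', paste(pairs, collapse = ","))
}

# Typeless action line: compatible with an index whose only type is _doc
typeless <- bulk_action_line("test_index", "test123")

# Action line with a custom type: on a 7.x index that already has _doc,
# this would ask for a second type and can trigger the
# "Rejecting mapping update" message quoted in the log
typed <- bulk_action_line("test_index", "test123", type = "test_index")

cat(typeless, typed, sep = "\n")
```

Comparing the two lines against what is actually sent to the server (for example via verbose curl output) may help narrow down whether a type name is involved.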

Jensxy commented 3 years ago

Thank you for your fast reply @sckott. I use Elasticsearch version 7.2.0. When I use elastic package 0.8.7, everything works fine.

sckott commented 3 years ago

Please share a reproducible example with a file or data.frame so I can reproduce the problem.

Jensxy commented 3 years ago

    body <- paste0('
    {
      "mappings": {
        "properties": {
          "DOC_ID":  { "type": "keyword", "index": "true" },
          "CONTENT": {
            "type": "text",
            "analyzer": "standard",
            "fields": {
              "RAW":     { "type": "text", "analyzer": "whitespace" },
              "ENGLISH": { "type": "text", "analyzer": "english" },
              "GERMAN":  { "type": "text", "analyzer": "german" },
              "FRENCH":  { "type": "text", "analyzer": "french" },
              "LENGTH":  { "type": "token_count", "store": "true", "analyzer": "standard" }
            }
          }
        }
      }
    }
    ')

    elastic::index_create(conn = my_conn, index = "test_index", body = body)

This is the statement I use to create the index.

I use the following two lines to load the data:

    df <- data.frame(DOC_ID = "test123", CONTENT = "This is a test comment")

    docs_bulk_create(conn = my_conn, x = df, index = "test_index", doc_ids = df$DOC_ID, es_ids = FALSE)

I should mention that I am not able to reproduce the error on a different server network. However, I found similar problems on Stack Overflow.

sckott commented 3 years ago

Thank you for that further detail.

I can't replicate the error. Trigger the error again, then immediately after the function call that fails, run traceback() and post the output here.
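In an interactive session this just means calling `traceback()` right after the failed call. For a script, a rough base R sketch that records the call stack at the moment of the error could look like the following. `failing_upload` is a hypothetical stand-in that only raises the same message; it is not the real `docs_bulk` call.

```r
# Hypothetical stand-in for the failing call; it only signals the same message.
failing_upload <- function() {
  stop("Send failure: Broken pipe")
}

# Record the call stack while the error is being signalled, then let the
# error continue to the outer tryCatch so the script does not abort.
calls <- NULL
result <- tryCatch(
  withCallingHandlers(
    failing_upload(),
    error = function(e) calls <<- sys.calls()
  ),
  error = function(e) conditionMessage(e)
)

# `calls` now holds the frames that were active when stop() was called;
# printing it gives information similar to traceback().
print(calls)
print(result)
```

Replacing `failing_upload()` with the real upload call would capture the stack for the actual failure.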

ksjewell commented 2 years ago

Wasn't sure if I should comment on a closed issue or start a new one... We get a similar problem. Maybe we can replicate it this time.

We are getting the error when uploading a list through docs_bulk.

    docs_bulk(con, test, index = "g2_dbas_test_kevin")
    |=====================================================================================================================================================| 100%
    Error in curl::curl_fetch_memory(x$url$url, handle = x$url$handle) :
      Send failure: Broken pipe
    In addition: Warning message:
    In sprintf(metadata_fmt, action, index, counter) :
      one argument not used by format '{"%s":{"_index":"%s"}}'

We always got the broken pipe error, but the upload went through anyway. Now we also get the "one argument" warning, and the upload does not happen. Not sure if the problem is in Elasticsearch or in how we are using the package. I tried to make a minimal example:

This is the list I am uploading:

    test <- list(
      list(mz = 111.1111, rt = 12.1),
      list(mz = 222.2222, rt = 13.1)
    )

This is the index:

    PUT /g2_dbas_test_kevin
    {
      "mappings": {
        "properties": {
          "mz": { "type": "float" },
          "rt": { "type": "float" }
        }
      }
    }
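The `sprintf` warning quoted above can be reproduced on its own in base R, which may help separate it from the broken pipe error. The format string below is copied from the warning message; the values of `action`, `index` and `counter` are assumptions chosen for illustration, since the package's internal values are not shown in this thread.

```r
# Format string copied from the warning message: it has two %s placeholders.
metadata_fmt <- '{"%s":{"_index":"%s"}}'

# Assumed values standing in for the package's internal arguments.
action  <- "index"
index   <- "g2_dbas_test_kevin"
counter <- 1L

# Three arguments are passed for two placeholders. R versions that check
# for unused arguments (such as the R 4.1.0 session in this report) warn here.
warned <- FALSE
line <- withCallingHandlers(
  sprintf(metadata_fmt, action, index, counter),
  warning = function(w) {
    warned <<- TRUE
    message("caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# sprintf still returns the formatted string; the extra argument is ignored.
print(line)
```

Since the string is still produced despite the warning, the warning by itself does not obviously explain the failed upload, though that would need to be confirmed against the package source.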

Elasticsearch version:

    con$ping()$version
    $number
    [1] "7.11.2"

    $build_flavor
    [1] "default"

    $build_type
    [1] "rpm"

    $build_hash
    [1] "3e5a16cfec50876d20ea77b075070932c6464c7d"

    $build_date
    [1] "2021-03-06T05:54:38.141101Z"

    $build_snapshot
    [1] FALSE

    $lucene_version
    [1] "8.7.0"

    $minimum_wire_compatibility_version
    [1] "6.8.0"

    $minimum_index_compatibility_version
    [1] "6.0.0-beta1"

Session info:

    > devtools::session_info()
    ─ Session info ──────────────────────────────────────────────
     setting  value
     version  R version 4.1.0 (2021-05-18)
     os       CentOS Linux 7 (Core)
     system   x86_64, linux-gnu
     ui       RStudio
     language (EN)
     collate  de_DE.UTF-8
     ctype    de_DE.UTF-8
     tz       Europe/Berlin
     date     2021-08-18

    ─ Packages ──────────────────────────────────────────────────
     package        version date       lib source
     assertthat     0.2.1   2019-03-21 [2] CRAN (R 4.1.0)
     Biobase        2.52.0  2021-05-19 [2] Bioconductor
     BiocGenerics   0.38.0  2021-05-19 [2] Bioconductor
     bit            4.0.4   2020-08-04 [2] CRAN (R 4.1.0)
     bit64          4.0.5   2020-08-30 [2] CRAN (R 4.1.0)
     blob           1.2.2   2021-07-23 [2] CRAN (R 4.1.0)
     cachem         1.0.5   2021-05-15 [2] CRAN (R 4.1.0)
     callr          3.7.0   2021-04-20 [2] CRAN (R 4.1.0)
     cli            3.0.1   2021-07-17 [2] CRAN (R 4.1.0)
     codetools      0.2-18  2020-11-04 [2] CRAN (R 4.1.0)
     colorspace     2.0-2   2021-06-24 [2] CRAN (R 4.1.0)
     config         0.3.1   2020-12-17 [2] CRAN (R 4.1.0)
     crayon         1.4.1   2021-02-08 [2] CRAN (R 4.1.0)
     crul           1.1.0   2021-02-15 [2] CRAN (R 4.1.0)
     curl           4.3.2   2021-06-23 [2] CRAN (R 4.1.0)
     DBI            1.1.1   2021-01-15 [2] CRAN (R 4.1.0)
     desc           1.3.0   2021-03-05 [2] CRAN (R 4.1.0)
     devtools       2.4.2   2021-06-07 [2] CRAN (R 4.1.0)
     digest         0.6.27  2020-10-24 [2] CRAN (R 4.1.0)
     dplyr          1.0.7   2021-06-18 [2] CRAN (R 4.1.0)
     elastic        1.2.0   2021-03-16 [2] CRAN (R 4.1.0)
     ellipsis       0.3.2   2021-04-29 [2] CRAN (R 4.1.0)
     fansi          0.5.0   2021-05-25 [2] CRAN (R 4.1.0)
     fastmap        1.1.0   2021-01-25 [2] CRAN (R 4.1.0)
     foreach        1.5.1   2020-10-15 [2] CRAN (R 4.1.0)
     fs             1.5.0   2020-07-31 [2] CRAN (R 4.1.0)
     generics       0.1.0   2020-10-31 [2] CRAN (R 4.1.0)
     ggplot2        3.3.5   2021-06-25 [2] CRAN (R 4.1.0)
     glue           1.4.2   2020-08-27 [2] CRAN (R 4.1.0)
     gtable         0.3.0   2019-03-25 [2] CRAN (R 4.1.0)
     htmltools      0.5.1.1 2021-01-22 [2] CRAN (R 4.1.0)
     httpcode       0.3.0   2020-04-10 [2] CRAN (R 4.1.0)
     httpuv         1.6.1   2021-05-07 [2] CRAN (R 4.1.0)
     iterators      1.0.13  2020-10-15 [2] CRAN (R 4.1.0)
     jsonlite       1.7.2   2020-12-09 [2] CRAN (R 4.1.0)
     later          1.2.0   2021-04-23 [2] CRAN (R 4.1.0)
     lifecycle      1.0.0   2021-02-15 [2] CRAN (R 4.1.0)
     lubridate      1.7.10  2021-02-26 [2] CRAN (R 4.1.0)
     magrittr       2.0.1   2020-11-17 [2] CRAN (R 4.1.0)
     memoise        2.0.0   2021-01-26 [2] CRAN (R 4.1.0)
     mime           0.11    2021-06-23 [2] CRAN (R 4.1.0)
     munsell        0.5.0   2018-06-12 [2] CRAN (R 4.1.0)
     mzR            2.26.1  2021-06-20 [2] Bioconductor
     ncdf4          1.17    2019-10-23 [2] CRAN (R 4.1.0)
     ntsworkflow  * 0.2.1   2021-07-20 [1] local
     pillar         1.6.2   2021-07-29 [2] CRAN (R 4.1.0)
     pkgbuild       1.2.0   2020-12-15 [2] CRAN (R 4.1.0)
     pkgconfig      2.0.3   2019-09-22 [2] CRAN (R 4.1.0)
     pkgload        1.2.1   2021-04-06 [2] CRAN (R 4.1.0)
     prettyunits    1.1.1   2020-01-24 [2] CRAN (R 4.1.0)
     processx       3.5.2   2021-04-30 [2] CRAN (R 4.1.0)
     promises       1.2.0.1 2021-02-11 [2] CRAN (R 4.1.0)
     ProtGenerics   1.24.0  2021-05-19 [2] Bioconductor
     ps             1.6.0   2021-02-28 [2] CRAN (R 4.1.0)
     purrr          0.3.4   2020-04-17 [2] CRAN (R 4.1.0)
     R6             2.5.0   2020-10-28 [2] CRAN (R 4.1.0)
     Rcpp           1.0.7   2021-07-07 [2] CRAN (R 4.1.0)
     remotes        2.4.0   2021-06-02 [2] CRAN (R 4.1.0)
     rlang          0.4.11  2021-04-30 [2] CRAN (R 4.1.0)
     rprojroot      2.0.2   2020-11-15 [2] CRAN (R 4.1.0)
     RSQLite        2.2.7   2021-04-22 [2] CRAN (R 4.1.0)
     rstudioapi     0.13    2020-11-12 [2] CRAN (R 4.1.0)
     scales         1.1.1   2020-05-11 [2] CRAN (R 4.1.0)
     sessioninfo    1.1.1   2018-11-05 [2] CRAN (R 4.1.0)
     shiny          1.6.0   2021-01-25 [2] CRAN (R 4.1.0)
     stringi        1.7.3   2021-07-16 [2] CRAN (R 4.1.0)
     stringr        1.4.0   2019-02-10 [2] CRAN (R 4.1.0)
     testthat       3.0.4   2021-07-01 [2] CRAN (R 4.1.0)
     tibble         3.1.3   2021-07-23 [2] CRAN (R 4.1.0)
     tidyr          1.1.3   2021-03-03 [2] CRAN (R 4.1.0)
     tidyselect     1.1.1   2021-04-30 [2] CRAN (R 4.1.0)
     triebeard      0.3.0   2016-08-04 [2] CRAN (R 4.1.0)
     urltools       1.7.3   2019-04-14 [2] CRAN (R 4.1.0)
     usethis        2.0.1   2021-02-10 [2] CRAN (R 4.1.0)
     utf8           1.2.2   2021-07-24 [2] CRAN (R 4.1.0)
     vctrs          0.3.8   2021-04-29 [2] CRAN (R 4.1.0)
     withr          2.4.2   2021-04-18 [2] CRAN (R 4.1.0)
     xtable         1.8-4   2019-04-21 [2] CRAN (R 4.1.0)
     yaml           2.2.1   2020-02-01 [2] CRAN (R 4.1.0)

sckott commented 2 years ago

@ksjewell Sorry, I'm no longer at the job where I maintained this pkg. If you are interested, or know of anyone interested, in maintaining this pkg, let me know.

ksjewell commented 2 years ago

@sckott This package has become mission-critical for us at the Federal Institute of Hydrology. We are very thankful for it! I would certainly be interested in supporting it however I can. I have built several packages in R, one of them heavily dependent on this package, but have never published anything on CRAN.

sckott commented 2 years ago

@ksjewell Great to know it's an important package! I wasn't aware of that. Sounds like you might be interested in helping maintain this package? Think about it and let me know. Perhaps you can gather some friends who can also help, to make it a team effort.

@iainmwallace perhaps you'd be interested in helping maintain as well?

ksjewell commented 2 years ago

@sckott Yes, two colleagues and I would be willing to help if we can.