ropensci / elastic

R client for the Elasticsearch HTTP API
https://docs.ropensci.org/elastic

Error: 404 - Client error: (404) Not Found while scrolling #178

Closed: kickbox closed this issue 7 years ago

kickbox commented 7 years ago
 devtools::session_info()
Session info ---------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.3.1 (2016-06-21)
 system   x86_64, mingw32             
 ui       RStudio (1.0.136)           
 language (EN)                        
 collate  English_India.1252          
 tz       Asia/Calcutta               
 date     2017-05-30                  

Packages -------------------------------------------------------------------------------------------
 package        * version date       source        
 assertthat       0.1     2013-12-06 CRAN (R 3.2.2)
 base64enc        0.1-3   2015-07-28 CRAN (R 3.2.3)
 colorspace       1.3-2   2016-12-14 CRAN (R 3.3.2)
 curl             2.3     2016-11-24 CRAN (R 3.3.2)
 data.table     * 1.10.4  2017-02-01 CRAN (R 3.3.2)
 DBI            * 0.5-1   2016-09-10 CRAN (R 3.3.2)
 devtools         1.12.0  2016-06-24 CRAN (R 3.3.1)
 digest           0.6.12  2017-01-27 CRAN (R 3.3.2)
 dplyr          * 0.5.0   2016-06-24 CRAN (R 3.3.1)
 DT             * 0.2     2016-08-09 CRAN (R 3.3.2)
 elastic        * 0.7.8   2016-11-09 CRAN (R 3.3.3)
 ggplot2        * 2.2.1   2016-12-30 CRAN (R 3.3.2)
 ggthemes       * 3.4.0   2017-02-19 CRAN (R 3.3.3)
 gridBase       * 0.4-7   2014-02-24 CRAN (R 3.2.2)
 gtable           0.2.0   2016-02-26 CRAN (R 3.3.1)
 htmltools        0.3.5   2016-03-21 CRAN (R 3.3.1)
 htmlwidgets      0.8     2016-11-09 CRAN (R 3.3.2)
 httpuv           1.3.3   2015-08-04 CRAN (R 3.2.2)
 httr           * 1.2.1   2016-07-03 CRAN (R 3.3.2)
 jqr            * 0.2.4   2016-07-29 CRAN (R 3.3.2)
 jsonlite       * 1.3     2017-02-28 CRAN (R 3.3.3)
 lattice          0.20-34 2016-09-06 CRAN (R 3.3.2)
 lazyeval         0.2.0   2016-06-12 CRAN (R 3.3.1)
 lubridate      * 1.6.0   2016-09-13 CRAN (R 3.3.2)
 magrittr         1.5     2014-11-22 CRAN (R 3.2.2)
 markdown       * 0.7.7   2015-04-22 CRAN (R 3.2.2)
 memoise          1.0.0   2016-01-29 CRAN (R 3.3.1)
 mime             0.5     2016-07-07 CRAN (R 3.3.2)
 miniUI           0.1.1   2016-01-15 CRAN (R 3.3.2)
 munsell          0.4.3   2016-02-13 CRAN (R 3.3.1)
 plotly         * 4.5.6   2016-11-12 CRAN (R 3.3.2)
 plyr           * 1.8.4   2016-06-08 CRAN (R 3.3.1)
 purrr            0.2.2   2016-06-18 CRAN (R 3.3.1)
 R6               2.2.0   2016-10-05 CRAN (R 3.3.2)
 Rcpp             0.12.10 2017-03-19 CRAN (R 3.3.3)
 reshape2       * 1.4.2   2016-10-22 CRAN (R 3.3.2)
 rjson          * 0.2.15  2014-11-03 CRAN (R 3.2.2)
 RMySQL         * 0.10.9  2016-05-08 CRAN (R 3.2.5)
 scales         * 0.4.1   2016-11-09 CRAN (R 3.3.2)
 shiny          * 1.0.0   2017-01-12 CRAN (R 3.3.2)
 shinydashboard * 0.5.3   2016-09-20 CRAN (R 3.3.2)
 shinyjs        * 0.9     2016-12-26 CRAN (R 3.3.2)
 stringi        * 1.1.2   2016-10-01 CRAN (R 3.3.2)
 stringr        * 1.2.0   2017-02-18 CRAN (R 3.3.3)
 tibble           1.2     2016-08-26 CRAN (R 3.3.2)
 tidyjson       * 0.2.2   2017-04-21 CRAN (R 3.3.3)
 tidyr            0.6.1   2017-01-10 CRAN (R 3.3.2)
 viridisLite      0.1.3   2016-03-12 CRAN (R 3.2.5)
 withr            1.0.2   2016-06-20 CRAN (R 3.3.1)
 xtable           1.8-2   2016-02-05 CRAN (R 3.3.1)
 zoo            * 1.7-14  2016-12-16 CRAN (R 3.3.2)

The first scroll() call works, but the second call fails with the following error:

Error: 404 - Client error: (404) Not Found

My code is adapted from the scrolling example in your documentation:

> res <- scroll(scroll_id = q$`_scroll_id`, config=c(progress()), raw=T)
  |==========================================================================================| 100%
Error: 404 - Client error: (404) Not Found

My questions:

  1. Why do I get this error even though I set scroll = "1h" (and also tried "30m" and "20m") in Search()?

scroll() does not return any object when an error like this occurs. In that case I would like to check the return value and stop processing.

  2. Could scroll() return an error value when this happens?

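
Until the package offers that, one option is to wrap the call in base R's tryCatch() so that a failure becomes a value you can test. This is a minimal sketch: safe_scroll() is a hypothetical helper, not part of elastic, and fetch stands for whatever function performs the actual request.

```r
# Hypothetical helper (not part of elastic): run one scroll request and
# return NULL instead of raising, so the caller can check the result.
safe_scroll <- function(fetch, scroll_id) {
  tryCatch(
    fetch(scroll_id),
    error = function(e) {
      # Report the HTTP/Elasticsearch error text, then signal failure with NULL.
      message("scroll request failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Usage sketch against a live cluster (requires the elastic package):
# res <- safe_scroll(function(id) elastic::scroll(scroll_id = id, raw = TRUE),
#                    q$`_scroll_id`)
# if (is.null(res)) stop("scroll context lost; stopping")
```

Returning NULL keeps the calling loop simple: one is.null() check decides whether to continue or stop.
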
sckott commented 7 years ago

@kickbox Do you know about verbose errors? See the errors parameter in connect().

kickbox commented 7 years ago

@sckott Here is the result with verbose() in config and errors = "complete" in connect():

> res <- scroll(scroll_id = q$`_scroll_id`, config=c(progress(),verbose()), raw=T)
-> POST /_search/scroll?scroll=1m HTTP/1.1
-> Host: xxx
-> Authorization: Basic xxx
-> User-Agent: libcurl/7.51.0 r-curl/2.3 httr/1.2.1
-> Accept-Encoding: gzip, deflate
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Length: 236
-> 
>> $$28$$$$YxgBfq58WeYOJfMf67QZSPFNXxY=c2Nhbjs0OzQ2MTA3Nzo2cDUyZ3JBX1JRQ0NfYTJ4Ny1JX0h3OzU2MTQ1MTE6QXNjMHVuem5Sci1MS1VHVFRJb2ZuQTs1NjE0NTEyOkFzYzB1bnpuUnItTEtVR1RUSW9mbkE7MTU3NjI5NzpJR2JMVVVhZ1JfQzJtSThabjJ0MHRROzE7dG90YWxfaGl0czozNjExODs=

<- HTTP/1.1 404 Not Found
<- Content-Type: application/json; charset=UTF-8
<- Content-Length: 749
<- 
  |========================================================================================================| 100%
Error: 404 - error
ES stack trace:

  _scroll_id: $$28$$$$JY0PW6q3N3iX_gBjE2UQCiB517c=c2NhbjswOzE7dG90YWxfaGl0czozNjExODs=
  took: 5
  timed_out: FALSE
  _shards.total: 4
  _shards.successful: 0
  _shards.failed: 4
  _shards.failures.shard: -1
  _shards.failures.reason.type: search_context_missing_exception
  _shards.failures.reason.reason: No search context found for id [461077]
  _shards.failures.shard: -1
  _shards.failures.reason.type: search_context_missing_exception
  _shards.failures.reason.reason: No search context found for id [5614511]
  _shards.failures.shard: -1
  _shards.failures.reason.type: search_context_missing_exception
  _shards.failures.reason.reason: No search context found for id [5614512]
  _shards.failures.shard: -1
  _shards.failures.reason.type: search_context_missing_exception
  _shards.failures.reason.reason: No search context found for id [1576297]
  hits.total: 36118
  hits.max_score: 0
sckott commented 7 years ago

@kickbox This thread seems very relevant: https://discuss.elastic.co/t/searchcontextmissingexception-during-long-scroll-scan-operations/23775 Are you using the same scroll ID for every request? You need to use the scroll ID returned by request 1 in the next request (request 2), and so on.
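
A minimal sketch of that chaining, assuming each response has been parsed into a list with _scroll_id and hits$hits fields, as in the Elasticsearch response body. scroll_pages() is hypothetical, and fetch is any function that performs one scroll request for a given ID.

```r
# Hypothetical helper (not part of elastic): walk a scroll to the end.
# `first_page` is the parsed result of the initial Search() call and
# `fetch(id)` performs one scroll request for scroll id `id`.
scroll_pages <- function(first_page, fetch) {
  pages <- list(first_page)
  id <- first_page$`_scroll_id`
  repeat {
    page <- fetch(id)
    # An empty hits array means the scroll is exhausted.
    if (length(page$hits$hits) == 0) break
    pages[[length(pages) + 1]] <- page
    # Always continue from the id returned by the latest response,
    # never from the one returned by the initial search.
    id <- page$`_scroll_id`
  }
  pages
}

# Usage sketch (elastic 0.7.x argument names assumed; see ?scroll):
# pages <- scroll_pages(q, function(id) elastic::scroll(scroll_id = id, scroll = "5m"))
```
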

kickbox commented 7 years ago

@sckott Thanks, but I am not reusing the old scroll ID. The scroll ID for the second request is:

"$$28$$$$7mxCMQEIJ2_YNqJuZWxCcoytpLc=c2Nhbjs0OzQ4NzcxNjo2cDUyZ3JBX1JRQ0NfYTJ4Ny1JX0h3OzU2NTA5OTY6QXNjMHVuem5Sci1MS1VHVFRJb2ZuQTs1NjUwOTk3OkFzYzB1bnpuUnItTEtVR1RUSW9mbkE7MzE4MzEzNzpjOEpiYUU1VlJiQ2tKemQxRGwxeTRBOzE7dG90YWxfaGl0czozNjIwNTs="

However, I don't think this matches what the scroll request from R reports, even though I passed that same value to scroll(). Please see the full session below; the scroll ID in the error output seems to differ from the one above:

_scroll_id: $$28$$$$CmZQSMcbrrZ5CE9SfuefyJjQ-wc=c2NhbjswOzE7dG90YWxfaGl0czozNjIwNTs=

> q$`_scroll_id`
[1] "$$28$$$$7mxCMQEIJ2_YNqJuZWxCcoytpLc=c2Nhbjs0OzQ4NzcxNjo2cDUyZ3JBX1JRQ0NfYTJ4Ny1JX0h3OzU2NTA5OTY6QXNjMHVuem5Sci1MS1VHVFRJb2ZuQTs1NjUwOTk3OkFzYzB1bnpuUnItTEtVR1RUSW9mbkE7MzE4MzEzNzpjOEpiYUU1VlJiQ2tKemQxRGwxeTRBOzE7dG90YWxfaGl0czozNjIwNTs="
> scrollId <- q$`_scroll_id`
> res <- scroll(scroll_id = scrollId, config=c(progress(),verbose()), raw=T)
-> POST /_search/scroll?scroll=1m HTTP/1.1
-> Host: xxx
-> Authorization: Basic xxx
-> User-Agent: libcurl/7.51.0 r-curl/2.3 httr/1.2.1
-> Accept-Encoding: gzip, deflate
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Length: 236
-> 
>> $$28$$$$7mxCMQEIJ2_YNqJuZWxCcoytpLc=c2Nhbjs0OzQ4NzcxNjo2cDUyZ3JBX1JRQ0NfYTJ4Ny1JX0h3OzU2NTA5OTY6QXNjMHVuem5Sci1MS1VHVFRJb2ZuQTs1NjUwOTk3OkFzYzB1bnpuUnItTEtVR1RUSW9mbkE7MzE4MzEzNzpjOEpiYUU1VlJiQ2tKemQxRGwxeTRBOzE7dG90YWxfaGl0czozNjIwNTs=

<- HTTP/1.1 404 Not Found
<- Content-Type: application/json; charset=UTF-8
<- Content-Length: 749
<- 
  |========================================================================================================| 100%
Error: 404 - error
ES stack trace:

  _scroll_id: $$28$$$$CmZQSMcbrrZ5CE9SfuefyJjQ-wc=c2NhbjswOzE7dG90YWxfaGl0czozNjIwNTs=
  took: 6
  timed_out: FALSE
  _shards.total: 4
  _shards.successful: 0
  _shards.failed: 4
  _shards.failures.shard: -1
  _shards.failures.reason.type: search_context_missing_exception
  _shards.failures.reason.reason: No search context found for id [487716]
  _shards.failures.shard: -1
  _shards.failures.reason.type: search_context_missing_exception
  _shards.failures.reason.reason: No search context found for id [5650996]
  _shards.failures.shard: -1
  _shards.failures.reason.type: search_context_missing_exception
  _shards.failures.reason.reason: No search context found for id [5650997]
  _shards.failures.shard: -1
  _shards.failures.reason.type: search_context_missing_exception
  _shards.failures.reason.reason: No search context found for id [3183137]
  hits.total: 36205
  hits.max_score: 0
>
kickbox commented 7 years ago

@sckott I think I have found the solution: I had to specify the scroll time not only on the initial Search() call but also on each subsequent scroll() call. That fixes it. Thanks.
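
A minimal sketch of that fix, assuming the elastic 0.7.x argument names (Search(scroll = ...) and scroll(scroll_id = ..., scroll = ...)); later releases renamed these arguments, so check ?Search and ?scroll for the installed version. fetch_two_pages() and the "5m" keep-alive are illustrative choices, not part of the package.

```r
# Hypothetical helper (not part of elastic). Sketch assuming elastic 0.7.x
# argument names; check ?Search and ?scroll for your installed version.
fetch_two_pages <- function(index, keep_alive = "5m") {
  # Open the scroll context; Elasticsearch keeps it alive for `keep_alive`.
  first <- elastic::Search(index = index, scroll = keep_alive)
  # Each scroll request resets the context's lifetime to the keep-alive it
  # sends. Omitting it here would shrink the lifetime to scroll()'s own
  # default (scroll=1m in the request logs above), and if more time than
  # that passes before the next call, Elasticsearch answers 404.
  second <- elastic::scroll(scroll_id = first$`_scroll_id`, scroll = keep_alive)
  list(first = first, second = second)
}
```

For example, fetch_two_pages("my-index") against a connected cluster, where "my-index" is a placeholder index name.
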

sckott commented 7 years ago

Ah, so the scroll time isn't carried over. Maybe we can carry it over from Search() automatically, while still letting users override it by setting a new scroll time when calling scroll().

Thoughts?
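
One way that carry-over could look, as a hypothetical design sketch rather than elastic's API: a closure remembers the keep-alive chosen when the scroll was opened and reuses it on every later request, unless the caller overrides it. make_scroller(), keep_alive_override and fetch(id, time) are illustrative names.

```r
# Hypothetical design sketch (not elastic's API): remember the keep-alive
# used when the scroll was opened and reuse it on every later request,
# unless the caller passes a different value.
make_scroller <- function(fetch, keep_alive = "1m") {
  function(scroll_id, keep_alive_override = NULL) {
    # Fall back to the remembered keep-alive when no override is given.
    time <- if (is.null(keep_alive_override)) keep_alive else keep_alive_override
    fetch(scroll_id, time)
  }
}
```

The override argument keeps the per-call control that scroll() already has, while the default removes the need to repeat the value on every call.
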

kickbox commented 7 years ago

My thoughts :) That could be one way to go. But as a package design choice, I would suggest keeping the same defaults as the official Elasticsearch client API, to stay consistent with your other defaults.

I ran into this problem by following the example in your package documentation as-is. So maybe this behaviour could be mentioned explicitly in the scroll() example; that would help others avoid it.