ropensci / elastic

R client for the Elasticsearch HTTP API
https://docs.ropensci.org/elastic

Search in over 10000 size #245

Closed MonaxGT closed 5 years ago

MonaxGT commented 5 years ago

Hi Guys! I want to thank you for your package!

I have a question. When I search a "big" index over 30 days in Kibana I get about 2 million documents, but with your package the maximum is 10000. If I set the size option above 10000 I get an error like "Error: 500 - all shards failed". How can I get around this limit? I understand that the limit comes from the database and that maybe I should change my q option, but in this situation I can't.

Maybe I can call Search() in a for loop?

sckott commented 5 years ago

thanks for your question.

can you try again with connect(errors = "complete")? it should give you the Elasticsearch stack trace, which you can then report back here

MonaxGT commented 5 years ago

Hi sckott,

Error: 500 - all shards failed
ES stack trace:

  type: query_phase_execution_exception
  reason: Result window is too large, from + size must be less than or equal to: [10000] but was [100000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.
  type: query_phase_execution_exception
  reason: Result window is too large, from + size must be less than or equal to: [10000] but was [100000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.
  type: query_phase_execution_exception
  reason: Result window is too large, from + size must be less than or equal to: [10000] but was [100000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.
  type: query_phase_execution_exception

MonaxGT commented 5 years ago

Hi, if I understand correctly I should use the scroll API for requests like this. I see that your library has a scroll option, but I don't understand how to use it in a loop.

MonaxGT commented 5 years ago

I found it :) but now I'm trying to work out how to use this with the raw option

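# Open a scroll context; sorting by _doc is the cheapest order for scrolling.
res <- Search(index = 'shakespeare', q="a*", time_scroll="5m",
  body = '{"sort": ["_doc"]}')
out <- res$hits$hits  # keep the first page, which Search() itself returns
hits <- length(out)
while(hits != 0){
  # Each call returns the next page plus the scroll id to use next.
  res <- scroll(res$`_scroll_id`)
  hits <- length(res$hits$hits)
  if(hits > 0)
    out <- c(out, res$hits$hits)
}
length(out)
out[[1]]

sckott commented 5 years ago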

so your question is how to use scroll() with raw = TRUE ?

sckott commented 5 years ago

you will probably need to parse the raw JSON yourself to get the scroll id to pass to the next scroll() call in the while loop
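
A minimal sketch of that idea, not taken from the package itself: it assumes each raw response is a JSON string and uses jsonlite (an assumption, any JSON parser would do) to pull out `_scroll_id` and the hits. The helper names parse_page, scroll_all_raw and make_fake_scroll are hypothetical, and the "server" here is a few canned strings so the loop can be tried without a cluster.

```r
library(jsonlite)

# Pull the scroll id and the hits out of one raw JSON response string.
parse_page <- function(raw_json) {
  page <- fromJSON(raw_json, simplifyVector = FALSE)
  list(scroll_id = page[["_scroll_id"]], hits = page$hits$hits)
}

# Collect every page. `first_raw` is the JSON string from the initial search;
# `next_page` is any function that takes a scroll id and returns the next
# raw JSON string (hypothetical parameter, injected so it can be swapped out).
scroll_all_raw <- function(first_raw, next_page) {
  page <- parse_page(first_raw)
  out <- page$hits                      # keep the first page too
  while (length(page$hits) > 0) {
    page <- parse_page(next_page(page$scroll_id))
    out <- c(out, page$hits)
  }
  out
}

# Stand-in for the server: hands out canned responses one call at a time.
make_fake_scroll <- function(pages) {
  i <- 0
  function(scroll_id) {
    i <<- i + 1
    pages[[i]]
  }
}

first_raw <- '{"_scroll_id": "abc", "hits": {"hits": [{"_id": "1"}, {"_id": "2"}]}}'
later_pages <- c(
  '{"_scroll_id": "abc", "hits": {"hits": [{"_id": "3"}]}}',
  '{"_scroll_id": "abc", "hits": {"hits": []}}'   # empty page ends the loop
)

docs <- scroll_all_raw(first_raw, make_fake_scroll(later_pages))
length(docs)
```

Against a live index, first_raw would presumably come from Search(..., time_scroll = "5m", raw = TRUE) and next_page would be something like function(id) scroll(id, raw = TRUE). Depending on the package version, both calls may also need a connection object as their first argument.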

sckott commented 5 years ago

closing due to inactivity