ropensci / elastic

R client for the Elasticsearch HTTP API
https://docs.ropensci.org/elastic

Scroll returns hits with scroll id before scrolling #229

Closed Jensxy closed 5 years ago

Jensxy commented 5 years ago

Scroll is returning hits on initial search, where I was expecting it to only return a _scroll_id.

When I execute res <- elastic::Search(index = "my_index", body = body, time_scroll = "3m", size = 1000), I expect length(res$hits$hits) to be zero. However, since the package update, it is no longer zero.

Before the update, length(res$hits$hits) was zero and I had to scroll first to get hits. See the following code.

scrollID <- res$`_scroll_id`
res    <- scroll(scroll_id = scrollID)
length(res$hits$hits)

What can I do so that scroll is not returning hits on initial search? Is that even possible? Otherwise I will have to rewrite all of my code.

sckott commented 5 years ago

thx for the issue @Jensxy

if you try the examples in the Elasticsearch docs https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html you do get hits on the initial call to _search (the same endpoint that elastic::Search() calls). So I think that's what's supposed to happen.

What can I do so that scroll is not returning hits on initial search?

i don't think it's possible. I'm not sure why you'd want this?

Jensxy commented 5 years ago

I'm not sure why you'd want this?

With ES 2.4 and the old elastic package, the initial search returned no hits by default. I have since upgraded ES to 6.3.

I know that the default settings have changed. I thought there might be a way to get the same behavior (no hits on the initial search) in ES 6.3 as well.

Also, the following example from the package does not fully work.

# Get all results - one approach is to use a while loop
res <- Search(index = 'shakespeare', q="a*", time_scroll="5m",
  body = '{"sort": ["_doc"]}')
out <- list()
hits <- 1
while(hits != 0){
  res <- scroll(res$`_scroll_id`)
  hits <- length(res$hits$hits)
  if(hits > 0)
    out <- c(out, res$hits$hits)
}
length(out)
out[[1]]

This example does not return all results: the hits from the initial search are never added to out. So the example needs to be fixed.
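A minimal sketch of one possible fix, assuming the only problem is that the first page is dropped: seed out with the hits from the initial search before entering the scroll loop. The names collect_scroll_hits, fake_scroll, and pages are hypothetical and not part of the elastic package; fake_scroll stands in for a live Elasticsearch server so that the snippet runs on its own.

```r
# Hypothetical helper (not part of the elastic package).
# first_page: result of the initial Search(..., time_scroll = ...) call.
# scroll_fun: function that takes a scroll id and returns the next page.
collect_scroll_hits <- function(first_page, scroll_fun) {
  out  <- first_page$hits$hits      # keep the hits from the initial search
  res  <- first_page
  hits <- length(res$hits$hits)
  while (hits != 0) {
    res  <- scroll_fun(res$`_scroll_id`)
    hits <- length(res$hits$hits)
    if (hits > 0) out <- c(out, res$hits$hits)
  }
  out
}

# Stand-in for a server (assumption): five documents, two per page,
# followed by an empty page that ends the scroll.
docs    <- as.list(1:5)
pages   <- list(docs[1:2], docs[3:4], docs[5], list())
page_no <- 1
fake_scroll <- function(scroll_id) {
  page_no <<- page_no + 1
  list(`_scroll_id` = scroll_id, hits = list(hits = pages[[page_no]]))
}
first_page <- list(`_scroll_id` = "abc", hits = list(hits = pages[[1]]))

all_hits <- collect_scroll_hits(first_page, fake_scroll)
length(all_hits)
```

Against a real cluster, the same helper could presumably be called as collect_scroll_hits(res, function(id) scroll(scroll_id = id)), using the call shape shown earlier in this thread; the exact scroll() signature depends on the installed package version.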

EDIT:

Okay, I see this example is fixed in the new version. Is v0.9 already available?

sckott commented 5 years ago

@Jensxy sorry, v0.9 isn't out yet. You can follow progress on the v0.9 milestone though: https://github.com/ropensci/elastic/milestone/8

you can install from GitHub to get the latest though, so you don't have to wait for the new CRAN version.

sckott commented 5 years ago

closing for now since this seems sorted in dev version