alexashley opened this issue 3 years ago
Used this script to bulk load 15,000 occurrences and try to page through them:
On the 11th page, this error is returned from Elasticsearch:
```json
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [11000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "grafeas-v1beta2-rode-occurrences",
        "node": "y40fPpNDRm648olC-Ut-tA",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [11000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [11000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [11000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
      }
    }
  },
  "status": 400
}
```
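The arithmetic lines up with the error message, assuming the lister pages with a fixed page size of 1,000: the 11th page is the first whose window end (`from + size`) exceeds Elasticsearch's default `index.max_result_window` of 10,000. A minimal sketch of that offset math (the `window` helper is illustrative, not from the codebase):

```python
# Elasticsearch's default cap on from + size for a plain search request.
MAX_RESULT_WINDOW = 10_000
# Assumed page size used by the bulk-load test (15,000 occurrences / 1,000 per page).
PAGE_SIZE = 1_000

def window(page: int) -> tuple[int, int]:
    """Return the (from, size) pair Elasticsearch sees for a 1-indexed page."""
    return ((page - 1) * PAGE_SIZE, PAGE_SIZE)

for page in (10, 11):
    frm, size = window(page)
    allowed = frm + size <= MAX_RESULT_WINDOW
    print(f"page {page}: from={frm} size={size} "
          f"from+size={frm + size} allowed={allowed}")
```

Page 10 ends exactly at the 10,000 cap; page 11 asks for `from + size = 11000`, which matches the `[11000]` in the error above.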
Came out of this discussion.
The documentation that Elasticsearch provides on pagination makes it sound like there is a hard cap on the number of results that can be paged through using `from` and `size`. We need to determine whether that's the case by loading a number of notes or occurrences greater than `index.max_result_window` and attempting to page through them. If it is, we'll need to make some changes to grab the `sort` value from the last hit in the results, encode that in the page token, and send it along in future requests as the `search_after` parameter.