ngageoint / elasticgeo

ElasticGeo provides a GeoTools data store that allows geospatial features from an Elasticsearch index to be published via OGC services using GeoServer.
GNU General Public License v3.0

Can elasticgeo support sliced scroll-scan? #108

itakyubi opened this issue 5 years ago (status: open).

sjudeng commented 5 years ago

Hello. No, it's not currently supported.
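For context, a sliced scroll splits one Elasticsearch scroll into disjoint partitions that can be consumed in parallel. Each partition is its own scrolled `_search` request whose body carries a `slice` clause with that partition's `id` and the total `max`. The sketch below only builds those request bodies; `SlicedScrollBodies` is a hypothetical helper, not elasticgeo code, and sending the requests is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of elasticgeo) that builds sliced-scroll search bodies.
final class SlicedScrollBodies {
    private SlicedScrollBodies() {}

    /**
     * Builds one _search request body per slice. Every body carries the same query
     * plus a "slice" clause; Elasticsearch assigns each document to exactly one slice,
     * so the slices can be scrolled independently (for example, one thread each).
     */
    static List<String> bodies(String queryJson, int maxSlices) {
        List<String> out = new ArrayList<>(maxSlices);
        for (int id = 0; id < maxSlices; id++) {
            out.add(String.format("{\"slice\":{\"id\":%d,\"max\":%d},\"query\":%s}",
                    id, maxSlices, queryJson));
        }
        return out;
    }
}
```

Each body would be sent as its own scrolled search (for example `POST /<index>/_search?scroll=1m`), after which every slice follows its own scroll ID.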

halfstein commented 4 years ago

Hi @sjudeng, I work with @johndeereguy and have some updates to ElasticFeatureReaderScroll and ElasticFeatureSource that I'd like to prepare for a pull request. They add support for a true ES scroll, so they seem relevant to this issue (although the scroll is not sliced). The changes are in our version, which branched from 2.12.2.

Would this be helpful? If so, which branch should I target with the initial changes?

Here's the javadoc I added to ElasticFeatureSource:

For large feature sets, this source can either scroll or page the results. Although the {@link ElasticFeatureReaderScroll} uses an ES scroll either way, scrolling and paging are very different operations.

Scrolled results are automatically returned to a single client request when the Query selects more results than {@link ElasticDataStore#getScrollSize()}. This allows getting more records from ES than the default 10k limit, but the total is still capped by the layer's per-request max features limit.
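The single-request scroll case comes down to a little arithmetic. The sketch below is a hypothetical helper (`ScrollPlan` and its methods are not ElasticGeo classes), assuming the layer's max features limit is applied before scrolling and that each ES round trip returns at most `scrollSize` hits.

```java
// Hypothetical helper illustrating when a single request needs an ES scroll.
final class ScrollPlan {
    private ScrollPlan() {}

    /** Features one request can actually return: the layer limit caps whatever the query asks for. */
    static int effectiveLimit(int requested, int layerMaxFeatures) {
        return Math.min(requested, layerMaxFeatures);
    }

    /** A request larger than one ES batch must be scrolled rather than fetched in one search. */
    static boolean needsScroll(int effectiveLimit, int scrollSize) {
        return effectiveLimit > scrollSize;
    }

    /** Round trips to ES (initial search plus scroll calls), using ceiling division. */
    static int batches(int effectiveLimit, int scrollSize) {
        return (effectiveLimit + scrollSize - 1) / scrollSize;
    }
}
```

For example, a query selecting 50,000 features against a layer limit of 25,000 and a scroll size of 10,000 would return 25,000 features in 3 batches; without the layer limit, scrolling would continue past the 10k window.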

Paged results are returned across a sequence of client requests, activated by the WFS 2.0 STARTINDEX and COUNT parameters. This allows getting more results from a layer than the per-request max features limit, in COUNT-sized chunks. The COUNT provided by the client overrides the scroll size, so it is the client's responsibility to use a size below 10k (or the configured ES limit). Also, paged results must begin with STARTINDEX=0 and must advance through STARTINDEX values in constant COUNT intervals. That is, pages cannot be reversed, skipped, or randomly accessed, and the page size cannot be changed.
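The forward-only, fixed-size paging contract can be expressed as a small state check. This is a hypothetical sketch (`PageSequence` is not an ElasticGeo class) of how a server might reject STARTINDEX/COUNT sequences that a single cached scroll cannot serve.

```java
// Hypothetical validator for the paging contract: start at 0, move forward by a fixed COUNT.
final class PageSequence {
    private int nextStart = 0;
    private int count = -1; // unset until the first page fixes the page size

    /** Accepts the next page request, or throws if it breaks the contract (state is left unchanged). */
    void accept(int startIndex, int pageCount) {
        if (count < 0) {
            if (startIndex != 0) {
                throw new IllegalArgumentException("paging must begin at STARTINDEX=0");
            }
            if (pageCount <= 0) {
                throw new IllegalArgumentException("COUNT must be positive");
            }
            count = pageCount;
        } else if (pageCount != count) {
            throw new IllegalArgumentException("COUNT cannot change between pages");
        }
        if (startIndex != nextStart) {
            // Covers reversed, skipped, and random-access requests alike.
            throw new IllegalArgumentException("expected STARTINDEX=" + nextStart);
        }
        nextStart += count;
    }
}
```

A scroll cursor can only move forward, which is why a single "expected next STARTINDEX" value is enough to enforce every restriction listed above.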

NOTE: To support paged results, the {@link ElasticFeatureReaderScroll} is cached in the user's {@link HttpSession}. A {@link CompletableFuture} is also created and run asynchronously to close that scroll if the client does not read subsequent pages.
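The session cache plus asynchronous close might look roughly like the sketch below. It assumes Java 9+ (for `CompletableFuture.delayedExecutor`), replaces `HttpSession` with a plain `ConcurrentHashMap` keyed by session id, and models the scroll with a hypothetical `CachedScroll` whose `close()` only sets a flag where a real reader would clear the ES scroll. None of these names come from the proposed change.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical cache that keeps a scroll between page requests and closes it on timeout.
final class ScrollSessionCache {
    /** Hypothetical stand-in for an open ES scroll owned by a feature reader. */
    static final class CachedScroll {
        final String scrollId;
        volatile boolean closed;

        CachedScroll(String scrollId) { this.scrollId = scrollId; }

        // A real reader would send a clear-scroll request to ES here.
        void close() { closed = true; }
    }

    // Stand-ins for HttpSession attributes, keyed by session id.
    private final Map<String, CachedScroll> scrolls = new ConcurrentHashMap<>();
    private final Map<String, CompletableFuture<Void>> timeouts = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    ScrollSessionCache(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    /** Caches the scroll after a page is served and arms a timer that closes it if no next page arrives. */
    CompletableFuture<Void> put(String sessionId, CachedScroll scroll) {
        cancelTimeout(sessionId);
        CachedScroll previous = scrolls.put(sessionId, scroll);
        if (previous != null && previous != scroll) {
            previous.close(); // don't leak a scroll that this one replaces
        }
        CompletableFuture<Void> timeout = CompletableFuture.runAsync(
                () -> {
                    // remove(key, value) only succeeds if this exact scroll is still cached,
                    // so a scroll already taken for the next page is never closed here.
                    if (scrolls.remove(sessionId, scroll)) {
                        scroll.close();
                    }
                },
                CompletableFuture.delayedExecutor(timeoutMillis, TimeUnit.MILLISECONDS));
        timeouts.put(sessionId, timeout);
        return timeout;
    }

    /** Claims the cached scroll for the next page (null if it already timed out) and disarms its timer. */
    CachedScroll take(String sessionId) {
        cancelTimeout(sessionId);
        return scrolls.remove(sessionId);
    }

    private void cancelTimeout(String sessionId) {
        CompletableFuture<Void> pending = timeouts.remove(sessionId);
        if (pending != null) {
            pending.cancel(false);
        }
    }
}
```

Because the timer uses `remove(key, value)` and the page request uses `remove(key)`, exactly one side claims the scroll even if the timeout fires at the same moment a page request arrives.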

NOTE: Because the scroll lives in the session, this may give the impression that paging works in a multi-GeoServer environment, but the scroll and its timeout almost certainly would not serialize and migrate with the session.

sjudeng commented 4 years ago

@halfstein I think this would be a great feature to add to the project. The documentation is great to see as well. Thanks for taking the time to contribute this back.

You can target the master branch in this project. There's also a branch in my separate GeoTools fork, but I'd like to keep this project maintained until that work is merged and released. You're welcome to open a separate PR against the GeoTools branch, and maybe we can get it included as part of that merge, but I can also handle that later myself if that's inconvenient (the structure is very different).

Thanks again for reaching out and for your work on this feature.

halfstein commented 4 years ago

@sjudeng Great, I'll start working on it.