stac-utils / stac-fastapi-elasticsearch-opensearch

Elasticsearch backend for stac-fastapi with Opensearch support.
https://stac-utils.github.io/stac-fastapi-elasticsearch-opensearch
MIT License
28 stars 13 forks source link

Paginated search queries always returns a next token even if it's the last page of the pagination. #242

Closed pedro-cf closed 4 months ago

pedro-cf commented 5 months ago

Describe the bug Paginated search queries always returns a next token even if it's the last page of the pagination.

To Reproduce Steps to reproduce the behavior:

  1. Run app docker-compose up
  2. Create test collection
  3. Creation 1 test item
  4. Run search query with "limit":1 -> Response will contain an "next link"

Expected behavior I assume the last page of the pagination should not include a "next link"

jonhealy1 commented 5 months ago

It's true. If there are no more items, there shouldn't be a next link.

pedro-cf commented 5 months ago

https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/blob/f797ed4f449a40c52337ba1c451a597aa8a3f2e7/stac_fastapi/elasticsearch/stac_fastapi/elasticsearch/database_logic.py#L524-L604

Issue is here, but I'm not quite sure how to fix this, cause I'm not so familiar with elasticsearch.

pedro-cf commented 5 months ago

I can't seem to figure out a good way to obtain the "page" number in elasticsearch.

We could pass a page number inside the token, but this would cause issues in other places sadly:

my example:

    async def execute_search(
...
    ) -> Tuple[Iterable[Dict[str, Any]], Optional[int], Optional[str]]:
        """
...
        """
        search_after = None
        page = 1

        if token:
            decoded_token = urlsafe_b64decode(token.encode()).decode().split(",")
            search_after = decoded_token[:-1]
            page = int(decoded_token[-1]) + 1

        query = search.query.to_dict() if search.query else None

        index_param = indices(collection_ids)

        search_task = asyncio.create_task(
            self.client.search(
                index=index_param,
                ignore_unavailable=ignore_unavailable,
                query=query,
                sort=sort or DEFAULT_SORT,
                search_after=search_after,
                size=limit,
            )
        )

        try:
            es_response = await search_task
        except exceptions.NotFoundError:
            raise NotFoundError(f"Collections '{collection_ids}' do not exist")

        matched = es_response["hits"]["total"]["value"]
        hits = es_response["hits"]["hits"]
        items = (hit["_source"] for hit in hits)

        next_token = None
        if matched > page * limit:
            if hits and (sort_array := hits[-1].get("sort")):
                next_token = urlsafe_b64encode(
                    ",".join([str(x) for x in sort_array] + [str(page)]).encode()
                ).decode()

        return items, matched, next_token

Note that this version of this function also makes use of es_response["hits"]["total"]["value"] which from what I understand represents the number of matches which would be a great replacement for: https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/blob/f797ed4f449a40c52337ba1c451a597aa8a3f2e7/stac_fastapi/elasticsearch/stac_fastapi/elasticsearch/database_logic.py#L573-L579 https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/blob/f797ed4f449a40c52337ba1c451a597aa8a3f2e7/stac_fastapi/elasticsearch/stac_fastapi/elasticsearch/database_logic.py#L597-L604

jonhealy1 commented 4 months ago

I added another test related to this here: https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/244