stac-utils / stac-fastapi-elasticsearch-opensearch

Elasticsearch backend for stac-fastapi with Opensearch support.
https://stac-utils.github.io/stac-fastapi-elasticsearch-opensearch
MIT License
24 stars 14 forks source link

BUG with links in version 2.3.0 #235

Closed pedro-cf closed 2 months ago

pedro-cf commented 2 months ago

Describe the bug In version 2.1.0 of stac-fastapi.core when creating an item this item would automatically be assigned links (although one could not add aditional links) Example:

"links": [
        {
            "rel": "self",
            "type": "application/geo+json",
            "href": "http://localhost:8080/collections/test-collection/items/test-item96"
        },
        {
            "rel": "parent",
            "type": "application/json",
            "href": "http://localhost:8080/collections/test-collection"
        },
        {
            "rel": "collection",
            "type": "application/json",
            "href": "http://localhost:8080/collections/test-collection"
        },
        {
            "rel": "root",
            "type": "application/json",
            "href": "http://localhost:8080/"
        }
    ]

But in version 2.3.0 when creating an item links is empty unless the user passes aditional links.

Passing the self root collection parent links has no effect aswell. They are ignored.

To Reproduce Steps to reproduce the behavior: Run versions 2.1.0 and 2.3.0 of stac-fastapi-elasticsearch and create 1 test collection and 1 item collection in both environments without providing links.

Expected behavior Creating an item should auto generate links.

Screenshots image

image image

Desktop:

pedro-cf commented 2 months ago

Guessing it's related to: https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/blob/c8a07aed3a8734e65861f69ee946cee272061fbe/stac_fastapi/core/stac_fastapi/core/serializers.py#L53-L70

and this new function from stac-fastapi: https://github.com/stac-utils/stac-fastapi/blob/e7f82d6996af0f28574329d57f5a5e90431d66bb/stac_fastapi/types/stac_fastapi/types/links.py#L21-L26

def resolve_links(links: list, base_url: str) -> List[Dict]:
    """Convert relative links to absolute links."""
    filtered_links = filter_links(links)
    for link in filtered_links:
        link.update({"href": urljoin(base_url, link["href"])})
    return filtered_links
pedro-cf commented 2 months ago

I believe I found the issue.

Version 2.1.0 of ItemSerializer.stac_to_db:

class ItemSerializer(Serializer):
    """Serialization methods for STAC items."""

    @classmethod
    def stac_to_db(cls, stac_data: stac_types.Item, base_url: str) -> stac_types.Item:
        """Transform STAC item to database-ready STAC item.

        Args:
            stac_data (stac_types.Item): The STAC item object to be transformed.
            base_url (str): The base URL for the STAC API.

        Returns:
            stac_types.Item: The database-ready STAC item object.
        """
        item_links = ItemLinks(
            collection_id=stac_data["collection"],
            item_id=stac_data["id"],
            base_url=base_url,
        ).create_links()
        stac_data["links"] = item_links

        now = now_to_rfc3339_str()
        if "created" not in stac_data["properties"]:
            stac_data["properties"]["created"] = now
        stac_data["properties"]["updated"] = now
        return stac_data

In this version any custom item links passed by stac_data are simply ignored and "base links" (self/parent/collection/root) are generated.

Version 2.3.0 of ItemSerializer.stac_to_db: https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/blob/c8a07aed3a8734e65861f69ee946cee272061fbe/stac_fastapi/core/stac_fastapi/core/serializers.py#L49-L70

https://github.com/stac-utils/stac-fastapi/blob/e7f82d6996af0f28574329d57f5a5e90431d66bb/stac_fastapi/types/stac_fastapi/types/links.py#L21-L26

INFERRED_LINK_RELS = ["self", "item", "parent", "collection", "root", "items"]

def filter_links(links: List[Dict]) -> List[Dict]:
    """Remove inferred links."""
    return [link for link in links if link["rel"] not in INFERRED_LINK_RELS]

def resolve_links(links: list, base_url: str) -> List[Dict]:
    """Convert relative links to absolute links."""
    filtered_links = filter_links(links)
    for link in filtered_links:
        link.update({"href": urljoin(base_url, link["href"])})
    return filtered_links

In this version the "base links" are no longer generated and the custom links are added and "resolved"...

This is a possible solution:

def stac_to_db(cls, stac_data: stac_types.Item, base_url: str) -> stac_types.Item:
    item_links = resolve_links(stac_data.get("links", []), base_url)

    base_item_links = ItemLinks(
        collection_id=stac_data["collection"],
        item_id=stac_data["id"],
        base_url=base_url,
    ).create_links()

    stac_data["links"] = base_item_links + item_links

    now = now_to_rfc3339_str()
    if "created" not in stac_data["properties"]:
        stac_data["properties"]["created"] = now
    stac_data["properties"]["updated"] = now
    return stac_data
jonhealy1 commented 2 months ago

Hi Pedro. Thanks for identifying this issue! Would you be able to make a pr to fix this?

pedro-cf commented 2 months ago

Hi Pedro. Thanks for identifying this issue! Would you be able to make a pr to fix this?

I'm not sure why this function was changed this way before, so I just want to make sure this fix makes sense.

pedro-cf commented 2 months ago

Only noticed this now:

https://github.com/stac-utils/stac-fastapi/blob/e7f82d6996af0f28574329d57f5a5e90431d66bb/stac_fastapi/types/stac_fastapi/types/links.py#L10-14 image

So the endpoints that return data are intended to generate these links ["self", "item", "parent", "collection", "root"] ? Guessing this affects any CREATE & UPDATE aswell as READS since they return the actual item when you create/update item.

jonhealy1 commented 2 months ago

Yes I think you're right. Trying to refresh my memory here too. When you create an item it should only store links that aren't in the INFERRED_LINK_RELS list. Then these links should be created when an item or collection is returned.

If I create an item in main branch right now, it only stores links not in INFERRED_LINK_RELS and then when I GET that item, I can see all the links created dynamically plus the additional links stored in the db.

I think one issue here is that if you POST a new item it returns a version of the item that is not the same as the version of the item you will get from a GET item request. POSTing an item will return a version without the links generated. This makes it look like the item is missing links when you create an item. This should be fixed I think.

pedro-cf commented 2 months ago

I think one issue here is that if you POST a new item it returns a version of the item that is not the same as the version of the item you will get from a GET item request. POSTing an item will return a version without the links generated. This makes it look like the item is missing links when you create an item. This should be fixed I think.

When we CREATE/UPDATE a collection, the collection returned in the request response contains links and the collection links are also not stored in the database.

Any idea how we can fix this? Where is the piece of code that controls the response from the CREATE/UPDATE of items? @jonhealy1

jonhealy1 commented 2 months ago

So items and collections are behaving differently. I have to dig into this to remember.

jonhealy1 commented 2 months ago

This is the create collection function in core - it looks like it's using collection serializer. If you look in create_item above, it's using prep_create_item.

jonhealy1 commented 2 months ago

Most of the logic is here in serializers - https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/blob/main/stac_fastapi/core/stac_fastapi/core/serializers.py

pedro-cf commented 2 months ago

Ok the issue seems to be that in the this stac-fastapi.core.TransactionsClient.create_item the return is the stac_types.Item in Line 680

https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/blob/c8a07aed3a8734e65861f69ee946cee272061fbe/stac_fastapi/core/stac_fastapi/core/core.py#L643-L680

when it should be returning ItemSerializer.db_to_stac(item, base_url)

just like it does in create_collection/update_collection and update_item (I was wrong update_item actually returns the proper item with links)

https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/blob/c8a07aed3a8734e65861f69ee946cee272061fbe/stac_fastapi/core/stac_fastapi/core/core.py#L683-L709

Updating stac-fastapi.core.TransactionsClient.create_item with:

            item = await self.database.prep_create_item(item=item, base_url=base_url)
            await self.database.create_item(item, refresh=kwargs.get("refresh", False))
            # return item
            return ItemSerializer.db_to_stac(item, base_url)

does seem to fix the issue, but I'm not sure if it creates bugs elsewhere... even though tests seem to pass.

jonhealy1 commented 2 months ago

Definitely makes sense I think