stac-utils / pystac

Python library for working with any SpatioTemporal Asset Catalog (STAC)
https://pystac.readthedocs.io
Other
358 stars 120 forks source link

Collection link points to local path #838

Open m-mohr opened 2 years ago

m-mohr commented 2 years ago

While I see a lot of code that normalizes asset hrefs and make them relative whenever required, it seems I always end up with a local file path for the collection link. I think the collection link should be normalized as well and ideally be relative to the self link.

b4l commented 2 years ago

I experienced the same issue playing around with the qgis stac browser plugin. It only happens when the root catalog / landing page is served at the root, for example http://localhost/ but not if served at a path like http://localhost/somepath/.

gadomski commented 1 year ago

Isn't this solved by saving Collections with CatalogType.SELF_CONTAINED? I don't know if this is a bug per se -- maybe more of a documentation/best practices issue?

E.g. we generate a lot of self contained Collections for stactools-packages examples:

m-mohr commented 1 year ago

That is likely the case, but if I load a Collection from disk and create an Item for it, it's a bit weird that I have to set something in the Collection to get a relative link in the item, right? Can I specify the catalog type when loading from disk? Otherwise, I assume that if I save the Collection after the Item, the Item is not updated accordingly, right?

I still have to verify, but I think that's what I did back then (i.e. load collection from disk -> create item -> save item -> not sure whether I saved the collection again, likely not). So maybe what's missing is a way to specify the catalog type when calling Collection.from_file()? @gadomski

gadomski commented 1 year ago

Yup, you're right -- reading a catalog with relative hrefs products pystac objects with absolute self hrefs: https://github.com/stac-utils/pystac/pull/972. I've opened that draft PR to demonstrate the problem and will use it to make a fix.

m-mohr commented 1 year ago

Thanks. Just for reference, this is where it occurred to me: https://github.com/stactools-packages/noaa-mrms-qpe/blob/main/src/stactools/noaa_mrms_qpe/commands.py#L135

gadomski commented 1 year ago

@m-mohr circling back to this one, I'm now unsure of what the issue is. I implemented your workflow, as described in https://github.com/stac-utils/pystac/issues/838#issuecomment-1410405921, here: https://github.com/stac-utils/pystac/pull/972/commits/247d27c5f91e1171c2d517e4ce00ab1a3aa26a93. That test doesn't fail -- the item is saved with relative hrefs. I must be misunderstanding your scenario -- can you help clarify? Thanks.

m-mohr commented 1 year ago

@gadomski I'm not 100% sure about the test (not sure what prebuilt_catalog_1.get_all_collections() does), but it was essential for this to happen that the Collection is loaded from disk, not created on the fly. Is that the case in the tests? Also, I did not use add_item as I did not had to add the item to the collection (stactools package where you create an item for a collection, but don't add it to it directly).

gadomski commented 1 year ago

it was essential for this to happen that the Collection is loaded from disk, not created on the fly. Is that the case in the tests?

Yes, the prebuilt catalog is created by reading https://github.com/stac-utils/pystac/blob/ba073056e420a99bab0e5287687246fa4ab8f320/tests/data-files/catalogs/test-case-1/catalog.json.

I did not use add_item as I did not had to add the item to the collection

Hm, then how did you add the item link to the collection?

m-mohr commented 1 year ago

I haven't added the item link to the collection to keep the collection clean as I assumed it would not be relevant if you ingest into MS PC and generate links dynamically anyway. See https://github.com/stactools-packages/noaa-mrms-qpe/blob/main/src/stactools/noaa_mrms_qpe/commands.py#L135 for details.

gadomski commented 1 year ago

I think I've figured out the issue, thanks for your patience and your iterations. You're bumping into some complexities around how pystac resolves link hrefs. Specifically, link resolution is absolute by default, and only uses relative hrefs if the link's owner has a root and the root is a "relative" Catalog type: https://github.com/stac-utils/pystac/blob/8db9afb4fdabf49376e6a73e184c70942cdfc4b6/pystac/link.py#L169-L185

I'm not sure if this a bug per se; it might just a complex and relatively undocumented system. @m-mohr what would be your preferred resolution?

Examples

This test fails:

def test_collection_link_is_relative(item: Item, collection: Collection) -> None:
    collection.set_self_href("http://pystac.test/collection.json")
    item.set_self_href("http://pystac.test/item.json")
    item.set_collection(collection)
    link = item.get_single_link("collection")
    assert link
    assert link.href == "./collection.json"  # link.href is "http://pystac.test/collection.json"

In order to get your desired behavior (relative collection href), you need to make a couple changes:

def test_collection_link_is_relative(item: Item, collection: Collection) -> None:
    collection.set_self_href("http://pystac.test/collection.json")
    collection.catalog_type = CatalogType.SELF_CONTAINED  # ensure the collection is a "relative" root (self contained or relative published)
    item.set_self_href("http://pystac.test/item.json")
    item.set_collection(collection)
    item.set_root(collection)  # the item needs a root to enable relative hrefs
    link = item.get_single_link("collection")
    assert link
    assert link.href == "./collection.json"
gadomski commented 1 year ago

Moving this to the 1.8 milestone so it doesn't hold up 1.7.