Open m-mohr opened 2 years ago
I experienced the same issue playing around with the qgis stac browser plugin. It only happens when the root catalog / landing page is served at the root, for example http://localhost/
but not if served at a path like http://localhost/somepath/
.
Isn't this solved by saving Collections with CatalogType.SELF_CONTAINED
? I don't know if this is a bug per se -- maybe more of a documentation/best practices issue?
E.g. we generate a lot of self contained Collections for stactools-packages examples:
That is likely the case, but if I load a Collection from disk and create an Item for it, it's a bit weird that I have to set something in the Collection to get a relative link in the item, right? Can I specify the catalog type when loading from disk? Otherwise, I assume that if I save the Collection after the Item, the Item is not updated accordingly, right?
I still have to verify, but I think that's what I did back then (i.e. load collection from disk -> create item -> save item -> not sure whether I saved the collection again, likely not). So maybe what's missing is a way to specify the catalog type when calling Collection.from_file()? @gadomski
Yup, you're right -- reading a catalog with relative hrefs products pystac objects with absolute self hrefs: https://github.com/stac-utils/pystac/pull/972. I've opened that draft PR to demonstrate the problem and will use it to make a fix.
Thanks. Just for reference, this is where it occurred to me: https://github.com/stactools-packages/noaa-mrms-qpe/blob/main/src/stactools/noaa_mrms_qpe/commands.py#L135
@m-mohr circling back to this one, I'm now unsure of what the issue is. I implemented your workflow, as described in https://github.com/stac-utils/pystac/issues/838#issuecomment-1410405921, here: https://github.com/stac-utils/pystac/pull/972/commits/247d27c5f91e1171c2d517e4ce00ab1a3aa26a93. That test doesn't fail -- the item is saved with relative hrefs. I must be misunderstanding your scenario -- can you help clarify? Thanks.
@gadomski I'm not 100% sure about the test (not sure what prebuilt_catalog_1.get_all_collections()
does), but it was essential for this to happen that the Collection is loaded from disk, not created on the fly. Is that the case in the tests? Also, I did not use add_item
as I did not had to add the item to the collection (stactools package where you create an item for a collection, but don't add it to it directly).
it was essential for this to happen that the Collection is loaded from disk, not created on the fly. Is that the case in the tests?
Yes, the prebuilt catalog is created by reading https://github.com/stac-utils/pystac/blob/ba073056e420a99bab0e5287687246fa4ab8f320/tests/data-files/catalogs/test-case-1/catalog.json.
I did not use add_item as I did not had to add the item to the collection
Hm, then how did you add the item
link to the collection?
I haven't added the item link to the collection to keep the collection clean as I assumed it would not be relevant if you ingest into MS PC and generate links dynamically anyway. See https://github.com/stactools-packages/noaa-mrms-qpe/blob/main/src/stactools/noaa_mrms_qpe/commands.py#L135 for details.
I think I've figured out the issue, thanks for your patience and your iterations. You're bumping into some complexities around how pystac resolves link hrefs. Specifically, link resolution is absolute by default, and only uses relative hrefs if the link's owner has a root and the root is a "relative" Catalog type: https://github.com/stac-utils/pystac/blob/8db9afb4fdabf49376e6a73e184c70942cdfc4b6/pystac/link.py#L169-L185
I'm not sure if this a bug per se; it might just a complex and relatively undocumented system. @m-mohr what would be your preferred resolution?
This test fails:
def test_collection_link_is_relative(item: Item, collection: Collection) -> None:
collection.set_self_href("http://pystac.test/collection.json")
item.set_self_href("http://pystac.test/item.json")
item.set_collection(collection)
link = item.get_single_link("collection")
assert link
assert link.href == "./collection.json" # link.href is "http://pystac.test/collection.json"
In order to get your desired behavior (relative collection href), you need to make a couple changes:
def test_collection_link_is_relative(item: Item, collection: Collection) -> None:
collection.set_self_href("http://pystac.test/collection.json")
collection.catalog_type = CatalogType.SELF_CONTAINED # ensure the collection is a "relative" root (self contained or relative published)
item.set_self_href("http://pystac.test/item.json")
item.set_collection(collection)
item.set_root(collection) # the item needs a root to enable relative hrefs
link = item.get_single_link("collection")
assert link
assert link.href == "./collection.json"
Moving this to the 1.8 milestone so it doesn't hold up 1.7.
While I see a lot of code that normalizes asset hrefs and make them relative whenever required, it seems I always end up with a local file path for the collection link. I think the collection link should be normalized as well and ideally be relative to the self link.