unity-sds / unity-data-services

Apache License 2.0

[Bug]: Staging in data from Unity results in non-compliant catalog.json #469

Open ngachung opened 1 week ago

ngachung commented 1 week ago

When using the UDS v9.0.0 image, stage-in produces a catalog.json that pystac cannot read when staging data cataloged by DS.

I tried to read the following catalog.json with unitypy:

{
    "type": "Catalog",
    "id": "NA",
    "stac_version": "1.0.0",
    "description": "NA",
    "links": [
        {
            "rel": "root",
            "href": "catalog.json",
            "type": "application/json"
        },
        {
            "rel": "item",
            "href": "urn:nasa:unity:unity:dev:SBG-L2A_RSRFL___1:SISTER_EMIT_L2A_RSRFL_20240103T131936_001_UNC.stac.json",
            "type": "application/json"
        },
        {
            "rel": "item",
            "href": "urn:nasa:unity:unity:dev:SBG-L2A_RSRFL___1:SISTER_EMIT_L2A_RSRFL_20240103T131936_001.stac.json",
            "type": "application/json"
        }
    ]
}

This results in the following stack trace:

Traceback (most recent call last):
  File "/Users/nchung/PycharmProjects/unity-py/unity_sds_client/resources/collection.py", line 196, in from_stac
    for item in items:
  File "/Users/nchung/anaconda3/envs/cwl/lib/python3.11/site-packages/pystac/catalog.py", line 502, in get_all_items
    yield from self.get_items()
  File "/Users/nchung/anaconda3/envs/cwl/lib/python3.11/site-packages/pystac/stac_object.py", line 369, in get_stac_objects
    link.resolve_stac_object(root=self.get_root())
  File "/Users/nchung/anaconda3/envs/cwl/lib/python3.11/site-packages/pystac/link.py", line 322, in resolve_stac_object
    obj = stac_io.read_stac_object(target_href, root=root)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nchung/anaconda3/envs/cwl/lib/python3.11/site-packages/pystac/stac_io.py", line 231, in read_stac_object
    d = self.read_json(source, *args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nchung/anaconda3/envs/cwl/lib/python3.11/site-packages/pystac/stac_io.py", line 202, in read_json
    txt = self.read_text(source, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nchung/anaconda3/envs/cwl/lib/python3.11/site-packages/pystac/stac_io.py", line 279, in read_text
    return self.read_text_from_href(href)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nchung/anaconda3/envs/cwl/lib/python3.11/site-packages/pystac/stac_io.py", line 296, in read_text_from_href
    with urlopen(req) as f:
         ^^^^^^^^^^^^
  File "/Users/nchung/anaconda3/envs/cwl/lib/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nchung/anaconda3/envs/cwl/lib/python3.11/urllib/request.py", line 519, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nchung/anaconda3/envs/cwl/lib/python3.11/urllib/request.py", line 541, in _open
    return self._call_chain(self.handle_open, 'unknown',
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nchung/anaconda3/envs/cwl/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/Users/nchung/anaconda3/envs/cwl/lib/python3.11/urllib/request.py", line 1419, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: urn>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/nchung/PycharmProjects/unity-py/unity_py/resources/test.py", line 8, in <module>
    collection = Collection.from_stac("/Users/nchung/PycharmProjects/unity-data-services/cwl/stage-in-unity/granules/catalog.json")
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nchung/PycharmProjects/unity-py/unity_sds_client/resources/collection.py", line 240, in from_stac
    raise UnityException("An unknown error occured creating collection from stac")
unity_sds_client.unity_exception.UnityException: An unknown error occured creating collection from stac

wphyojpl commented 1 day ago

I think this call is wrong because it uses Collection to load a Catalog:

Collection.from_stac("/Users/nchung/PycharmProjects/unity-data-services/cwl/stage-in-unity/granules/catalog.json")

With Catalog, the file loads, but validation fails with this error: pystac.errors.STACError: Relative path catalog.json encountered without owner "self" link set.

from pystac import Catalog
upload_result = Catalog.from_dict(sample)  # sample: the catalog.json content as a dict
upload_result.validate()

wphyojpl commented 1 day ago

But the cause is still the same.

{
    'type': 'Catalog', 'id': 'NA', 'stac_version': '1.0.0', 'description': 'NA',
    'links': [
        {'rel': 'root', 'href': 'catalog.json', 'type': 'application/json'},
        {'rel': 'item', 'href': 'SNDR.SNPP.ATMS.L1A.nominal2.08.stac.json', 'type': 'application/json'},
        {'rel': 'item', 'href': 'SNDR.SNPP.ATMS.L1A.nominal2.01.stac.json', 'type': 'application/json'},
        {'rel': 'item', 'href': 'SNDR.SNPP.ATMS.L1A.nominal2.06.stac.json', 'type': 'application/json'},
        {'rel': 'item', 'href': 'SNDR.SNPP.ATMS.L1A.nominal2.18.stac.json', 'type': 'application/json'},
        {'rel': 'item', 'href': 'SNDR.SNPP.ATMS.L1A.nominal2.04.stac.json', 'type': 'application/json'}
    ]
}

This catalog, whose item hrefs are plain filenames, works, but the example from the original report, whose item hrefs start with urn:, does not.
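
A minimal sketch of the difference between the two catalogs, using only the standard library. The traceback in the report ends in urllib with "unknown url type: urn", which suggests the urn: href is treated as a URL with scheme "urn", while a plain filename has no scheme at all. The helper name href_scheme is hypothetical and only for illustration; this does not reproduce what pystac does internally.

```python
from urllib.parse import urlparse

# One href from each catalog in this thread.
urn_href = "urn:nasa:unity:unity:dev:SBG-L2A_RSRFL___1:SISTER_EMIT_L2A_RSRFL_20240103T131936_001.stac.json"
plain_href = "SNDR.SNPP.ATMS.L1A.nominal2.08.stac.json"


def href_scheme(href: str) -> str:
    """Return the URL scheme urllib parses from an href ('' when there is none)."""
    return urlparse(href).scheme


print(repr(href_scheme(urn_href)))    # scheme parsed from the urn: href
print(repr(href_scheme(plain_href)))  # scheme parsed from the plain filename
```

An href with a non-empty scheme that urllib has no handler for would end up in unknown_open, as in the traceback above.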

wphyojpl commented 1 day ago

One solution is to percent-encode the href, but that requires the reader to decode it:

import urllib.parse
urllib.parse.quote("urn:nasa:unity:unity:dev:SBG-L2A_RSRFL___1:SISTER_EMIT_L2A_RSRFL_20240103T131936_001.stac.json", safe="")
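
As a rough sketch of what that round trip might look like on both sides (writer encodes, reader decodes), assuming only the standard library. The variable names are illustrative, not taken from UDS or unity-py.

```python
from urllib.parse import quote, unquote, urlparse

href = "urn:nasa:unity:unity:dev:SBG-L2A_RSRFL___1:SISTER_EMIT_L2A_RSRFL_20240103T131936_001.stac.json"

# Writer side: with safe="" every ':' is percent-encoded as '%3A',
# so the href no longer looks like it carries a URL scheme.
encoded = quote(href, safe="")

# Reader side: unquote restores the original URN-style name.
decoded = unquote(encoded)

print(encoded)
print(urlparse(encoded).scheme == "")  # the encoded href has no scheme
print(decoded == href)                 # the round trip is lossless
```

The trade-off is the one noted above: any consumer that needs the original identifier has to know to call unquote on the href.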

ngachung commented 13 hours ago

Alternatively, when we write the catalog.json and *.stac.json files, can we remove the URN prefix and just use, for example, SISTER_EMIT_L2A_RSRFL_20240103T131936_001_UNC.stac.json and SISTER_EMIT_L2A_RSRFL_20240103T131936_001.stac.json?

wphyojpl commented 13 hours ago

I see. Since this is just a filename, it can be anything. That also works.
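
A minimal sketch of the prefix-stripping idea, assuming the part after the last ':' is the desired filename (as in the examples above). The function names strip_urn_prefix and rewrite_item_links are hypothetical and are not part of UDS; the *.stac.json files on disk would need to be written under the same shortened names.

```python
import copy


def strip_urn_prefix(href: str) -> str:
    """Keep only the text after the last ':' of a urn: href; leave other hrefs unchanged."""
    if href.startswith("urn:"):
        return href.rsplit(":", 1)[-1]
    return href


def rewrite_item_links(catalog: dict) -> dict:
    """Return a copy of a catalog dict whose 'item' links use plain filenames."""
    result = copy.deepcopy(catalog)
    for link in result.get("links", []):
        if link.get("rel") == "item":
            link["href"] = strip_urn_prefix(link["href"])
    return result


# Trimmed copy of the catalog from the original report.
catalog = {
    "type": "Catalog",
    "id": "NA",
    "stac_version": "1.0.0",
    "description": "NA",
    "links": [
        {"rel": "root", "href": "catalog.json", "type": "application/json"},
        {
            "rel": "item",
            "href": "urn:nasa:unity:unity:dev:SBG-L2A_RSRFL___1:SISTER_EMIT_L2A_RSRFL_20240103T131936_001_UNC.stac.json",
            "type": "application/json",
        },
    ],
}

fixed = rewrite_item_links(catalog)
print(fixed["links"][1]["href"])
```

The root link is left untouched because only links with rel "item" are rewritten, and the input dict is not modified.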