microsoft / PlanetaryComputer

Issues, discussions, and information about the Microsoft Planetary Computer
https://planetarycomputer.microsoft.com/
MIT License

Broken tutorial: Radiant MLHub Land Cover #147

Open m-cappi opened 1 year ago

m-cappi commented 1 year ago

Hi!

I wanted to share an issue I found with the chosen dataset while working through the Radiant MLHub Land Cover tutorial.

In cell 5 I get an APIError: {"detail":"Collection ref_landcovernet_v1_labels does not exist."} when querying for "ref_landcovernet_v1_labels". After inspecting the Radiant MLHub datasets, I found that a general-purpose LandCoverNet dataset is no longer available.

There's an easy fix: replace the collection_id with ref_landcovernet_??_v1_labels, filling in ?? with a region code: sa for South America, af for Africa, as for Asia, eu for Europe, or au for Australia.
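A minimal sketch of that workaround, assuming the regional naming pattern ref_landcovernet_<region>_v1_labels described above. The helper name landcovernet_labels_collection_id and the REGIONS table are hypothetical, not part of the tutorial, and only the region codes listed above are included.

```python
# Region codes listed in this issue for the regional LandCoverNet collections.
REGIONS = {
    "af": "Africa",
    "as": "Asia",
    "au": "Australia",
    "eu": "Europe",
    "sa": "South America",
}


def landcovernet_labels_collection_id(region: str) -> str:
    """Build a regional LandCoverNet labels collection ID (hypothetical helper)."""
    if region not in REGIONS:
        raise ValueError(
            f"Unknown region code {region!r}; expected one of {sorted(REGIONS)}"
        )
    return f"ref_landcovernet_{region}_v1_labels"


collection_id = landcovernet_labels_collection_id("sa")
print(collection_id)  # ref_landcovernet_sa_v1_labels

# In the tutorial, this ID would replace the old one in cell 5, e.g.:
# collection = client.get_collection(collection_id)
```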

---------------------------------------------------------------------------
APIError                                  Traceback (most recent call last)
Cell In [5], line 4
      1 collection_id = "ref_landcovernet_v1_labels"
      2 #collection_id = "ref_landcovernet_sa_v1_labels"
----> 4 collection = client.get_collection(collection_id)
      5 collection_sci_ext = ScientificExtension.ext(collection)
      6 print(f"Description: {collection.description}")

File /srv/conda/envs/notebook/lib/python3.10/site-packages/pystac_client/client.py:232, in Client.get_collection(self, collection_id)
    229 if self._supports_collections() and self._stac_io:
    230     url = f"{self.get_self_href()}/collections/{collection_id}"
    231     collection = CollectionClient.from_dict(
--> 232         self._stac_io.read_json(url),
    233         root=self,
    234         modifier=self.modifier,
    235     )
    236     call_modifier(self.modifier, collection)
    237     return collection

File /srv/conda/envs/notebook/lib/python3.10/site-packages/pystac/stac_io.py:198, in StacIO.read_json(self, source, *args, **kwargs)
    181 def read_json(self, source: HREF, *args: Any, **kwargs: Any) -> Dict[str, Any]:
    182     """Read a dict from the given source.
    183
    184     See :func:`StacIO.read_text <pystac.StacIO.read_text>` for usage of
   (...)
    196         given source.
    197     """
--> 198     txt = self.read_text(source, *args, **kwargs)
    199     return self.json_loads(txt)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/pystac_client/stac_api_io.py:97, in StacApiIO.read_text(self, source, *args, **kwargs)
     95 href = str(source)
     96 if bool(urlparse(href).scheme):
---> 97     return self.request(href, *args, **kwargs)
     98 else:
     99     with open(href) as f:

File /srv/conda/envs/notebook/lib/python3.10/site-packages/pystac_client/stac_api_io.py:144, in StacApiIO.request(self, href, method, headers, parameters)
    142     raise APIError(str(err))
    143 if resp.status_code != 200:
--> 144     raise APIError.from_response(resp)
    145 try:
    146     return resp.content.decode("utf-8")

APIError: {"detail":"Collection ref_landcovernet_v1_labels does not exist."}

However, applying this workaround raises a new issue in cell 6: LabelExtension.ext() checks that the item's stac_extensions list contains the Label Extension schema URI that pystac supports, and items from the newly proposed collection_id don't list that URI, so the check fails:

---------------------------------------------------------------------------
ExtensionNotImplemented                   Traceback (most recent call last)
Cell In [18], line 5
      3 first_item = next(item_search.get_items())
      4 print(first_item)
----> 5 first_item_label_ext = LabelExtension.ext(first_item)
      7 label_classes = first_item_label_ext.label_classes
      8 for label_class in label_classes:

File /srv/conda/envs/notebook/lib/python3.10/site-packages/pystac/extensions/label.py:701, in LabelExtension.ext(cls, obj, add_if_missing)
    695 """Extends the given STAC Object with properties from the :stac-ext:`Label
    696 Extension <label>`.
    697
    698 This extension can be applied to instances of :class:`~pystac.Item`.
    699 """
    700 if isinstance(obj, pystac.Item):
--> 701     cls.validate_has_extension(obj, add_if_missing)
    702     return cls(obj)
    703 else:

File /srv/conda/envs/notebook/lib/python3.10/site-packages/pystac/extensions/base.py:176, in ExtensionManagementMixin.validate_has_extension(cls, obj, add_if_missing)
    173     cls.add_to(obj)
    175 if cls.get_schema_uri() not in obj.stac_extensions:
--> 176     raise pystac.ExtensionNotImplemented(
    177         f"Could not find extension schema URI {cls.get_schema_uri()} in object."
    178     )

ExtensionNotImplemented: Could not find extension schema URI https://stac-extensions.github.io/label/v1.0.1/schema.json in object.
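To see why the check fails, one can compare the extension entries an item actually declares with the URI pystac expects (the one in the error above). A minimal sketch operating on an item's raw JSON (for example the dict from item.to_dict()); the helper names are hypothetical, and the sample item is made up, since this thread does not show which label schema URI, if any, the regional items declare.

```python
# Schema URI that pystac's LabelExtension looks for, copied from the error above.
EXPECTED_LABEL_URI = "https://stac-extensions.github.io/label/v1.0.1/schema.json"


def declared_label_uris(item: dict) -> list:
    """Return the stac_extensions entries that refer to the Label Extension."""
    return [
        uri
        for uri in item.get("stac_extensions", [])
        # Older STAC versions allowed short names such as "label";
        # newer ones use versioned schema URLs.
        if uri == "label" or "/label/" in uri
    ]


def explain_label_check(item: dict) -> str:
    """Describe why pystac's Label Extension check would pass or fail."""
    declared = declared_label_uris(item)
    if EXPECTED_LABEL_URI in declared:
        return "ok"
    if declared:
        return f"declares {declared}, but pystac expects {EXPECTED_LABEL_URI}"
    return "no Label Extension entry in stac_extensions"


# Hypothetical item that declares an older label schema version.
sample_item = {
    "id": "example-label-item",
    "stac_extensions": ["https://stac-extensions.github.io/label/v1.0.0/schema.json"],
}
print(explain_label_check(sample_item))
```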

TomAugspurger commented 1 year ago

I think this is the same as https://github.com/microsoft/PlanetaryComputerExamples/issues/182? I got stuck at https://github.com/microsoft/PlanetaryComputerExamples/issues/182#issuecomment-1180626816 the last time I looked into this.

If you're able to figure that out, it'd be great! We might be blocked by an upstream issue in Radiant Earth's STAC API.

m-cappi commented 1 year ago

Hi Tom!

First off, regarding the ExtensionNotImplemented error raised by LabelExtension, the problem is two-fold:

Given that both these collections and the pystac.extensions package are outside our control, the easiest way I've found to suppress this ExtensionNotImplemented error is to pass the add_if_missing=True flag to *Extension.ext() (for example, LabelExtension.ext(first_item, add_if_missing=True)).
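The traceback above shows why this flag makes the error go away: validate_has_extension calls add_to(obj) when add_if_missing is set, which registers the schema URI on the item, so the membership check that raised ExtensionNotImplemented can no longer fail. Below is a simplified stand-in for that logic on a raw item dict; the function and exception class are hypothetical mirrors, not pystac's own code. Note that it only registers the URI and does not make the item's label properties conform to the v1.0.1 schema.

```python
LABEL_URI = "https://stac-extensions.github.io/label/v1.0.1/schema.json"


class ExtensionNotImplemented(Exception):
    """Stand-in for pystac.ExtensionNotImplemented."""


def validate_has_label_extension(item: dict, add_if_missing: bool = False) -> None:
    """Simplified mirror of pystac's ExtensionManagementMixin.validate_has_extension.

    Like pystac, it mutates the item in place when add_if_missing is True.
    """
    extensions = item.setdefault("stac_extensions", [])
    if add_if_missing and LABEL_URI not in extensions:
        # Roughly what add_to() does: register the extension's schema URI.
        extensions.append(LABEL_URI)
    if LABEL_URI not in extensions:
        raise ExtensionNotImplemented(
            f"Could not find extension schema URI {LABEL_URI} in object."
        )


item = {"id": "example", "stac_extensions": [], "properties": {}}
validate_has_label_extension(item, add_if_missing=True)  # no exception
print(item["stac_extensions"])
```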

This has an undesirable side effect: the possible land cover labels are now printed as integers, so one has to consult the LandCoverNet dataset documentation to find the class definitions.

Classes for None
- 0
- 1
- 2
- 3
- 4
- 5
- 6
- 7



Regarding https://github.com/microsoft/PlanetaryComputerExamples/issues/182#issuecomment-1180626816, the problem with your code snippet is that it sends a HEAD request and receives a 405 Method Not Allowed status code. If you use GET instead, the request currently succeeds on my end.
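A minimal sketch of that fix using only the standard library: probe a URL with HEAD and, if the server answers 405, retry with GET. The URL from the linked snippet is not reproduced here; instead a local toy server (hypothetical, like the helper name request_status) rejects HEAD with 405 the same way, so the example runs offline.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def request_status(url, opener=None, timeout=10.0):
    """Return (method, status): try HEAD first, fall back to GET on a 405."""
    opener = opener or urllib.request.build_opener()
    try:
        req = urllib.request.Request(url, method="HEAD")
        with opener.open(req, timeout=timeout) as resp:
            return "HEAD", resp.status
    except urllib.error.HTTPError as err:
        err.close()
        if err.code != 405:
            raise
    # HEAD is not allowed on this endpoint: retry the same URL with GET.
    req = urllib.request.Request(url, method="GET")
    with opener.open(req, timeout=timeout) as resp:
        return "GET", resp.status


class HeadRejectingHandler(BaseHTTPRequestHandler):
    """Toy endpoint that, like the one in the thread, rejects HEAD."""

    def do_HEAD(self):
        self.send_response(405)  # Method Not Allowed
        self.send_header("Allow", "GET")
        self.end_headers()

    def do_GET(self):
        body = b'{"ok": true}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


server = HTTPServer(("127.0.0.1", 0), HeadRejectingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
# Bypass any configured proxies for the local demo server.
direct = urllib.request.build_opener(urllib.request.ProxyHandler({}))
result = request_status(f"http://127.0.0.1:{server.server_address[1]}/", opener=direct)
server.shutdown()
server.server_close()
print(result)
```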


With all these changes applied, I've been able to run the notebook successfully for the ref_landcovernet_af_v1_labels collection (LandCoverNet Africa). If you'd like, I can submit a PR with these changes by tomorrow.

Regarding the South America and North America LandCoverNet collections, the notebook fails in cell 11 because the items returned in cell 10 don't include the expected bands. I'd have to look into this further to find out why.

Item ID: ref_landcovernet_sa_v1_source_sentinel_1_24MYT_29_20180106
Assets:
- Asset Key: VH
- Asset Key: VV
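One way to narrow this down is to check each returned item's asset keys against the bands the later cells need. A minimal sketch on raw item dicts; the expected band keys (B02, B03, B04) and both helper names are hypothetical placeholders, since this thread doesn't show which bands cell 11 uses. The item ID above contains sentinel_1, and Sentinel-1 is a radar mission whose assets are polarizations (VH, VV) rather than optical bands, which may explain the missing bands.

```python
# HYPOTHETICAL: the band keys the later notebook cells expect; adjust to the tutorial.
EXPECTED_BANDS = frozenset({"B02", "B03", "B04"})


def missing_bands(item: dict, expected=EXPECTED_BANDS) -> list:
    """Return the expected asset keys that the item does not provide."""
    return sorted(set(expected) - set(item.get("assets", {})))


def is_sentinel_2_source(item: dict) -> bool:
    """ASSUMPTION: source item IDs follow the pattern seen in the ID above."""
    return "_source_sentinel_2_" in item["id"]


# Mirrors the item printed above: a Sentinel-1 source item with only VH and VV.
item = {
    "id": "ref_landcovernet_sa_v1_source_sentinel_1_24MYT_29_20180106",
    "assets": {"VH": {}, "VV": {}},
}
print(missing_bands(item))         # ['B02', 'B03', 'B04']
print(is_sentinel_2_source(item))  # False
```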

As for the Asia, Australia, and Europe LandCoverNet collections, I get a 404 when requesting various images in cell 10.