stac-extensions / mlm

STAC Machine Learning Model (MLM) Extension to describe ML models, their training details, and inference runtime requirements.
https://stac-extensions.github.io/mlm/
Apache License 2.0
1 stars 0 forks source link

Don't OmitIfNone MLM fields by default #27

Open fmigneault opened 5 hours ago

fmigneault commented 5 hours ago

@rbavery cloned issue crim-ca/mlm-extension#27 on 2024-06-24:

:rocket: Feature Request

OmitIfNone is used throughout stac_model. If we set values to None when creating the metadata with python as we show in the examples, the fields with None (for example, resize_type) won't be included in the JSON.

I think we should change this so by default (or maybe the only way) all fields are not omitted. I think this would involve removing OmitIfNone in Annotated for all fields.

:sound: Motivation

In the README we indicate it is best practice to specify when fields are None in the JSON. This change would keep us consistent with what I think is a helpful recommendation.

Fields that accept the null value can be considered null when omitted entirely for parsing purposes. However, setting null explicitly when this information is known by the model provider can help users understand what is the expected behavior of the model. It is therefore recommended to provide null explicitly when applicable.

:satellite: Alternatives

Keep the behavior as is.

:paperclip: Additional context

For example I have models that don't need input resizing so resize_type=None. Specifying None makes it so that these values don't show up in JSON even when using ml_model_meta.model_dump(exclude_unset=False)

fmigneault commented 5 hours ago

@fmigneault commented on 2024-07-26:

I agree regarding the fields that allow explicit None/null, such as resize_type and the mlm:pretrained_source. Their null value can have a meaning, which is more explicit than omitting the field.

For the others properties, I think it is preferable to keep the behavior of None auto-removal. The reasoning is that pystac uses the strategy of setting properties to None when doing a "delete". It is also a common practice in Python to obj.attribute = None to "unset". Also, letting None explicitly set in the properties can cause some frustration for users, since the generated JSON would not auto-remove the null values, and this will result in schema validation errors for most of the fields that expect another type.

We will need to verify properly how removing OmitIfNone for cases like resize_type behaves when involving the pystac integration. Also, I have taken knowledge recently by a user that POST'ing the STAC Item with mlm:pretrained_source: null resulted in STAC-API completely dropping the field in the backend. Therefore, the "full fix" might not be only at the level of MLM extension.

fmigneault commented 5 hours ago

@rbavery commented on 2024-07-29:

From talking with folks at the latest community call it sounds like dropping null values is an implementation issue in the backend for STAC API or pystac.

Also they shared some docs on Parquet and how it can't roundtrip null values with meaning. https://stac-utils.github.io/stac-geoparquet/latest/drawbacks/

I think this could be an issue. Probably folks wouldn't use parquet to describe MLM metadata but it's becoming more fleshed out how to do so with stac-geoparquet for large collections. So I could see tools for interacting with STAC metadata in either json or geoparquet continuing to expect null has no meaning.

Maybe we should have an explicit definition of None/null with meaning for these fields. "None" string might be confusing, could "nonexistent" work?

fmigneault commented 5 hours ago

@fmigneault commented on 2024-08-20:

I think the parquet interpretation is acceptable. In our cases, an undefined mlm:pretrained_source or explicitly defined mlm:pretrained_source = null are intended to mean the same. The null definition only makes it very visually explicit when reading the JSON, which is useful for users that might not know that mlm:pretrained_source property exist, and would otherwise have to "somehow guess" that detail.

I strongly believe this is only an issue in the STAC API backend when FastAPI does its conversion with pydantic. They would essentially need to do something similar to the OmitIfNone/Annotated workaround that was used for the same reason in STAC MLM.