Open fmigneault opened 5 hours ago
@fmigneault commented on 2024-07-26:
I agree regarding the fields that allow an explicit `None`/`null`, such as `resize_type` and `mlm:pretrained_source`. Their `null` value can carry a meaning, which is more explicit than omitting the field.
For the other properties, I think it is preferable to keep the `None` auto-removal behavior. The reasoning is that `pystac` uses the strategy of setting properties to `None` when doing a "delete". It is also common practice in Python to write `obj.attribute = None` to "unset" a value. Also, leaving `None` explicitly set in the properties can cause some frustration for users, since the generated JSON would not auto-remove the `null` values, and this will result in schema validation errors for most of the fields that expect another type.
We will need to verify properly how removing `OmitIfNone` for cases like `resize_type` behaves with the `pystac` integration. Also, I recently learned from a user that POST'ing a STAC Item with `mlm:pretrained_source: null` resulted in the STAC API completely dropping the field in the backend. Therefore, the "full fix" might not lie only at the level of the MLM extension.
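To make the distinction above concrete, here is a minimal sketch of the proposed behavior using plain dictionaries: `None` values are dropped by default, except for fields where an explicit `null` is meaningful. The helper `drop_none` and the `NULLABLE_FIELDS` set are hypothetical names for illustration only; they are not part of `pystac` or the MLM extension.

```python
import json

# Hypothetical allow-list of fields whose explicit null should be preserved.
NULLABLE_FIELDS = {"mlm:pretrained_source"}


def drop_none(properties: dict, keep: set = NULLABLE_FIELDS) -> dict:
    """Return a copy without None values, except for keys listed in `keep`."""
    return {k: v for k, v in properties.items() if v is not None or k in keep}


props = {
    "mlm:name": "example-model",
    "mlm:pretrained_source": None,  # kept: explicit null is meaningful here
    "mlm:framework": None,          # dropped: None used as "unset"
}

cleaned = drop_none(props)
print(json.dumps(cleaned))
```

With this approach, only the allow-listed field is emitted as `null` in the JSON, so fields that expect another type never reach schema validation with a `null` value.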
@rbavery commented on 2024-07-29:
From talking with folks at the latest community call, it sounds like dropping null values is an implementation issue in the backend for STAC API or pystac.
Also, they shared some docs on Parquet and how it can't roundtrip null values with meaning. https://stac-utils.github.io/stac-geoparquet/latest/drawbacks/
I think this could be an issue. Folks probably wouldn't use Parquet to describe MLM metadata today, but how to do so with stac-geoparquet for large collections is becoming more fleshed out. So I could see tools that interact with STAC metadata in either JSON or GeoParquet continuing to expect that null has no meaning.
Maybe we should have an explicit definition of None/null with meaning for these fields. A "None" string might be confusing; could "nonexistent" work?
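As a rough illustration of the round-trip concern, the sketch below models a columnar table with plain Python lists. This is a simplified stand-in and does not use stac-geoparquet or any Parquet library; it only assumes that a columnar layout has one shared schema, so every row gets a value for every column.

```python
# Two items: one sets the field to an explicit null, the other omits it.
items = [
    {"id": "a", "mlm:pretrained_source": None},
    {"id": "b"},
]

# Rows -> columns: a missing key has to be filled with something (here None).
columns = sorted({key for item in items for key in item})
table = {col: [item.get(col) for item in items] for col in columns}

# Columns -> rows: both items now carry the same explicit null.
rows = [{col: table[col][i] for col in columns} for i in range(len(items))]

for row in rows:
    print(row)
```

After the round trip, the item that omitted the field is indistinguishable from the one that set it to `null`, which is why a `null` with its own meaning would not survive in this model.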
@fmigneault commented on 2024-08-20:
I think the parquet interpretation is acceptable. In our case, an undefined `mlm:pretrained_source` and an explicitly defined `mlm:pretrained_source = null` are intended to mean the same thing. The `null` definition only makes it visually explicit when reading the JSON, which is useful for users that might not know that the `mlm:pretrained_source` property exists, and would otherwise have to "somehow guess" that detail.
I strongly believe this is only an issue in the STAC API backend, when FastAPI does its conversion with `pydantic`. They would essentially need to do something similar to the `OmitIfNone`/`Annotated` workaround that was used for the same reason in STAC MLM.
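For readers unfamiliar with the idea, here is a minimal standard-library sketch of an `Annotated` marker that controls whether a `None` value is serialized. It is an analogue only: it uses `dataclasses` instead of `pydantic`, and the `ModelInput` class, the `serialize` helper, and the choice of which field carries the marker are assumptions for illustration, not the actual STAC MLM code.

```python
from dataclasses import dataclass, fields
from typing import Annotated, Any, Optional, get_type_hints


class _OmitIfNone:
    """Marker stored in Annotated[...] metadata (illustrative stand-in)."""


OmitIfNone = _OmitIfNone()


@dataclass
class ModelInput:
    name: str
    # Marked: dropped from the output when the value is None.
    norm_type: Annotated[Optional[str], OmitIfNone] = None
    # Unmarked: kept as an explicit null when the value is None.
    resize_type: Optional[str] = None


def serialize(obj: Any) -> dict:
    """Convert a dataclass instance to a dict, honoring the OmitIfNone marker."""
    hints = get_type_hints(type(obj), include_extras=True)
    result = {}
    for field in fields(obj):
        value = getattr(obj, field.name)
        # Annotated types expose their extra arguments via __metadata__.
        metadata = getattr(hints[field.name], "__metadata__", ())
        omit = any(isinstance(m, _OmitIfNone) for m in metadata)
        if value is None and omit:
            continue
        result[field.name] = value
    return result


print(serialize(ModelInput(name="rgb")))
```

The marker keeps the decision next to each field declaration, so a backend could apply the same rule uniformly during serialization instead of special-casing field names.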
@rbavery cloned issue crim-ca/mlm-extension#27 on 2024-06-24: