xaviernogueira / geoparquet-pydantic

Validate and convert GeoJSON<>GeoParquet...without GDAL!
MIT License

Add feature to support (or contribute to) STAC GeoParquet #8

Open xaviernogueira opened 2 weeks ago

xaviernogueira commented 2 weeks ago

See the write-up here: https://cloudnativegeo.org/blog/2024/08/introduction-to-stac-geoparquet
Spec here: https://github.com/stac-utils/stac-geoparquet

Inspired by the broader, but GDAL-dependent, work above, I would like to add the [stac-pydantic](https://github.com/stac-utils/stac-pydantic) models as a valid input, plus a STAC-specific subclass of GeoParquetMetadata (if necessary).

In addition (since there is value in a non-GDAL-dependent workflow), I was wondering what the maintainers of stac-geoparquet think about allowing stac-pydantic models to be passed into functions (as opposed to just dict objects currently). Thoughts, @kylebarron, @cholmes, @TomAugspurger?
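
A minimal sketch of what that bridge could look like today, just dumping the models to dicts first (the `parse_stac_items_to_arrow` call is my assumption about stac-geoparquet's current arrow API; the point is only that a pydantic entry point would be thin):

```python
# Minimal sketch, not library code: feed stac-pydantic models into the
# existing dict-based workflow by serializing them first.
from stac_pydantic import Item
import stac_geoparquet.arrow  # assumed module layout from recent releases


def items_to_arrow(validated_items: list[Item]):
    # pydantic v2 models serialize to JSON-compatible dicts in one call
    item_dicts = [item.model_dump(mode="json") for item in validated_items]
    # assumed function name; the current API accepts dict-like STAC Items
    return stac_geoparquet.arrow.parse_stac_items_to_arrow(item_dicts)
```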

In my view, the two projects can coexist, as the whole point of this one is to be a super lightweight, pydantic-focused tool. Broadly speaking, my vision is to "live in a validated world" where one never needs to leave validated pydantic models.

kylebarron commented 2 weeks ago

> Inspired by the broader, but GDAL dependent, work above

GDAL is not a necessary dependency of the above stac-geoparquet implementation. GeoPandas is included as a dependency because it's the most common library people have historically used for this, but we don't use any GDAL-related functionality, and it could easily be removed.

> allowing stac-pydantic models to be passed into functions

The main question is: with your pydantic models, do you know a static schema for the entire collection? I doubt you do, because I assume every Item is validated independently by pydantic. Assuming this, there is not really any benefit to integrating with pydantic, because we have to do our own columnar schema inference anyway.
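
A small illustration of this point (made-up items, plain pyarrow): two Items that each validate fine on their own can still carry different property keys, so a collection-level schema only emerges after scanning all of them:

```python
import pyarrow as pa

# Two hypothetical Items; each would pass per-item validation independently,
# but they do not share a single static set of property fields.
item_a = {"id": "a", "properties": {"datetime": "2024-01-01T00:00:00Z", "eo:cloud_cover": 5.0}}
item_b = {"id": "b", "properties": {"datetime": "2024-01-02T00:00:00Z", "view:off_nadir": 1.2}}

# Arrow has to look at every record to infer the union of property fields,
# which is exactly the columnar schema inference step pydantic can't replace.
table = pa.Table.from_pylist([item_a, item_b])
print(table.schema)
```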

xaviernogueira commented 2 weeks ago

@kylebarron good points.

On my side, I'll probably just quickly support the lightweight conversion because it's in the scope of this project, and I'll leave it at that. I stand by my pydantic + pyarrow only vision, even if it's just me being weird about things lol.

My only counterpoint on the stac-pydantic topic is that since you already support a `Union[pystac.Item, dict[str, Any]]`, it would just be another `elif` clause here for the arrow conversion (as an example; see the sketch below).
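
Roughly what that extra branch could look like (hypothetical helper; the real dispatch lives wherever stac-geoparquet currently handles the pystac/dict union):

```python
from typing import Any, Union

import pystac
import stac_pydantic


def item_to_dict(item: Union[pystac.Item, stac_pydantic.Item, dict[str, Any]]) -> dict[str, Any]:
    """Normalize the supported input types to a plain STAC Item dict."""
    if isinstance(item, dict):
        return item
    elif isinstance(item, pystac.Item):
        return item.to_dict()
    elif isinstance(item, stac_pydantic.Item):
        # the proposed addition: pydantic models already know how to
        # serialize themselves to JSON-compatible dicts
        return item.model_dump(mode="json")
    raise TypeError(f"Unsupported STAC Item type: {type(item)}")
```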

That said, going a step further, one could say you are already treating the dicts like objects anyway, with expected keys (where a missing key would throw a KeyError), and a variety of derived properties like self_href that are intrinsically linked with the schema and could instead be attached to the model class.
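
As a sketch of what that could mean (hypothetical trimmed-down model, not stac-pydantic's actual class), a derived value like `self_href` can live on the model instead of being recomputed from raw dicts:

```python
from typing import Any, Optional

from pydantic import BaseModel


class StacItemLike(BaseModel):
    """Hypothetical trimmed-down Item model."""

    id: str
    links: list[dict[str, Any]] = []

    @property
    def self_href(self) -> Optional[str]:
        # derived from the schema itself: pull the 'self' link, if any
        for link in self.links:
            if link.get("rel") == "self":
                return link.get("href")
        return None
```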

I see many opportunities to consolidate a lot of the logic scattered around here into STAC pydantic models...but that's just food for thought from my bias. I may throw up a PR more as a conversation starter.