gadomski closed this 2 months ago
Would the pure-Rust implementation also convert STAC JSON to/from Arrow and Parquet? I'd love to see it, but I'd imagine that the schema resolution will be a lot of work in Rust. Some of the pyarrow APIs make our life a lot easier, and it would be annoying to redo that in Rust. But if you wanted to work on it, I'm of course very supportive.
Tight, alright I'll PoC something over at stac-rs and ping you when it's worth looking at. I've played a bit with https://docs.rs/arrow-json/51.0.0/arrow_json/reader/fn.infer_json_schema_from_iterator.html and I think it will mostly work, with some manual tweaks, but it's still early doors.
That won't work directly on the geometry field any time you have multiple geometry types in a single collection. E.g. if you have Polygons most of the time but MultiPolygons over the antimeridian, schema inference will fail, because the two types nest their coordinates to different depths. We handle that by converting the geometries to WKB before inferring a schema from the JSON.
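To illustrate the point, here is a minimal Python sketch using only the standard library. It is not the code from this repo. The helper names (`geometry_to_wkb`, `nesting_depth`) and the sample geometries are hypothetical, and it assumes 2D Polygon and MultiPolygon inputs only. A real implementation would more likely lean on a geometry library for the WKB step.

```python
import struct

LITTLE_ENDIAN = 1  # WKB byte-order marker
WKB_POLYGON = 3
WKB_MULTIPOLYGON = 6


def nesting_depth(value):
    """Count how many list levels wrap the first number in `value`."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0]
    return depth


def _polygon_wkb(rings):
    """Encode GeoJSON Polygon coordinates (a list of rings) as WKB."""
    out = struct.pack("<BI", LITTLE_ENDIAN, WKB_POLYGON)
    out += struct.pack("<I", len(rings))
    for ring in rings:
        out += struct.pack("<I", len(ring))
        for point in ring:
            # Assumption: 2D only, any Z value is dropped.
            out += struct.pack("<dd", point[0], point[1])
    return out


def geometry_to_wkb(geometry):
    """Encode a GeoJSON Polygon or MultiPolygon dict as WKB bytes."""
    kind = geometry["type"]
    coords = geometry["coordinates"]
    if kind == "Polygon":
        return _polygon_wkb(coords)
    if kind == "MultiPolygon":
        header = struct.pack("<BI", LITTLE_ENDIAN, WKB_MULTIPOLYGON)
        header += struct.pack("<I", len(coords))
        return header + b"".join(_polygon_wkb(rings) for rings in coords)
    raise ValueError(f"unsupported geometry type: {kind}")


# Hypothetical sample geometries: one closed square ring each.
ring = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
polygon = {"type": "Polygon", "coordinates": [ring]}
multipolygon = {"type": "MultiPolygon", "coordinates": [[ring], [ring]]}

# The raw JSON coordinates nest to different depths, which is what
# breaks a single inferred list type for the geometry column.
depths = [nesting_depth(g["coordinates"]) for g in (polygon, multipolygon)]

# After conversion, both geometries are plain byte strings, so the
# column can be typed as binary regardless of geometry type.
encoded = [geometry_to_wkb(g) for g in (polygon, multipolygon)]
```

With the conversion applied up front, schema inference only has to deal with the remaining JSON fields, and the geometry column is a single binary type.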
Gentle, dumb proof-of-concept in this branch: https://github.com/stac-utils/stac-rs/pull/256. The motivating example (for now) is to use STAC-GeoParquet + DuckDB as an API backend.
If you run into any issues, feel free to ask!
I was thinking about using the spec document in this repo, along with cribbing some of your code, to make a pure-Rust implementation over at https://github.com/stac-utils/stac-rs. I was curious if y'all thought it would be more appropriate to build that implementation over here and just depend on stac-rs instead?
Presumably there could be Python bindings into the Rust impl as well, eventually... also guessing @kylebarron has thought about this already :-).