gadomski closed this 1 year ago.
Per the table extension and MS preference, the types listed in table:columns should be parquet types, not pandas types.
Ah, thanks. Do you know of a function to map from one to the other?
How widely accepted is the "cloud-optimized" role?
There's been some discussion in the past (I think on Gitter); my sense is that it's pretty unclear what "cloud-optimized" even means. I kept it because I didn't have a compelling reason to remove it and it's not causing any harm, but maybe it's confusing?
Do you know of a function to map from one to the other?
I don't think there is an exact match. The pandas "object" type is ambiguous: it could be a string, a categorical, or maybe other things. If all your pandas "object" columns are strings, though, you could set up a mapping. Alternatively, read the Parquet file back in and extract the types from the Parquet schema at that point. I ended up using Tom's stac-table repo, which pulls types from the Parquet schema, to generate the initial STAC Item for NCN.
but maybe it's confusing?
I don't find it confusing. I'm just not clear on how roles are used in the wild, so I default to fewer rather than more. If it turns out to be undesirable, it can be removed after Item creation. Just wondering what your thoughts were on it.
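Removing the role after Item creation is a small post-processing step. A minimal sketch operating on the Item's JSON as a plain dict, assuming the role sits in each asset's `roles` list; the function name `drop_role` and the sample Item fragment are hypothetical.

```python
def drop_role(item, role):
    """Remove `role` from every asset's `roles` list in a STAC Item dict."""
    for asset in item.get("assets", {}).values():
        roles = asset.get("roles")
        if roles and role in roles:
            asset["roles"] = [r for r in roles if r != role]
    return item


# Hypothetical Item fragment with only the fields this example needs.
item = {
    "type": "Feature",
    "assets": {
        "data": {
            "href": "example.parquet",
            "roles": ["data", "cloud-optimized"],
        }
    },
}

drop_role(item, "cloud-optimized")
print(item["assets"]["data"]["roles"])  # ['data']
```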
a) my understanding is correct
Yup, I think so.
b) should we be adding proj:bbox going forward?
I don't think it's necessary, since (as you said) you can always derive it. I kept it just because it was there before and it's not doing much harm.
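As an illustration of deriving it instead of storing it, here is a sketch that assumes the asset already carries `proj:geometry` as a GeoJSON Polygon in the asset's CRS (the thread doesn't say which source the bbox would be derived from). The function name and the coordinates are hypothetical.

```python
def bbox_from_polygon(geometry):
    """Return [minx, miny, maxx, maxy] for a GeoJSON Polygon dict."""
    # Polygon coordinates are a list of rings; each ring is a list of positions.
    positions = [pos for ring in geometry["coordinates"] for pos in ring]
    xs = [pos[0] for pos in positions]
    ys = [pos[1] for pos in positions]
    return [min(xs), min(ys), max(xs), max(ys)]


# Hypothetical proj:geometry in a projected CRS (units are metres).
proj_geometry = {
    "type": "Polygon",
    "coordinates": [[
        [500000, 4000000],
        [510000, 4000000],
        [510000, 4010000],
        [500000, 4010000],
        [500000, 4000000],
    ]],
}

print(bbox_from_polygon(proj_geometry))  # [500000, 4000000, 510000, 4010000]
```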
Related Issue(s):
Description: A major refactor that simplifies GeoParquet creation by using geopandas. The output Item and Collection structure doesn't change much.
It also computes the Item's geometry by unioning all of the shapefiles in the zipfiles.
I wouldn't be surprised if there are some edge cases I've missed, but I'm going to rely on real-world exercising in the Planetary Computer test environment to shake those out. I'll feed any fixes back in future PRs.
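The geometry step can be sketched with shapely's `unary_union`. This assumes the shapes have already been read (for example with `geopandas.read_file` on each zipfile) and reprojected to EPSG:4326, as STAC Item geometries require; the helper name and the sample boxes are hypothetical.

```python
from shapely.geometry import box, mapping
from shapely.ops import unary_union


def item_geometry_and_bbox(geometries):
    """Union shapely geometries into a GeoJSON-like dict and a bbox list."""
    merged = unary_union(list(geometries))
    # bounds is (minx, miny, maxx, maxy), the same order STAC uses for bbox.
    return mapping(merged), list(merged.bounds)


# Two overlapping squares standing in for shapes from different shapefiles.
shapes = [box(0, 0, 2, 2), box(1, 1, 3, 3)]
geometry, bbox = item_geometry_and_bbox(shapes)
print(geometry["type"], bbox)  # Polygon [0.0, 0.0, 3.0, 3.0]
```

If the shapefiles don't touch, the union is a MultiPolygon rather than a Polygon, which is still a valid Item geometry.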
PR checklist:
scripts/format
scripts/lint
scripts/test