Open christine-e-smit opened 4 years ago
Yes! We should totally be able to do this. We need to map the stac type to the intake-parquet driver. Here's where that would go:
Are you up for adding this feature?
I think I can handle adding one line to your drivers :)
But I'd think this would also require adding intake-parquet as a dependency somewhere. Your top-level requirements.txt, I assume?
And I'd need to add something to https://github.com/intake/intake-stac/blob/d71b2d2b0ea2f8c89cb0310706c4de6d19406e17/intake_stac/tests/test_catalog.py
I was looking this over during STAC sprint 6; I'm currently updating the types based on STAC Types
application/parquet
@wildintellect - if you are up for it, let's just do one PR where we update all the media types. I think application/parquet makes sense. I can help provide additional mappings to intake drivers as needed.
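For anyone following along, here is a rough sketch of the kind of media-type-to-driver lookup being discussed. It is not intake-stac's actual code: the table name, the function name, the example asset, and the exact set of entries are all hypothetical, and the driver names ("rasterio", "netcdf", "parquet") are assumed to be the ones registered by intake-xarray and intake-parquet.

```python
# Hypothetical sketch only; none of these names come from intake-stac itself.
# Maps a STAC asset "type" (media type) to the name of an intake driver.
MEDIA_TYPE_TO_DRIVER = {
    "image/tiff; application=geotiff": "rasterio",  # assumed intake-xarray driver name
    "application/x-netcdf": "netcdf",               # assumed intake-xarray driver name
    "application/parquet": "parquet",               # the mapping proposed in this thread
}


def driver_for_asset(asset):
    """Return the intake driver name for a STAC asset dict.

    Raises ValueError when the asset has no "type" or the type is unmapped,
    which is roughly the situation described in this issue.
    """
    media_type = asset.get("type")
    if media_type is None:
        raise ValueError("asset has no 'type' field")
    key = media_type.strip().lower()
    try:
        return MEDIA_TYPE_TO_DRIVER[key]
    except KeyError:
        raise ValueError(f"no intake driver mapped for media type {media_type!r}")


# Hypothetical asset; the href is a placeholder, not a real bucket.
asset = {"href": "s3://example-bucket/data.parquet", "type": "application/parquet"}
print(driver_for_asset(asset))
```

With a table like this, supporting parquet would come down to one new entry plus the intake-parquet dependency, which matches the "one line" estimate above. A bare 'parquet' in the "type" field would still be rejected, since only the full media type is mapped.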
As I said in #48, I was recently involved in a group trying to use intake-stac with some data we have sitting in s3. This data is in parquet format. I've used intake-parquet on this data with no problem to get a dask data frame. But when I try with intake-stac, I get the error:
I assume that intake-stac is keying off the "type" field in the item. Parquet doesn't have a MIME type, so I tried 'parquet' without success. I then re-read your README and realized that if intake-stac is built on top of intake-xarray, then you probably can't read in parquet regardless of what I put in the "type" field.
Would it be possible to add parquet via the intake-parquet library?
I'm wondering if parquet is beyond the scope of the STAC catalog spec? I don't see parquet in STAC's list of media types here. But then I don't see zarr either and I'm guessing that you support zarr with intake-stac because it's your favored data type for pangeo.