stac-utils / stac-geoparquet

Convert STAC items between JSON, GeoParquet, pgstac, and Delta Lake.
https://stac-utils.github.io/stac-geoparquet/
MIT License

Convert all ndarrays to lists in `to_item_collection` #3

Closed: TomAugspurger closed this issue 5 months ago

TomAugspurger commented 1 year ago

This currently raises a ValueError:

import planetary_computer
import adlfs
import pystac

collection = pystac.read_file("https://planetarycomputer.microsoft.com/api/stac/v1/collections/aster-l1t")
asset = planetary_computer.sign(collection.assets["geoparquet-items"])

import dask_geopandas

ddf = dask_geopandas.read_parquet(asset.href, storage_options=asset.extra_fields["table:storage_options"])
df = ddf.head()

def fix(x):
    assets = {k: v for k, v in x.items() if v}
    return assets

df["assets"] = df.assets.apply(fix)

import stac_geoparquet
stac_geoparquet.stac_geoparquet.to_item_collection(df)

with the following traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [106], line 20
     17 df["assets"] = df.assets.apply(fix)
     19 import stac_geoparquet
---> 20 stac_geoparquet.stac_geoparquet.to_item_collection(df)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/stac_geoparquet/stac_geoparquet.py:119, in to_item_collection(df)
    114 for k in datelike:
    115     df2[k] = (
    116         df2[k].dt.strftime("%Y-%m-%dT%H:%M:%S.%fZ").fillna("").replace({"": None})
    117     )
--> 119 return pystac.ItemCollection(
    120     [to_dict(record) for record in df2.to_dict(orient="records")]
    121 )

File /srv/conda/envs/notebook/lib/python3.10/site-packages/pystac/item_collection.py:95, in ItemCollection.__init__(self, items, extra_fields, clone_items)
     92     else:
     93         return pystac.Item.from_dict(item_or_dict, preserve_dict=clone_items)
---> 95 self.items = list(map(map_item, items))
     96 self.extra_fields = extra_fields or {}

File /srv/conda/envs/notebook/lib/python3.10/site-packages/pystac/item_collection.py:93, in ItemCollection.__init__.<locals>.map_item(item_or_dict)
     91     return item_or_dict.clone() if clone_items else item_or_dict
     92 else:
---> 93     return pystac.Item.from_dict(item_or_dict, preserve_dict=clone_items)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/pystac/item.py:419, in Item.from_dict(cls, d, href, root, migrate, preserve_dict)
    416 d.pop("type")
    417 d.pop("stac_version")
--> 419 item = cls(
    420     id=id,
    421     geometry=geometry,
    422     bbox=bbox,
    423     datetime=datetime,
    424     properties=properties,
    425     stac_extensions=stac_extensions,
    426     collection=collection_id,
    427     extra_fields=d,
    428     assets={k: Asset.from_dict(v) for k, v in assets.items()},
    429 )
    431 has_self_link = False
    432 for link in links:

File /srv/conda/envs/notebook/lib/python3.10/site-packages/pystac/item.py:113, in Item.__init__(self, id, geometry, bbox, datetime, properties, stac_extensions, href, collection, extra_fields, assets)
    100 def __init__(
    101     self,
    102     id: str,
   (...)
    111     assets: Optional[Dict[str, Asset]] = None,
    112 ):
--> 113     super().__init__(stac_extensions or [])
    115     self.id = id
    116     self.geometry = geometry

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

We should verify that all list-like objects (including those nested within dicts) are lists and not ndarrays.
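
A minimal sketch of that check, assuming only NumPy. The helper name `ndarrays_to_lists` and the sample record are hypothetical and not part of stac-geoparquet. The first few lines reproduce the failure mode from the traceback, where `stac_extensions or []` asks for the truth value of a multi-element array; the helper then walks dicts, lists, and tuples and replaces every ndarray with a plain list.

```python
import numpy as np

# Reproduce the ambiguity: `x or []` calls bool(x), which raises for arrays
# with more than one element.
extensions = np.array(["https://example.com/ext-a", "https://example.com/ext-b"])
try:
    extensions or []
    raised = False
except ValueError:
    raised = True


def ndarrays_to_lists(obj):
    """Recursively replace numpy arrays with plain Python lists.

    Hypothetical helper for illustration; not part of stac-geoparquet.
    Tuples are also returned as lists, which matches JSON semantics.
    """
    if isinstance(obj, np.ndarray):
        # tolist() never returns an ndarray itself, but object arrays can
        # still contain dicts or arrays, so recurse into the result.
        return ndarrays_to_lists(obj.tolist())
    if isinstance(obj, dict):
        return {k: ndarrays_to_lists(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [ndarrays_to_lists(v) for v in obj]
    return obj


# Made-up record shaped like one row of `df.to_dict(orient="records")`.
record = {
    "id": "item-1",
    "stac_extensions": extensions,
    "bbox": np.array([1.0, 2.0, 3.0, 4.0]),
    "assets": {"B01": {"href": "https://example.com/b01.tif", "roles": np.array(["data"])}},
}
clean = ndarrays_to_lists(record)
```

After the conversion, `clean["stac_extensions"] or []` evaluates without error because the value is an ordinary list.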

martindurant commented 6 months ago

What's the status here? Is there a way to convert these GeoParquet files to STAC collections (or each row to an item)?

TomAugspurger commented 6 months ago

`to_item_collection` is the function for that. Depending on how the data was written, you might need to convert some ndarrays to Python lists.
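
To see whether a given dataset needs that conversion, one option is to inspect a record for nested ndarrays first. The sketch below assumes only NumPy; `find_ndarray_paths` and the sample record are hypothetical names for illustration, not part of stac-geoparquet.

```python
import numpy as np


def find_ndarray_paths(obj, path=""):
    """Yield a dotted path for every ndarray nested in dicts, lists, or tuples.

    Hypothetical diagnostic helper; not part of stac-geoparquet.
    """
    if isinstance(obj, np.ndarray):
        yield path or "<root>"
        # Object arrays may hold further containers, so look inside them too.
        if obj.dtype == object and obj.ndim >= 1:
            for i, value in enumerate(obj.tolist()):
                yield from find_ndarray_paths(value, f"{path}[{i}]")
    elif isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{path}.{key}" if path else str(key)
            yield from find_ndarray_paths(value, child)
    elif isinstance(obj, (list, tuple)):
        for i, value in enumerate(obj):
            yield from find_ndarray_paths(value, f"{path}[{i}]")


# Made-up record shaped like one row of `df.to_dict(orient="records")`.
record = {
    "id": "item-1",
    "stac_extensions": np.array(["ext-a", "ext-b"]),
    "assets": {"B01": {"roles": np.array(["data"])}},
    "links": [{"rel": "self"}],
}
paths = list(find_ndarray_paths(record))
print(paths)
```

An empty result suggests the record can be passed along unchanged; any reported path points at a field that still holds an ndarray.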

martindurant commented 6 months ago

OK, Intake 2 now supports reading from these, including multi-banding; but I don't like the format :) Here is my recursive cleaning method.

TomAugspurger commented 5 months ago

This is mostly closed by #31

In [1]: import pystac_client, stac_geoparquet

In [2]: items = list(pystac_client.Client.open("https://planetarycomputer.microsoft.com/api/stac/v1").search(collections="aster-l1t", max_items=250).items_as_dicts())

In [3]: df = stac_geoparquet.stac_geoparquet.to_geodataframe(items, dtype_backend="pyarrow")

In [4]: type(stac_geoparquet.to_item_collection(df)[0].to_dict()['stac_extensions'])
Out[4]: list