stac-utils / pystac

Python library for working with any SpatioTemporal Asset Catalog (STAC)
https://pystac.readthedocs.io

Object serialization should not make network requests by default #960

Open gadomski opened 1 year ago

gadomski commented 1 year ago

As discussed in https://github.com/stac-utils/pystac/issues/958, Item.to_dict can make one or more network requests if an object's root is unresolved and transform_hrefs == True (the default). This can lead to a huge number of near-simultaneous network requests, for example during large-scale batch processing of Items read in from a data store such as a geoparquet table.

The simplest fix would be to make transform_hrefs default to False for all of PySTAC. This would be a breaking change, since the default behavior would change in ways that callers may not expect. This issue is intended to track this potential change, with the goal of gathering feedback and alternatives before a future PySTAC v2.0.
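
To make the scale of the problem concrete, here is a minimal toy model of the behavior described above. It does not use PySTAC: ToyItem, resolve_root, fake_fetch, and count_requests are hypothetical names invented for this sketch, and the only thing borrowed from the real API is the transform_hrefs flag on to_dict. The model assumes each item read from a data store carries its own unresolved root, so nothing is shared between items.

```python
from typing import Callable, Dict, List, Optional


class ToyItem:
    """Hypothetical stand-in for an Item whose root catalog has not been loaded."""

    def __init__(self, item_id: str, root_href: str,
                 fetch: Callable[[str], Dict]) -> None:
        self.item_id = item_id
        self.root_href = root_href
        self._fetch = fetch  # stands in for a network read
        self._root: Optional[Dict] = None

    def resolve_root(self) -> Dict:
        # The first call "downloads" the root; later calls reuse the cached copy.
        if self._root is None:
            self._root = self._fetch(self.root_href)
        return self._root

    def to_dict(self, transform_hrefs: bool = True) -> Dict:
        # In this model, transforming hrefs requires a resolved root.
        if transform_hrefs:
            self.resolve_root()
        return {"id": self.item_id,
                "links": [{"rel": "root", "href": self.root_href}]}


def count_requests(n_items: int, transform_hrefs: bool) -> int:
    """Serialize n_items fresh items and return how many fetches were made."""
    requested: List[str] = []

    def fake_fetch(href: str) -> Dict:
        requested.append(href)  # record the simulated request
        return {"type": "Catalog", "id": "root"}

    for i in range(n_items):
        item = ToyItem(f"item-{i}", "https://example.com/catalog.json", fake_fetch)
        item.to_dict(transform_hrefs=transform_hrefs)
    return len(requested)


default_requests = count_requests(1000, transform_hrefs=True)
opt_out_requests = count_requests(1000, transform_hrefs=False)
print(default_requests, opt_out_requests)  # 1000 0
```

In this sketch the default costs one fetch per item, while passing transform_hrefs=False costs none. With real PySTAC objects, the per-call workaround available today is item.to_dict(transform_hrefs=False).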

ircwaves commented 3 weeks ago

The hits keep coming!

https://github.com/stac-utils/pystac/blob/main/pystac/link.py#L154-L156

NBD:

find . -type f -name '*.py' \
    | xargs -I{} sed -i~ -e 's/link\.href/link.get_href(transform_href=False)/g' {}
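
Before running a blanket substitution like the one above, it may be worth previewing what it would touch. The following is a rough dry-run sketch in Python that applies the same pattern to a string and reports the changed lines without writing anything. The preview helper and the sample source are hypothetical and exist only for illustration.

```python
import re
from typing import List, Tuple

# Same pattern and replacement as the sed expression above.
PATTERN = re.compile(r"link\.href")
REPLACEMENT = "link.get_href(transform_href=False)"


def preview(source: str) -> List[Tuple[int, str, str]]:
    """Return (line number, before, after) for each line the rewrite would change."""
    changes = []
    for number, line in enumerate(source.splitlines(), start=1):
        rewritten = PATTERN.sub(REPLACEMENT, line)
        if rewritten != line:
            changes.append((number, line, rewritten))
    return changes


# Hypothetical sample source, not taken from the PySTAC code base.
sample = "href = link.href\nlink.href = other\nprint(link.rel)\n"
changes = preview(sample)
for number, before, after in changes:
    print(f"{number}: {before!r} -> {after!r}")
```

On this sample the read on line 1 is rewritten cleanly, but the assignment on line 2 becomes a call on the left-hand side of =, which is not valid Python. A purely textual rewrite would need a manual pass over cases like that.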