Open calebrob6 opened 2 years ago
I would be really interested in taking on this task!
All yours :) (I had you in mind writing this actually, it is a bit more interesting than the other dataset stuff!) -- feel free to message me if you want to discuss details
@adamjstewart -- this would involve taking on some dependencies (pystac_client, planetary-computer, maybe stackstac)
We can make those deps optional if we need to.
Nice potential feature, is there still intention to work on it?
SpatioTemporal Asset Catalogs (STACs) are a way to organize geospatial datasets. STAC APIs let users query huge STAC Catalogs by date, time, and other metadata.
For example, the Microsoft Planetary Computer runs a STAC API that lets users search over catalogs containing all of Sentinel 2 imagery, all Landsat 8, etc. The following code uses the pystac_client library to query the Planetary Computer STAC API and returns metadata, and links to GeoTIFFs, for relevant Sentinel 2 scenes:
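A minimal sketch of such a query, assuming the public Planetary Computer endpoint and the `sentinel-2-l2a` collection. The helper names `build_search_params` and `search_sentinel2` are hypothetical and used only for illustration; the network call is kept in its own function so the parameter construction can be checked offline.

```python
from typing import Any, Dict, Optional, Sequence

# Assumed endpoint of the Planetary Computer STAC API.
PC_STAC_URL = "https://planetarycomputer.microsoft.com/api/stac/v1"


def build_search_params(
    bbox: Sequence[float],
    start: str,
    end: str,
    max_cloud_cover: Optional[float] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for pystac_client's Client.search (hypothetical helper)."""
    params: Dict[str, Any] = {
        "collections": ["sentinel-2-l2a"],
        "bbox": list(bbox),  # [min_lon, min_lat, max_lon, max_lat]
        "datetime": f"{start}/{end}",  # closed date interval
    }
    if max_cloud_cover is not None:
        # Filter on the STAC `eo:cloud_cover` property.
        params["query"] = {"eo:cloud_cover": {"lt": max_cloud_cover}}
    return params


def search_sentinel2(**params: Any):
    """Run the search against the Planetary Computer (requires network access)."""
    # Imported lazily so the optional dependencies are only needed here.
    import planetary_computer
    import pystac_client

    catalog = pystac_client.Client.open(
        PC_STAC_URL,
        modifier=planetary_computer.sign_inplace,  # sign asset URLs on access
    )
    search = catalog.search(**params)
    return search.item_collection()


# Example usage (not executed here because it needs network access):
# params = build_search_params((-122.3, 47.5, -122.2, 47.6),
#                              "2020-01-01", "2020-12-31", max_cloud_cover=10)
# for item in search_sentinel2(**params):
#     print(item.id, sorted(item.assets))
```

Each returned item carries the scene metadata plus an `assets` mapping whose entries point at the GeoTIFFs.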
We'd like to build a `STACAPIDataset` object that essentially wraps `catalog.search(...)`, creates a RasterDataset from the returned items, and otherwise behaves as a normal PyTorch dataset (signing assets as needed, etc.). A signature like `STACAPIDataset(root="data/", api_endpoint, max_cache_size=None, **query_parameters_to_pystac_client)` would be a good starting point here.
As a detailed note, it may be a good idea to cache accessed data in a local directory.
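One way the local cache could work, sketched with the standard library only. Everything here is an assumption for illustration: the class name `LocalAssetCache`, reading `max_cache_size` as a byte limit, and evicting the least recently used files first. The `fetch` callable stands in for whatever actually downloads an asset.

```python
import hashlib
import os
from pathlib import Path
from typing import Callable, Optional


class LocalAssetCache:
    """Hypothetical on-disk cache for downloaded assets, keyed by asset URL."""

    def __init__(self, root: str, max_cache_size: Optional[int] = None) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.max_cache_size = max_cache_size  # assumed to be in bytes; None = unlimited

    def path_for(self, href: str) -> Path:
        # Drop the query string so a re-signed URL maps to the same cache entry.
        key = href.split("?", 1)[0]
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.root / (digest + Path(key).suffix)

    def get(self, href: str, fetch: Callable[[str], bytes]) -> Path:
        """Return the local path for href, calling fetch only on a cache miss."""
        path = self.path_for(href)
        if path.exists():
            os.utime(path)  # refresh the timestamp to mark the entry as recently used
            return path
        path.write_bytes(fetch(href))
        self._evict(keep=path)
        return path

    def _evict(self, keep: Path) -> None:
        """Delete the oldest entries until the cache fits within max_cache_size."""
        if self.max_cache_size is None:
            return
        files = sorted(
            (p for p in self.root.iterdir() if p.is_file()),
            key=lambda p: p.stat().st_mtime,
        )
        total = sum(p.stat().st_size for p in files)
        for p in files:
            if total <= self.max_cache_size:
                break
            if p == keep:
                continue  # never evict the entry that was just written
            total -= p.stat().st_size
            p.unlink()
```

A dataset wrapper could route every asset read through `cache.get(href, fetch)`, so repeated queries over the same area do not download the same GeoTIFF twice.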