microsoft / torchgeo

TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
https://www.osgeo.org/projects/torchgeo/
MIT License

STAC API dataset #403

Open calebrob6 opened 2 years ago

calebrob6 commented 2 years ago

SpatioTemporal Asset Catalogs (STACs) are a way to organize geospatial datasets. STAC APIs let users query huge STAC Catalogs by location, time, and other metadata.

For example, the Microsoft Planetary Computer runs a STAC API that lets users search catalogs containing all Sentinel-2 imagery, all Landsat 8 imagery, etc. The following code uses the pystac_client library to query the Planetary Computer STAC API and return metadata for, and links to the GeoTIFFs of, the relevant Sentinel-2 scenes:

from pystac_client import Client

area_of_interest = {
    "type": "Polygon",
    "coordinates": [
        [
            [-148.56536865234375, 60.80072385643073],
            [-147.44338989257812, 60.80072385643073],
            [-147.44338989257812, 61.18363894915102],
            [-148.56536865234375, 61.18363894915102],
            [-148.56536865234375, 60.80072385643073],
        ]
    ],
}
time_of_interest = "2019-06-01/2019-08-01"

catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

search = catalog.search(
    collections=["sentinel-2-l2a"],
    intersects=area_of_interest,
    datetime=time_of_interest,
    query={"eo:cloud_cover": {"lt": 10}},
)

# Newer pystac_client releases prefer items() over the older get_items().
items = list(search.items())
print(f"Returned {len(items)} Items")

We'd like to build a STACAPIDataset class that essentially wraps catalog.search(...), creates a RasterDataset from the returned items, and otherwise behaves like a normal PyTorch dataset (signing assets as needed, etc.). A signature like STACAPIDataset(api_endpoint, root="data/", max_cache_size=None, **query_parameters_to_pystac_client) would be a good starting point.
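To make the proposal concrete, here is a minimal sketch of how such a wrapper might be laid out. It is not an existing TorchGeo class: the method names, the asset_key and sign parameters, and the "visual" default are assumptions for illustration, and api_endpoint is placed first so that the signature is valid Python. The sketch only collects asset URLs; building a RasterDataset from them is left out.

```python
from typing import Any, Callable, Iterable, List, Optional


class STACAPIDataset:
    """Hypothetical sketch of the proposed wrapper, not a TorchGeo API."""

    def __init__(
        self,
        api_endpoint: str,
        root: str = "data/",
        max_cache_size: Optional[int] = None,
        asset_key: str = "visual",  # assumed asset name; varies by collection
        sign: Optional[Callable[[Any], Any]] = None,  # e.g. planetary_computer.sign
        **query_parameters: Any,
    ) -> None:
        self.api_endpoint = api_endpoint
        self.root = root
        self.max_cache_size = max_cache_size
        self.asset_key = asset_key
        self.sign = sign
        # Passed through unchanged to pystac_client's catalog.search(...).
        self.query_parameters = query_parameters

    def search(self) -> List[Any]:
        """Run the STAC search and return the matching items."""
        # Imported lazily so the class can be defined without pystac_client.
        from pystac_client import Client

        catalog = Client.open(self.api_endpoint)
        return list(catalog.search(**self.query_parameters).items())

    def asset_hrefs(self, items: Iterable[Any]) -> List[str]:
        """Return one asset URL per item, signing each item first if requested."""
        hrefs = []
        for item in items:
            if self.sign is not None:
                item = self.sign(item)
            hrefs.append(item.assets[self.asset_key].href)
        return hrefs
```

With the Planetary Computer, one would presumably pass sign=planetary_computer.sign together with the same collections, intersects, datetime, and query arguments used in the search above, then hand the resulting URLs to whatever builds the raster index.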

As an implementation detail, it may be a good idea to cache accessed data in a local directory.
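One way the cache could work is sketched below, using only the standard library. All function names here are hypothetical, max_cache_size is assumed to be a byte count, and whole files are downloaded for simplicity, although cloud-optimized GeoTIFFs can also be read remotely in pieces. The cache key ignores the URL query string on the assumption that signing only appends a token there, so a re-signed URL maps to the same local file.

```python
import hashlib
import os
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Callable, Optional


def _download(url: str, dest: Path) -> None:
    """Default downloader: fetch the whole file over HTTP."""
    urllib.request.urlretrieve(url, dest)


def cache_path(root: str, url: str) -> Path:
    """Map a URL to a local file name, ignoring the query string."""
    parts = urllib.parse.urlsplit(url)
    key = hashlib.sha256((parts.netloc + parts.path).encode()).hexdigest()[:16]
    return Path(root) / f"{key}_{Path(parts.path).name}"


def evict(root: str, max_cache_size: int) -> None:
    """Delete least recently used files until the cache fits in max_cache_size bytes."""
    files = sorted(Path(root).iterdir(), key=lambda p: (p.stat().st_mtime, p.name))
    total = sum(p.stat().st_size for p in files)
    for p in files:
        if total <= max_cache_size:
            break
        total -= p.stat().st_size
        p.unlink()


def fetch_cached(
    url: str,
    root: str = "data/",
    max_cache_size: Optional[int] = None,
    download: Callable[[str, Path], None] = _download,
) -> Path:
    """Return the local path for url, downloading it on a cache miss."""
    Path(root).mkdir(parents=True, exist_ok=True)
    path = cache_path(root, url)
    if path.exists():
        os.utime(path)  # cache hit: mark the file as recently used
    else:
        download(url, path)
        if max_cache_size is not None:
            evict(root, max_cache_size)
    return path
```

Modification time doubles as the "last used" marker, which keeps the sketch free of a separate index file. A single file larger than max_cache_size would be evicted right after download, so a real implementation would need to handle that case.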

nilsleh commented 2 years ago

I would be really interested in taking on this task!

calebrob6 commented 2 years ago

All yours :) (I had you in mind writing this actually, it is a bit more interesting than the other dataset stuff!) -- feel free to message me if you want to discuss details

@adamjstewart -- this would involve taking on some dependencies (pystac_client, planetary-computer, maybe stackstac)

adamjstewart commented 2 years ago

We can make those deps optional if we need to.
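A common way to keep such dependencies optional is to import them only when the dataset is actually used. The helper below is a minimal sketch of that pattern; the name lazy_import and the wording of the error message are assumptions, not an existing TorchGeo utility.

```python
import importlib
from types import ModuleType


def lazy_import(name: str) -> ModuleType:
    """Import an optional dependency on first use, with a clearer error message."""
    try:
        return importlib.import_module(name)
    except ImportError as err:
        raise ImportError(
            f"{name} is an optional dependency that is required for this dataset; "
            "please install it to use this feature"
        ) from err
```

A dataset constructor could then call lazy_import("pystac_client") instead of importing at module level, so users who never touch the STAC dataset would not need the extra packages.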

metazool commented 1 year ago

Nice potential feature. Is there still an intention to work on it?