microsoft / PlanetaryComputer

Issues, discussions, and information about the Microsoft Planetary Computer
https://planetarycomputer.microsoft.com/
MIT License

Access to Sentinel-2 metadata parquets #361

Open PlekhanovaElena opened 3 months ago

PlekhanovaElena commented 3 months ago

Hi there,

I was using the MPC Hub and am now switching to the API. I'm working with the Sentinel-2 metadata parquet, which is very useful for pre-filtering the tiles that I want to request later through the MPC catalog. Now I wonder if there is any way to access the metadata from my own computer.

On MPC Hub, I had the following code for accessing the Sentinel-2 metadata parquet:

import planetary_computer
import itertools
import dask.dataframe as dd

# Client for the "items" container in the "pcstacitems" storage account
cc = planetary_computer.get_container_client("pcstacitems", "items")

blobs = list(cc.list_blobs("sentinel-2-l2a.parquet/"))

def key(blob):
    # Group key: the part of the file name before the first underscore
    return blob.name.split("/")[1].split("_")[0]

# Keep only the most recently modified blob for each key
keep_blobs = []
for k, v in itertools.groupby(sorted(blobs, key=key), key=key):
    v = list(v)
    blob = max(v, key=lambda x: x.last_modified)
    keep_blobs.append(blob)

uris = [f"az://items/{blob.name}" for blob in keep_blobs]

and then

token = planetary_computer.sas.get_token("pcstacitems", "items").token
df = dd.read_parquet(uris, storage_options={"account_name": "pcstacitems", "credential": token})
df.head()

This worked while I used the MPC Hub, but I'm not sure how to reproduce it through API access. Could you let me know if there is any script modification I can make here? Or are there any other ways to access the Sentinel-2 metadata?

Thank you, Kind regards, Elena

TomAugspurger commented 3 months ago

The same code you used before should continue to work.

The primary difference when accessing it from your local computer is that it'll be slower / higher latency, because your machine is not in the same Azure data center as the machines storing the data. If that becomes a bottleneck, you might want to set up compute near the data.
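
Since listing and reading the blobs is slower from a local machine, it can help to check the "keep the newest blob per key" step offline first. The following is only a minimal sketch of that selection logic from the question: `FakeBlob`, `partition_key`, `latest_per_partition`, and the blob names are made up for illustration and are not part of `planetary_computer` or of the real container contents.

```python
import itertools
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FakeBlob:
    """Hypothetical stand-in for the objects returned by list_blobs()."""
    name: str
    last_modified: datetime


def partition_key(blob):
    # Same rule as in the question: the file name up to the first underscore.
    return blob.name.split("/")[1].split("_")[0]


def latest_per_partition(blobs):
    """Return the most recently modified blob for each partition key."""
    keep = []
    ordered = sorted(blobs, key=partition_key)  # groupby needs sorted input
    for _, group in itertools.groupby(ordered, key=partition_key):
        keep.append(max(group, key=lambda b: b.last_modified))
    return keep


# Made-up blob names, purely for illustration.
blobs = [
    FakeBlob("sentinel-2-l2a.parquet/part-a_v1.parquet", datetime(2023, 1, 1)),
    FakeBlob("sentinel-2-l2a.parquet/part-a_v2.parquet", datetime(2023, 6, 1)),
    FakeBlob("sentinel-2-l2a.parquet/part-b_v1.parquet", datetime(2023, 3, 1)),
]

uris = [f"az://items/{b.name}" for b in latest_per_partition(blobs)]
print(uris)
```

With real data, the list passed to `latest_per_partition` would be the output of `cc.list_blobs(...)` from the original snippet, and the resulting `uris` would go to `dd.read_parquet` unchanged.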