microsoft / PlanetaryComputer

Issues, discussions, and information about the Microsoft Planetary Computer
https://planetarycomputer.microsoft.com/
MIT License
185 stars 9 forks source link

[S2-L2A] Inconsistent resulting array #270

Closed Ahleroy closed 1 year ago

Ahleroy commented 1 year ago

I am trying to create a temporal series of a unique area of interest. The following code extracts the visual image once for every single month since 2017.

Yet, instead of having one single output dimensions, I have two different ones (one being a cropped version of the other).

Full image: -12 172786619658517_13 164401926179986_-12 086784113449978_13 222896609309908_2017-02-01

Cropped image: -12 172786619658517_13 164401926179986_-12 086784113449978_13 222896609309908_2017-01-12

The following code :

from pystac.extensions.eo import EOExtension as eo
import pystac_client
import planetary_computer
import time
import numpy as np
import rasterio
from rasterio import windows
from rasterio import features
from rasterio import warp
from PIL import Image
import calendar
import numpy as np

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

bounds = [-12.172786619658517, 13.164401926179986,   -12.086784113449978,  13.222896609309908] #The scene

lat1, lon1, lat2, lon2 = bounds

area_of_interest = {
    "type": "Polygon",
    "coordinates": [
        [
            [lat1, lon1],
            [lat2,lon1],
            [lat2, lon2],
            [lat1, lon2],
            [lat1, lon1],
        ]
    ],
}

for year in range(2017, 2024):
    for month in range(1, 13):

        if year == 2023 and month > 9:
            break
        weekday, last_day = calendar.monthrange(year, month)

        if month < 10:
            month = "0" + str(month)

        date1 = f"{year}-{month}-01"
        date2 = f"{year}-{month}-{last_day}"

        time_of_interest = date1 + "/" + date2

        t0 = time.time()
        search = catalog.search(
            collections=["sentinel-2-l2a"],
            intersects=area_of_interest,
            datetime=time_of_interest,
        )

        t1 = time.time()
        print(f"Search took {t1-t0} seconds")

        items = search.item_collection()
        print(f"Returned {len(items)} Items")

        least_cloudy_item = min(items, key=lambda item: eo.ext(item).cloud_cover)

        selected_date = least_cloudy_item.datetime.date()
        filename = str(lat1) + "_" + str(lon1) + "_" + str(lat2) + "_" + str(lon2) + "_" + str(selected_date)

        asset_href_visual = least_cloudy_item.assets["visual"].href

        print("Download visual image")

        with rasterio.open(asset_href_visual) as ds:
            aoi_bounds = features.bounds(area_of_interest)
            warped_aoi_bounds = warp.transform_bounds("epsg:4326", ds.crs, *aoi_bounds)
            aoi_window = windows.from_bounds(transform=ds.transform, *warped_aoi_bounds)
            band_data_visual = ds.read(window=aoi_window)

        print("End download visual image")
        img = Image.fromarray(np.transpose(band_data_visual, axes=[1, 2, 0]))
        img.save(filename + ".png")

Any idea why we have inconsistent dimensions over time ?

TomAugspurger commented 1 year ago

Mmm I'm not immediately sure, but I don't think that ESA makes any promises that a given area on Earth will always be captured in the "same" way.

STAC will return any item intersecting with your area of interest. You might want to check that the geometry of returned item is what you're looking for.

Ahleroy commented 1 year ago

I solved the issue. The error was in the least_cloudy_item. This specific zone is in the intersection of two product areas (T28PHV and T28PGV in this specific case). The least cloudy item was switching from one to the other between two successive dates.

TomAugspurger commented 1 year ago

Glad to hear it! In case it's helpful, you can filter on the s2:mgrs_tile field in the call to search using the query parameter.