sentinel-hub / sentinelhub-py

Download and process satellite imagery in Python using Sentinel Hub services.
http://sentinelhub-py.readthedocs.io/en/latest/
MIT License
798 stars 243 forks source link

[HELP] NaN in satelie images #504

Closed melgor closed 7 months ago

melgor commented 7 months ago

I'm encountering problems with NaN pixels for many locations in the US. I designed a pipeline, that downloads many products per single tile, and then I check if this is a valid image -> if not download the next product. However, it is a waste of resources on both my and API's side. Is there any way to filter images by the number of NaN (like it can be done for cloud coverage)? Also, could you explain why some images have NaN?

My location with such a problem:

%matplotlib inline

import os
import datetime as dt

import matplotlib.pyplot as plt
import numpy as np
import pyogrio

from sentinelhub import CRS, BBox, DataCollection, SHConfig, TileSplitter
from sentinelhub import SentinelHubCatalog
from sentinelhub.geometry import Geometry

from sentinelhub import MimeType, SentinelHubDownloadClient, SentinelHubRequest, bbox_to_dimensions, filter_times

config = SHConfig()
catalog = SentinelHubCatalog(config=config)

caspian_sea_bbox = BBox((-93.00, 30.64,-91.84, 31.63), crs=CRS.WGS84)
time_interval = "2020-01-01", "2021-01-01"
search_iterator = catalog.search(
    DataCollection.SENTINEL2_L1C,
    bbox=my_bbox_crswgs84,
    time=time_interval,
    filter="eo:cloud_cover < 1",
    fields={"include": ["id", "properties.datetime", "properties.eo:cloud_cover"], "exclude": []},
)

results = list(search_iterator)
print("Total number of results:", len(results))

time_difference = dt.timedelta(hours=1)

all_timestamps = search_iterator.get_timestamps()
unique_acquisitions = filter_times(all_timestamps, time_difference)

evalscript_true_color = """
    //VERSION=3

    function setup() {
        return {
            input: [{
                bands: ["B02", "B03", "B04"]
            }],
            output: {
                bands: 3
            }
        };
    }

    function evaluatePixel(sample) {
        return [sample.B04, sample.B03, sample.B02];
    }
"""

process_requests = []

for timestamp in unique_acquisitions:
    request = SentinelHubRequest(
        evalscript=evalscript_true_color,
        input_data=[
            SentinelHubRequest.input_data(
                data_collection=DataCollection.SENTINEL2_L1C.define_from("s2l1c", service_url=config.sh_base_url),
                time_interval=(timestamp - time_difference, timestamp + time_difference)
            )
        ],
        responses=[SentinelHubRequest.output_response("default", MimeType.PNG)],
        bbox=my_bbox_crswgs84,
        size=bbox_to_dimensions(my_bbox_crswgs84, 100),
        config=config,
    )
    process_requests.append(request)

client = SentinelHubDownloadClient(config=config)

download_requests = [request.download_list[0] for request in process_requests]

data = client.download(download_requests)

ncols, nrows = 2, 25

fig, axis = plt.subplots(
    ncols=ncols, nrows=nrows, figsize=(15, 60), subplot_kw={"xticks": [], "yticks": [], "frame_on": False}
)

for idx, (image, timestamp) in enumerate(zip(data, unique_acquisitions)):
    ax = axis[idx // ncols][idx % ncols]
    ax.imshow(np.clip(image * 2.5 / 255, 0, 1))
    ax.set_title(timestamp.date().isoformat(), fontsize=10)

plt.tight_layout()
Screenshot 2023-12-01 at 10 37 51
batic commented 7 months ago

The black stripes are normal: you are looking at the border of the orbits (see https://sentinels.copernicus.eu/web/sentinel/missions/sentinel-2/satellite-description/orbit). You can obtain dataMask band (https://docs.sentinel-hub.com/api/latest/data/sentinel-2-l1c/#available-bands-and-data) with your request and then filter them.

melgor commented 7 months ago

Thanks for your help. So, to download satellite images with low cloud-coverage and low nb of NaN, the best would be run a request for only dataMask from different acquisition dates and analyze the data to select the best acquisition date for me. Am I right?

batic commented 7 months ago

You also get the geometry of the tile in the catalog api response, and you could simply calculate area to see how much of your tile is actually covered with data on that day.

The following code snippet might help:

from sentinelhub import (BBox, CRS, SentinelHubCatalog, DataCollection)
import geopandas as gpd
import matplotlib.pyplot as plt

catalog = SentinelHubCatalog()
caspian_sea_bbox = BBox((49.9604, 44.7176, 51.0481, 45.2324), crs=CRS.WGS84)
time_interval = "2020-12-10", "2020-12-15"

search_iterator = catalog.search(
    DataCollection.SENTINEL2_L2A,
    bbox=caspian_sea_bbox,
    time=time_interval,
    filter="eo:cloud_cover < 5"
)
results = gpd.GeoDataFrame.from_features(list(search_iterator))

fig, ax = plt.subplots()
results.plot(ax=ax)
gpd.GeoDataFrame(geometry=[caspian_sea_bbox.geometry], crs=results.crs).plot(ax=ax, facecolor='r', alpha=0.5)

image

And then you can intersect your AOI with results,

results['aoi_intersection'] = results.geometry.intersection(caspian_sea_bbox.geometry).area / caspian_sea_bbox.geometry.area

to get the following:

image

At this point you can filter out all that have AOI smaller than some ratio, and download the rest.

melgor commented 7 months ago

Thank you very much!