microsoft / PlanetaryComputerDataCatalog

Data catalog for the Microsoft Planetary Computer
https://planetarycomputer.microsoft.com
MIT License

Invalid/Corrupt Landsat file? #444

Closed lukasValentin closed 1 year ago

lukasValentin commented 1 year ago

I'm not sure whether this repository is the right place for this issue, but here is the description:

Description

We read Landsat data for a study region in central Africa. This works fine in most cases, but we get an error when attempting to open one Landsat 8 OLI scene acquired in 2022. The error occurs only when trying to load band 4 (B4).

How to reproduce

import rasterio as rio
import planetary_computer

# URL to dataset (we load band 4)
url = 'https://landsateuwest.blob.core.windows.net/landsat-c2/level-2/standard/oli-tirs/2022/173/059/LC08_L2SP_173059_20220706_20220722_02_T1/LC08_L2SP_173059_20220706_20220722_02_T1_SR_B4.TIF'

url_signed = planetary_computer.sign_url(url)

ds = rio.open(url_signed)

This gives the following error message:

Traceback (most recent call last):
  File "rasterio/_base.pyx", line 310, in rasterio._base.DatasetBase.__init__
  File "rasterio/_base.pyx", line 221, in rasterio._base.open_dataset
  File "rasterio/_err.pyx", line 221, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OpenFailedError: '/vsicurl/https://landsateuwest.blob.core.windows.net/landsat-c2/level-2/standard/oli-tirs/2022/173/059/LC08_L2SP_173059_20220706_20220722_02_T1/LC08_L2SP_173059_20220706_20220722_02_T1_SR_B4.TIF?st=2023-07-13T09%3A03%3A45Z&se=2023-07-14T09%3A48%3A45Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-07-14T07%3A40%3A28Z&ske=2023-07-21T07%3A40%3A28Z&sks=b&skv=2021-06-08&sig=rpO6Ia5Y8JQvnxLCUdC8Bv%2BNjzTbEEbdnItAAizP/Lg%3D' not recognized as a supported file format.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/ides/Lukas/venvs/GeoPython/lib64/python3.11/site-packages/rasterio/env.py", line 451, in wrapper
    return f(*args, **kwds)
           ^^^^^^^^^^^^^^^^
  File "/mnt/ides/Lukas/venvs/GeoPython/lib64/python3.11/site-packages/rasterio/__init__.py", line 304, in open
    dataset = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "rasterio/_base.pyx", line 312, in rasterio._base.DatasetBase.__init__
rasterio.errors.RasterioIOError: '/vsicurl/https://landsateuwest.blob.core.windows.net/landsat-c2/level-2/standard/oli-tirs/2022/173/059/LC08_L2SP_173059_20220706_20220722_02_T1/LC08_L2SP_173059_20220706_20220722_02_T1_SR_B4.TIF?st=2023-07-13T09%3A03%3A45Z&se=2023-07-14T09%3A48%3A45Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-07-14T07%3A40%3A28Z&ske=2023-07-21T07%3A40%3A28Z&sks=b&skv=2021-06-08&sig=rpO6Ia5Y8JQvnxLCUdC8Bv%2BNjzTbEEbdnItAAizP/Lg%3D' not recognized as a supported file format.
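
The "not recognized as a supported file format" message suggests that GDAL could not match the blob to any raster driver. One way to narrow this down, sketched here under assumptions, is to look at the first bytes of the file and compare them with the TIFF and BigTIFF signatures. The helper names `looks_like_tiff` and `fetch_header` are hypothetical and not part of rasterio or planetary_computer; `fetch_header` assumes the server honors HTTP Range requests.

```python
import urllib.request

# Signatures that open a classic TIFF (version 42) or BigTIFF (version 43) file,
# in little-endian ("II") and big-endian ("MM") byte order.
TIFF_SIGNATURES = (b"II*\x00", b"MM\x00*", b"II+\x00", b"MM\x00+")


def looks_like_tiff(header: bytes) -> bool:
    """Return True if the first four bytes match a TIFF or BigTIFF signature."""
    return header[:4] in TIFF_SIGNATURES


def fetch_header(signed_url: str, nbytes: int = 16) -> bytes:
    """Fetch the first `nbytes` of a remote file with an HTTP Range request.

    Hypothetical helper: it assumes the URL is already signed and that the
    server supports Range requests.
    """
    request = urllib.request.Request(
        signed_url, headers={"Range": f"bytes=0-{nbytes - 1}"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read(nbytes)
```

Calling `looks_like_tiff(fetch_header(url_signed))` for the B4 and B3 URLs would show whether the two blobs differ at the byte level. A `False` result for B4 would point at the stored file rather than at the local rasterio or GDAL installation, although a `True` result would not rule out damage further into the file.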

Expected behavior

When we do the same for, e.g., band 3 (url = https://landsateuwest.blob.core.windows.net/landsat-c2/level-2/standard/oli-tirs/2022/173/059/LC08_L2SP_173059_20220706_20220722_02_T1/LC08_L2SP_173059_20220706_20220722_02_T1_SR_B3.TIF), the rio.open call works without any problems:

src = rio.open(url_signed)  # now set to B3 instead of B4
src.meta

outputs

{'driver': 'GTiff', 'dtype': 'uint16', 'nodata': 0.0, 'width': 7591, 'height': 7741, 'count': 1, 'crs': CRS.from_epsg(32635), 'transform': Affine(30.0, 0.0, 716385.0,
       0.0, -30.0, 275715.0)}

as expected.

Any hint why the B4 dataset cannot be read?

TomAugspurger commented 1 year ago

Thanks for the report! https://github.com/microsoft/PlanetaryComputer/discussions/101 has some discussion on this topic, but the summary is that we uploaded some incorrect data from USGS.

We'll hopefully have this fixed soon-ish, but we don't have a timeline. USGS has reprocessed some scenes, so the exact item ID might change slightly.
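
Because identifiers may change after reprocessing, code that hard-codes a scene name could stop matching. The following is a minimal sketch of one way to recognize the same acquisition under a new name. It assumes the change shows up only in the processing-date field of the Landsat Collection 2 product identifier (the fifth underscore-separated field of names such as the one in the URL above); that is an assumption, not something confirmed in this thread. The function names are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class LandsatProductId(NamedTuple):
    """Fields of a Landsat Collection 2 product identifier, in order."""
    sensor_satellite: str   # e.g. "LC08"
    processing_level: str   # e.g. "L2SP"
    path_row: str           # e.g. "173059" (WRS path 173, row 059)
    acquired: str           # acquisition date, YYYYMMDD
    processed: str          # processing date, YYYYMMDD
    collection: str         # e.g. "02"
    tier: str               # e.g. "T1"


def parse_product_id(product_id: str) -> LandsatProductId:
    """Split a product identifier into its seven underscore-separated fields."""
    parts = product_id.split("_")
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}: {product_id!r}")
    return LandsatProductId(*parts)


def same_scene(a: str, b: str) -> bool:
    """Compare two identifiers while ignoring the processing date (assumption)."""
    blank = {"processed": ""}
    return parse_product_id(a)._replace(**blank) == parse_product_id(b)._replace(**blank)


def find_reprocessed(original: str, candidates: Iterable[str]) -> List[str]:
    """Return candidates that describe the same scene under a different identifier."""
    return [c for c in candidates if c != original and same_scene(original, c)]
```

For example, `find_reprocessed("LC08_L2SP_173059_20220706_20220722_02_T1", names)` would keep only entries of `names` that share sensor, level, path/row, acquisition date, collection, and tier with the affected scene. Here `names` stands for any list of identifiers obtained elsewhere, for instance from a catalog search.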