microsoft / PlanetaryComputer

Issues, discussions, and information about the Microsoft Planetary Computer
https://planetarycomputer.microsoft.com/
MIT License

Problem with Landsat image #203

Open tomconte opened 1 year ago

tomconte commented 1 year ago

Hi! Is there a problem with the Landsat asset below? When I try to download it locally, I do get a file with a .tiff extension, but it looks like it contains some HTML? (edit: removed SAS token)

https://landsateuwest.blob.core.windows.net/landsat-c2/level-2/standard/oli-tirs/2022/047/028/LC08_L2SP_047028_20221108_20221115_02_T1/LC08_L2SP_047028_20221108_20221115_02_T1_SR_B4.TIF
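One quick way to confirm that a downloaded file is an error page rather than imagery is to look at its first few bytes. The sketch below is only a heuristic and not part of the Planetary Computer tooling; the function names `classify_header` and `classify_file` are made up for illustration. It relies on the fact that TIFF files begin with a byte-order mark (`II` or `MM`) followed by the magic number 42 (classic TIFF) or 43 (BigTIFF), while an HTML or XML error body typically begins with `<`.

```python
# TIFF signatures: byte-order mark plus magic number (42 = classic, 43 = BigTIFF).
TIFF_SIGNATURES = (
    b"II*\x00",  # little-endian classic TIFF
    b"MM\x00*",  # big-endian classic TIFF
    b"II+\x00",  # little-endian BigTIFF
    b"MM\x00+",  # big-endian BigTIFF
)


def classify_header(header: bytes) -> str:
    """Return 'tiff', 'markup', or 'unknown' based on the leading bytes."""
    if header.startswith(TIFF_SIGNATURES):
        return "tiff"
    # Heuristic only: HTML/XML error responses usually start with '<'.
    if header.lstrip().startswith(b"<"):
        return "markup"
    return "unknown"


def classify_file(path) -> str:
    """Read the first bytes of a local file and classify them."""
    with open(path, "rb") as f:
        return classify_header(f.read(64))
```

Calling `classify_file` on the downloaded `.TIF` would return `"markup"` if the file really does hold an HTML or XML error body.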

TomAugspurger commented 1 year ago

Unfortunately, the USGS servers returned some errors for older scenes, and the text of those errors was written to blob storage.

See https://github.com/microsoft/PlanetaryComputer/discussions/101 for some discussion. I'll post there once it's fully resolved.

tomconte commented 1 year ago

Thanks Tom! As far as I can tell, we have executed our aggregation process successfully several times in the past, most recently on February 1st. Has the data changed since that date? Just trying to figure out why we haven't seen that error before.

scottyhq commented 1 year ago

Figured I'd add another example since I've also encountered missing data issues. My understanding of the discussion in #101 is that some scenes are fixed on the USGS side and others are not... Here is a scene that has a valid preview and STAC, but the assets are corrupted, so when you ultimately try to read assets you'll get "RasterioIOError: not recognized as a supported file format":

  1. Preview works https://planetarycomputer.microsoft.com/api/data/v1/item/map?collection=landsat-c2-l2&item=LC08_L2SP_226085_20220319_02_T1
  2. But error reading assets: https://planetarycomputer.microsoft.com/api/data/v1/item/info?collection=landsat-c2-l2&item=LC08_L2SP_226085_20220319_02_T1
  3. Follow the STAC rabbit trail:
    1. PC STAC: https://planetarycomputer.microsoft.com/api/stac/v1/collections/landsat-c2-l2/items/LC08_L2SP_226085_20220319_02_T1
    2. VIA: https://landsatlook.usgs.gov/stac-server/collections/landsat-c2l2-sr/items/LC08_L2SP_226085_20220319_20220329_02_T1_SR
    3. USGS-STAC-Browser: https://landsatlook.usgs.gov/stac-browser/collection02/level-2/standard/oli-tirs/2022/226/085/LC08_L2SP_226085_20220319_20220329_02_T1
    4. Assets seem fine? https://landsatlook.usgs.gov/data/collection02/level-2/standard/oli-tirs/2022/226/085/LC08_L2SP_226085_20220319_20220329_02_T1/LC08_L2SP_226085_20220319_20220329_02_T1_SR_B5.TIF

I think this illustrates another example of where some sort of metadata and data versioning would be useful to also have in the STAC metadata (https://github.com/microsoft/PlanetaryComputer/discussions/124).

TomAugspurger commented 1 year ago

> I think this illustrates another example of where some sort of metadata and data versioning would be useful to also have in the STAC metadata (https://github.com/microsoft/PlanetaryComputer/discussions/124).

Yeah, we've sketched out a design for adopting the version extension, but it'll be a decent amount of work to implement.

In the meantime, we've started doing the basics (computing and checking hashes before uploading files) and just need to clean this up for the historical data.
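For readers wondering what "computing and checking hashes" might look like, here is a minimal sketch. It is not the Planetary Computer ingestion code; the helper names and the idea that a trusted digest is available from the upstream provider are assumptions for illustration. The important detail is that the expected digest has to come from the source of truth: comparing against the `content_md5` of a blob that was already written from a bad download would only confirm that the bad bytes were stored faithfully.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size: int = 1024 * 1024) -> bytes:
    """Compute the MD5 digest of a binary file-like object in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.digest()


def safe_to_upload(stream, expected_digest) -> bool:
    """Return True only if the data matches the trusted digest.

    `expected_digest` is assumed to come from the upstream provider,
    not from a previously uploaded copy of the same file.
    """
    return md5_of_stream(stream) == bytes(expected_digest)


# Usage with in-memory data standing in for a downloaded asset.
payload = b"example asset bytes"
trusted = hashlib.md5(payload).digest()
print(safe_to_upload(io.BytesIO(payload), trusted))
print(safe_to_upload(io.BytesIO(b"<html>error</html>"), trusted))
```

In practice the stream would be the file opened with `open(path, "rb")`, and a mismatch would stop the upload instead of writing the error text to blob storage.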

> Has the data changed since that date? Just trying to figure out why we haven't seen that error before.

The Blob Storage service does provide a last modified property, which shows this asset hasn't changed since November 2022.

In [1]: import azure.storage.blob

In [2]: import planetary_computer

In [3]: cc = planetary_computer.get_container_client("landsateuwest", "landsat-c2")

In [5]: dict(cc.get_blob_client("level-2/standard/oli-tirs/2022/047/028/LC08_L2SP_047028_20221108_20221115_02_T1/LC08_L2SP_047028_20221108_20221115_02_T1_SR_B4.TIF").get_blob_properties())
Out[5]:
{'name': 'level-2/standard/oli-tirs/2022/047/028/LC08_L2SP_047028_20221108_20221115_02_T1/LC08_L2SP_047028_20221108_20221115_02_T1_SR_B4.TIF',
 'container': 'landsat-c2',
 'snapshot': None,
 'version_id': None,
 'is_current_version': None,
 'blob_type': <BlobType.BlockBlob: 'BlockBlob'>,
 'metadata': {},
 'encrypted_metadata': None,
 'last_modified': datetime.datetime(2022, 11, 17, 8, 22, 30, tzinfo=datetime.timezone.utc),
 'etag': '"0x8DAC874DEA18EC9"',
 'size': 14407,
 'content_range': None,
 'append_blob_committed_block_count': None,
 'is_append_blob_sealed': None,
 'page_blob_sequence_number': None,
 'server_encrypted': True,
 'copy': {'id': None, 'source': None, 'status': None, 'progress': None, 'completion_time': None, 'status_description': None, 'incremental_copy': None, 'destination_snapshot': None},
 'content_settings': {'content_type': 'image/tiff', 'content_encoding': None, 'content_language': None, 'content_md5': bytearray(b'\x10\xb4\x15\xbfY\xe3\xdf\xe7-\xd7\x90w\x0c\xcd\x1e\x92'), 'content_disposition': None, 'cache_control': None},
 'lease': {'status': 'unlocked', 'state': 'available', 'duration': None},
 'blob_tier': 'Hot',
 'rehydrate_priority': None,
 'blob_tier_change_time': datetime.datetime(2022, 11, 17, 8, 22, 30, tzinfo=datetime.timezone.utc),
 'blob_tier_inferred': None,
 'deleted': False,
 'deleted_time': None,
 'remaining_retention_days': None,
 'creation_time': datetime.datetime(2022, 11, 17, 8, 22, 30, tzinfo=datetime.timezone.utc),
 'archive_status': None,
 'encryption_key_sha256': None,
 'encryption_scope': None,
 'request_server_encrypted': True,
 'object_replication_source_properties': [],
 'object_replication_destination_policy': None,
 'last_accessed_on': None,
 'tag_count': None,
 'tags': None,
 'immutability_policy': {'expiry_time': None, 'policy_mode': None},
 'has_legal_hold': None,
 'has_versions_only': None}
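To answer the "has it changed since that date?" question programmatically, the `last_modified` value can be compared against a cutoff. The sketch below uses a plain dict holding the two values copied from the output above so that it runs without network access; the helper name `modified_since` is hypothetical, and the year of the "February 1st" run is an assumption (2023), since the thread does not state it.

```python
from datetime import datetime, timezone


def modified_since(properties, cutoff: datetime) -> bool:
    """Return True if the blob was modified after the cutoff."""
    return properties["last_modified"] > cutoff


# Values copied from the blob properties shown above.
props = {
    "last_modified": datetime(2022, 11, 17, 8, 22, 30, tzinfo=timezone.utc),
    "size": 14407,
}

# Assumption: the last successful run was on February 1st, 2023.
last_good_run = datetime(2023, 2, 1, tzinfo=timezone.utc)

print(modified_since(props, last_good_run))
```

Under that assumed cutoff the check prints `False`, which matches the observation that the asset has not changed since November 2022.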