microsoft / PlanetaryComputer

Issues, discussions, and information about the Microsoft Planetary Computer
https://planetarycomputer.microsoft.com/
MIT License

Impossible to retrieve goes-r lst data due to faulty documentation #300

Open Berhinj opened 9 months ago

Berhinj commented 9 months ago

The Microsoft Planetary Computer documentation on GOES LST data does not give enough information to actually retrieve that data.

Let me explain. Here is an example of a GOES image URL:

https://goeseuwest.blob.core.windows.net/noaa-goes16/ABI-L2-LSTC/2023/008/12/OR_ABI-L2-LSTC-M6_G16_s20230081201170_e20230081203543_c20230081205077.nc

Notice that the directory name is fully predictable, while the file basename is not: it contains timestamps, including the processing (creation) time, that vary slightly from one GOES image to another. GCP and AWS handle this by exposing bucket paths instead of URLs/URIs, so users can list the image paths in a given folder. Another solution would be to provide a data catalog, either STAC or a Parquet file, as you already do for other GOES products.
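To show which parts of the basename are unpredictable: it carries a scan start (`s`), scan end (`e`), and file creation (`c`) stamp, each written as year, day of year, hour, minute, second, and tenth of a second. Below is a minimal sketch that parses those fields with the standard library. The regex, `GoesFilename`, and `parse_goes_filename` are our own hypothetical names, inferred only from the example file names in this thread; they are not an official NOAA or Planetary Computer API.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta

# Pattern inferred from the example names (assumption, not an official spec):
# OR_<product>-<mode>_<satellite>_s<start>_e<end>_c<created>[...]
_GOES_RE = re.compile(
    r"OR_(?P<product>.+)-(?P<mode>M\d)_(?P<satellite>G\d{2})"
    r"_s(?P<start>\d{14})_e(?P<end>\d{14})_c(?P<created>\d{14})"
)


@dataclass
class GoesFilename:
    product: str
    mode: str
    satellite: str
    start: datetime
    end: datetime
    created: datetime


def _parse_stamp(stamp: str) -> datetime:
    # YYYY JJJ HH MM SS t, where JJJ is the day of year and t is tenths of a second
    base = datetime.strptime(stamp[:13], "%Y%j%H%M%S")
    return base + timedelta(milliseconds=100 * int(stamp[13]))


def parse_goes_filename(name: str) -> GoesFilename:
    m = _GOES_RE.search(name)
    if m is None:
        raise ValueError(f"not a GOES ABI file name: {name!r}")
    return GoesFilename(
        product=m["product"],
        mode=m["mode"],
        satellite=m["satellite"],
        start=_parse_stamp(m["start"]),
        end=_parse_stamp(m["end"]),
        created=_parse_stamp(m["created"]),
    )


info = parse_goes_filename(
    "OR_ABI-L2-LSTC-M6_G16_s20230081201170_e20230081203543_c20230081205077.nc"
)
print(info.start, info.created)  # 2023-01-08 12:01:17 2023-01-08 12:05:07.700000
```

Only the product, mode, satellite, and the hour-level part of the start time can be derived in advance; the seconds and the creation stamp have to be discovered by listing the directory.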

Or am I missing something?

TomAugspurger commented 9 months ago

Are you using Python? If so, you can use adlfs, which provides an fsspec-compatible API for Azure Blob Storage, or azure-storage-blob. Either of these will let you list the files.

import adlfs

fs = adlfs.AzureBlobFileSystem("goeseuwest")  # storage account name
fs.ls("noaa-goes16/ABI-L2-LSTC/2023/008/12/")  # container name + path in storage container
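Once the hourly directory is listed, the unpredictable part of the basename stops mattering: you can match on the scan start time instead. A minimal sketch, assuming `paths` is the list of strings returned by `fs.ls(...)` above; `pick_scan` is a hypothetical helper, and the `_s<YYYYJJJHHMMSSt>_` pattern is inferred from the example file name in this thread.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

# Captures the scan start stamp down to whole seconds (the tenths digit is dropped)
_START_RE = re.compile(r"_s(\d{13})\d_")


def pick_scan(paths: Iterable[str], target: datetime) -> Optional[str]:
    """Return the path whose scan start time is closest to `target`, or None."""
    best, best_gap = None, None
    for path in paths:
        m = _START_RE.search(path)
        if m is None:
            continue  # skip anything that is not a GOES ABI file
        start = datetime.strptime(m.group(1), "%Y%j%H%M%S")
        gap = abs((start - target).total_seconds())
        if best_gap is None or gap < best_gap:
            best, best_gap = path, gap
    return best
```

For example, `pick_scan(fs.ls("noaa-goes16/ABI-L2-LSTC/2023/008/12/"), datetime(2023, 1, 8, 12, 1))` would return the listed file whose scan started closest to 12:01 UTC.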

Another solution would be providing a data catalog, either a stac or a parquet file, like you guys are already doing with other GOES products.

We'll expand our STAC catalog to cover these at some point, but don't have an ETA for that.

Berhinj commented 9 months ago

@TomAugspurger Looks like I was wrong, thanks a million! I keep seeing you help people on plenty of repos; thanks a lot for making our lives easier!

The main issue was that I was using fsspec incorrectly; I was able to fix it thanks to your comment. If someone wants to try it with plain fsspec, here it is:

import fsspec
fs = fsspec.filesystem(protocol="abfs",
                       account_name="goeseuwest")
fs.ls("/noaa-goes16/ABI-L2-LSTC/2023/008/12/")
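Because the directory part of the path is fully predictable, it can be built from a timestamp before listing; the day-of-year folder corresponds to `strftime`'s `%j`. A minimal sketch: `goes_hour_prefix` is a hypothetical helper, and the assumption that the other satellites follow the same `noaa-goes<NN>` container naming as `noaa-goes16` is ours, not confirmed in this thread.

```python
from datetime import datetime


def goes_hour_prefix(satellite: int, product: str, when: datetime) -> str:
    """Build '<container>/<product>/<YYYY>/<DDD>/<HH>/' for the public NetCDF containers."""
    # Container naming assumed from the example above, e.g. "noaa-goes16"
    return f"noaa-goes{satellite}/{product}/{when:%Y}/{when:%j}/{when:%H}/"
```

Usage: `fs.ls(goes_hour_prefix(16, "ABI-L2-LSTC", datetime(2023, 1, 8, 12)))` lists the same directory as the call above.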

Berhinj commented 9 months ago

@TomAugspurger I just saw something I had missed, which might explain part of my confusion: the same approach doesn't work for the COGs, though:

import adlfs

fs = adlfs.AzureBlobFileSystem("goeseuwest")  # storage account name
fs.ls("noaa-goes-cogs/goes-16/")  # container name + path in storage container

gives me:

ClientAuthenticationError: Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
RequestId:3f425daf-101e-0072-13b8-26178f000000
Time:2023-12-04T13:49:53.3594625Z
ErrorCode:NoAuthenticationInformation
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>NoAuthenticationInformation</Code><Message>Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
RequestId:3f425daf-101e-0072-13b8-26178f000000
Time:2023-12-04T13:49:53.3594625Z</Message></Error>
TomAugspurger commented 9 months ago

For that, see https://planetarycomputer.microsoft.com/docs/concepts/sas/.

The raw NetCDF files from NOAA are in public storage containers. The COGs we build for some of the products are in private storage containers, and so require a token to access. As those docs explain, you can get a token anonymously. I'm not sure whether we have COGs for the LST product.
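To make the token step concrete: a SAS token is a query string appended to the blob URL, issued per storage account and container (for example by the endpoint `https://planetarycomputer.microsoft.com/api/sas/v1/token/<account>/<container>` used in the thread). A minimal sketch of the URL handling, assuming you already hold a token string; `blob_account_container` and `append_sas` are hypothetical helpers, and in practice `planetary_computer.sign` takes care of fetching the token and attaching it for you.

```python
from typing import Tuple
from urllib.parse import urlsplit


def blob_account_container(url: str) -> Tuple[str, str]:
    """Split an Azure Blob URL into (storage account, container)."""
    parts = urlsplit(url)
    account = parts.netloc.split(".")[0]               # "goeseuwest.blob.core.windows.net" -> "goeseuwest"
    container = parts.path.lstrip("/").split("/")[0]   # first path segment is the container
    return account, container


def append_sas(url: str, token: str) -> str:
    """Attach a SAS token (with or without a leading '?') to a blob URL."""
    sep = "&" if urlsplit(url).query else "?"
    return f"{url}{sep}{token.lstrip('?')}"
```

The `(account, container)` pair is what identifies which token to request, which is why a token for `noaa-goes-cogs` is needed even though `noaa-goes16` in the same account is public.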

Berhinj commented 9 months ago

Sorry to keep reopening the topic. I managed to access the storage and the COGs thanks to your pointers.

The GOES LST documentation (https://planetarycomputer.microsoft.com/dataset/storage/goes-lst) says:

"Data are available in Blob Storage in the West Europe Azure data center, in both NetCDF and cloud-optimized GeoTIFF (COG) format."

I find this very misleading, as less than 1% of the GOES-16/17/18 LST data (LSTM, LSTC, or LSTF) for 2018-2023 is available as COGs. Shouldn't the documentation be updated accordingly? Or do you know whom I could contact about this?
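One way to check a coverage figure like this is to list the same hourly directory in both the NetCDF container (`noaa-goes16/...`) and the COG container (`noaa-goes-cogs/goes-16/...`) and match files by product, satellite, and scan start time. A minimal sketch: `scan_keys` and `cog_coverage` are hypothetical helpers, and the assumption that each COG reuses its source scan's `s` stamp is based only on the example COG name in this thread.

```python
import re
from typing import Iterable, Set, Tuple

# Key = (product incl. scene, satellite, scan start), e.g. ("ABI-L2-LSTM1", "G16", "20232012100288")
_KEY_RE = re.compile(r"OR_(.+?)-M\d_(G\d{2})_s(\d{14})")


def scan_keys(paths: Iterable[str]) -> Set[Tuple[str, str, str]]:
    keys = set()
    for path in paths:
        m = _KEY_RE.search(path.rsplit("/", 1)[-1])  # match on the basename only
        if m:
            keys.add(m.groups())
    return keys


def cog_coverage(netcdf_paths: Iterable[str], cog_paths: Iterable[str]) -> float:
    """Fraction of NetCDF scans that have a COG with the same key."""
    source = scan_keys(netcdf_paths)
    if not source:
        return 0.0
    return len(source & scan_keys(cog_paths)) / len(source)
```

Running this over many hourly directories with `fs.ls` on both containers would give a per-product coverage estimate.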

Do you know whether the Microsoft Planetary Computer plans to make the rest of the COGs available?

Since I felt the documentation on accessing COGs was lacking, here is a reproducible example of how to search for and access GOES COG data:

import fsspec
import requests
import rioxarray as rxr
import planetary_computer

# Storage account and container holding the GOES COGs (replace with your own details)
storage_account = "goeseuwest"
container = "noaa-goes-cogs"

# Request an anonymous SAS token (keyed by storage account/container here, not by a collection ID)
response = requests.get(f"https://planetarycomputer.microsoft.com/api/sas/v1/token/{storage_account}/{container}")
response.raise_for_status()
storage_token = response.json()["token"]

# Authenticated filesystem for the private COG container
fs = fsspec.filesystem(protocol="abfs",
                       account_name=storage_account,
                       sas_token=storage_token)
# Look for an image
fs.ls(f"{container}/goes-16/ABI-L2-LSTC/2023")

# Once you have found your image, read it (sign() appends a SAS token to the URL)
url = 'https://goeseuwest.blob.core.windows.net/noaa-goes-cogs/goes-16/ABI-L2-LSTM/2023/201/21/OR_ABI-L2-LSTM1-M6_G16_s20232012100288_e20232012100345_c20232012100543_LST.tif'
rxr.open_rasterio(planetary_computer.sign(url))

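`fs.ls` returns container-relative paths (`noaa-goes-cogs/goes-16/...`), while `planetary_computer.sign` in the example is given a full HTTPS URL. A minimal sketch of the conversion, assuming the standard `https://<account>.blob.core.windows.net/<container>/<blob>` endpoint; `to_blob_url` is a hypothetical helper.

```python
def to_blob_url(path: str, account: str = "goeseuwest") -> str:
    """Turn an abfs-style listing path into a blob HTTPS URL (no token attached)."""
    # Drop an optional "abfs://" or "az://" protocol prefix and any leading slash
    for prefix in ("abfs://", "az://"):
        if path.startswith(prefix):
            path = path[len(prefix):]
    return f"https://{account}.blob.core.windows.net/{path.lstrip('/')}"
```

With it, the listing and reading steps chain together, e.g. `rxr.open_rasterio(planetary_computer.sign(to_blob_url(first_path)))`, where `first_path` is one of the strings returned by `fs.ls`.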
TomAugspurger commented 9 months ago

I'll need to check on whether we're actually generating COGs for LST data. GOES-CMI (https://planetarycomputer.microsoft.com/dataset/goes-cmi) is the product we've pushed the furthest, and the others will eventually be available in a similar fashion.

Berhinj commented 9 months ago

That'd be super valuable for us, thanks a lot

Berhinj commented 9 months ago

Any news on this?

TomAugspurger commented 9 months ago

I haven't had a chance to. If you want, you could try looking through the directories in the goeseuwest/noaa-goes-cogs container.
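For anyone following that suggestion, a minimal sketch of a first step: list the product directories under one satellite folder and keep the LST ones. It assumes `fs` is the authenticated `abfs` filesystem built with a SAS token as in the example earlier in the thread, and that the layout is `noaa-goes-cogs/goes-16/<product>/...` as the paths above suggest; `find_lst_products` is a hypothetical helper.

```python
from typing import List


def find_lst_products(fs, satellite_dir: str = "noaa-goes-cogs/goes-16") -> List[str]:
    """Return the LST product directory names found under `satellite_dir`."""
    # detail=False makes an fsspec-style ls() return plain path strings
    entries = fs.ls(satellite_dir, detail=False)
    names = [entry.rstrip("/").rsplit("/", 1)[-1] for entry in entries]
    return sorted(name for name in names if "-LST" in name)
```

From there, descending into each product's year and day-of-year folders with further `fs.ls` calls shows which dates actually have COGs.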