stactools-packages / noaa-hrrr

NOAA High-Resolution Rapid Refresh (HRRR) stactools package

stactools-noaa-hrrr


Figure: wind speed forecast from 2024-05-10T12:00:00Z for 2024-05-10T14:00:00Z

This package generates STAC metadata for the NOAA High-Resolution Rapid Refresh (HRRR) atmospheric forecast dataset.

The data are hosted in cloud storage on AWS, Azure, and Google. Use the cloud_provider argument to the functions in stactools.noaa_hrrr.stac to choose which provider the grib and index hrefs point to.

Background

The NOAA HRRR dataset is a continuously updated atmospheric forecast data product.

Data structure

Summary of Considerations for Organizing STAC Metadata

After extensive discussions, we decided to organize the STAC metadata with the following structure:

  1. Collections: Separate collections for each region-product combination

    • regions: conus and alaska
    • products: sfc, prs, nat, and subh
  2. Items: Each GRIB file in the archive is represented as an item with two assets:

    • "grib": Contains the actual data.
    • "index": The .grib2.idx sidecar file.

    Each GRIB file contains the forecasts for all of a product's variables for a particular forecast hour from a reference time, so you need to combine data from multiple items to construct a time series for a forecast.

  3. grib:layers: Within each "grib" asset, a grib:layers property details each layer's information, including description, units, and byte ranges. This enables applications to access specific parts of the GRIB2 files without downloading the entire file.

    • We intend to propose a GRIB STAC extension with the grib:layers property for storing byte-ranges after testing this specification out on other GRIB2 datasets.
    • The layer-level metadata is worth storing in STAC because you can construct URIs for specific layers that GDAL can read using either /vsisubfile or vrt://:
      • /vsisubfile/{start_byte}_{byte_size},/vsicurl/{grib_href}
      • vrt:///vsicurl/{grib_href}?bands={grib_message}, where grib_message is the index of the layer within the GRIB2 file.
      • Under the hood, GDAL's vrt driver reads the sidecar .grib2.idx file and translates it into a /vsisubfile URI.
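The two URI patterns above can be assembled with plain string formatting. The following is a minimal sketch, not part of this package: the function names, the example href, and the keys of the `layer` dictionary (`start_byte`, `byte_size`, `grib_message`) are assumptions chosen to mirror the placeholders in the patterns, and the actual grib:layers schema may differ.

```python
def vsisubfile_uri(grib_href: str, start_byte: int, byte_size: int) -> str:
    """Build a GDAL /vsisubfile URI that exposes one byte range of a remote GRIB2 file."""
    return f"/vsisubfile/{start_byte}_{byte_size},/vsicurl/{grib_href}"


def vrt_band_uri(grib_href: str, grib_message: int) -> str:
    """Build a GDAL vrt:// URI that selects one layer by its index in the GRIB2 file."""
    return f"vrt:///vsicurl/{grib_href}?bands={grib_message}"


# Hypothetical values for illustration only; real hrefs and byte ranges
# come from an item's "grib" asset and its grib:layers property.
grib_href = "https://example.com/hrrr.t12z.wrfsfcf02.grib2"
layer = {"start_byte": 1000, "byte_size": 500, "grib_message": 3}

print(vsisubfile_uri(grib_href, layer["start_byte"], layer["byte_size"]))
print(vrt_band_uri(grib_href, layer["grib_message"]))
```

Either string can then be passed to a GDAL-based reader in place of a file path, so only the requested layer is fetched.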

Advantages

Disadvantages

For more details, please refer to the related issue discussion and pull requests #3 and #6.

STAC examples

Python usage example

Installation

Install stactools-noaa-hrrr with pip:

pip install stactools-noaa-hrrr

Command-line usage

To create a collection object:

stac noaahrrr create-collection {region} {product} {cloud_provider} {destination_file}

e.g.

stac noaahrrr create-collection conus sfc azure example-collection.json

To create an item:

stac noaahrrr create-item \
  {region} \
  {product} \
  {cloud_provider} \
  {reference_datetime} \
  {forecast_hour} \
  {destination_file}

e.g.

stac noaahrrr create-item conus sfc azure 2024-05-01T12 10 example-item.json
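Each item describes the forecast for one forecast hour counted from a reference time, so the time the forecast is valid for is the reference time plus the forecast hour. The sketch below illustrates that arithmetic using only the standard library. It assumes the reference time is in UTC and written as YYYY-MM-DDTHH, as in the example command; `valid_datetime` is a hypothetical helper and not a function provided by this package.

```python
from datetime import datetime, timedelta, timezone


def valid_datetime(reference_datetime: str, forecast_hour: int) -> datetime:
    """Return the time a forecast is valid for, assuming a UTC reference time."""
    reference = datetime.strptime(reference_datetime, "%Y-%m-%dT%H").replace(
        tzinfo=timezone.utc
    )
    return reference + timedelta(hours=forecast_hour)


# The item created by the example command above: reference time 2024-05-01T12, forecast hour 10.
print(valid_datetime("2024-05-01T12", 10).isoformat())  # 2024-05-01T22:00:00+00:00

# A time series for one forecast combines several items that share a reference time.
for hour in range(3):
    print(hour, valid_datetime("2024-05-01T12", hour).isoformat())
```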

To create all items for a date range:

stac noaahrrr create-item-collection \
  {region} \
  {product} \
  {cloud_provider} \
  {start_date} \
  {end_date} \
  {destination_folder}

e.g.

stac noaahrrr create-item-collection conus sfc azure 2024-05-01 2024-05-31 /tmp/items
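If you would rather write one file per item for a single reference time, the create-item command can be scripted. The sketch below only builds the argument lists, following the create-item syntax shown earlier, and does not run them. The helper name and the output file naming scheme are hypothetical, and the forecast hours are supplied by the caller rather than looked up from the dataset.

```python
import shlex
from typing import Iterable, List


def create_item_commands(
    region: str,
    product: str,
    cloud_provider: str,
    reference_datetime: str,
    forecast_hours: Iterable[int],
    destination_folder: str,
) -> List[List[str]]:
    """Build one `stac noaahrrr create-item` argument list per forecast hour."""
    commands = []
    for hour in forecast_hours:
        # Hypothetical file naming scheme, chosen only for this illustration.
        destination = (
            f"{destination_folder}/{region}-{product}-{reference_datetime}-f{hour:02d}.json"
        )
        commands.append(
            [
                "stac", "noaahrrr", "create-item",
                region, product, cloud_provider,
                reference_datetime, str(hour), destination,
            ]
        )
    return commands


for command in create_item_commands(
    "conus", "sfc", "azure", "2024-05-01T12", range(3), "/tmp/items"
):
    print(shlex.join(command))
```

Each argument list can be passed to `subprocess.run(command, check=True)` once the package is installed and the destination folder exists.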

Use stac noaahrrr --help to see all subcommands and options.

Docker

You can launch a JupyterHub server in a Docker container with all of the dependencies installed using these commands:

docker/build
docker/jupyter

Contributing

We use pre-commit to check any changes. To set up your development environment:

pip install -e '.[dev]'
pre-commit install

To check all files:

pre-commit run --all-files

To run the tests:

pytest -vv

If you've updated the STAC metadata output, update the examples:

scripts/update-examples