zmoon / uscrn

Easily load U.S. CRN data
https://uscrn.readthedocs.io
MIT License

uscrn

Easily load U.S. Climate Reference Network (USCRN) data.

Project status: WIP. Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

With uscrn, fetching and loading years of data for all USCRN sites[^a] takes just one line of code[^b].

Example:

```python
import uscrn

df = uscrn.get_data(2019, "hourly", n_jobs=6)  # pandas.DataFrame

ds = uscrn.to_xarray(df)  # xarray.Dataset, with soil depth dimension if applicable (hourly, daily)
```
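To illustrate what a soil depth dimension means, here is a minimal sketch that takes a hypothetical wide table with one soil temperature column per depth and relabels it as a single (time, depth) table. The column names (soil_temp_5, etc.) and values are made up for illustration, and this is not how uscrn implements to_xarray, only the general idea. xarray can represent the same table as one 2-D variable with time and depth dimensions.

```python
import pandas as pd

# Hypothetical wide layout: one soil temperature column per depth (cm).
# Column names and values are illustrative, not uscrn's actual output.
wide = pd.DataFrame(
    {
        "time": pd.to_datetime(["2019-01-01 00:00", "2019-01-01 01:00"]),
        "soil_temp_5": [1.0, 1.1],
        "soil_temp_10": [2.0, 2.1],
        "soil_temp_20": [3.0, 3.2],
    }
).set_index("time")

# Parse the depth from each column name's numeric suffix ("soil_temp_10" -> 10).
depths = [int(name.rsplit("_", 1)[1]) for name in wide.columns]

# Relabel the columns with a named "depth" axis, giving a (time, depth) table.
soil = wide.set_axis(pd.Index(depths, name="depth"), axis="columns")

print(soil)
```

Keeping depth as an axis rather than baking it into column names lets you select or aggregate across depths in one operation, e.g. soil.mean(axis="columns").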

Both df (pandas) and ds (xarray) include dataset and variable metadata. For df, the metadata are stored in df.attrs, which are preserved when writing to Parquet with the PyArrow engine[^d] (pandas v2.1+ required).

```python
df.to_parquet("uscrn_2019_hourly.parquet", engine="pyarrow")
```

Conda install example[^c]:

```sh
conda create -n crn -c conda-forge python=3.10 joblib numpy pandas pyyaml requests xarray pyarrow netcdf4
conda activate crn
pip install --no-deps uscrn
```

[^a]: Use uscrn.load_meta() to load the site metadata table.

[^b]: Not counting the import statement...

[^c]: uscrn is not yet on conda-forge.

[^d]: Or the fastparquet engine with fastparquet v2024.2.0+.