synoptic / wmo-uasdc

Repository for code and examples related to the WMO UASDC
MIT License

Problems with uas2bufr #11

Closed nobodyinperson closed 1 week ago

nobodyinperson commented 3 weeks ago

I'm having serious problems with the NetCDF-to-BUFR conversion.

Problem 1: The eccodes package

One problem is that eccodes is a notoriously fragile, difficult-to-install library (hence the dreaded RuntimeError: Cannot find the ecCodes library), with prebuilt binaries only for very specific Python versions, architectures, etc. For now, I managed to build an Apptainer image based on docker://python:3.10 and a Poetry project with the library versions you specify here:

pyproject.toml

```toml
[tool.poetry]
name = "uas2bufr"
version = "0.1.0"
description = "WMO UASDC NetCDF to bufr"
authors = ["WMO "]

[tool.poetry.dependencies]
python = "^3.10"
ecmwflibs = "0.5.3"
numpy = "1.26.2"
netcdf4 = "1.6.5"
eccodes = "1.6.1"

[tool.poetry.scripts]
uas2bufr = "uas2bufr.uas2bufr:main"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```

I also tried poetry2nix, which is well suited to reproducible builds, but even that fails with the same RuntimeError: Cannot find the ecCodes library.
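A quick way to tell whether a given environment (Apptainer image, Nix shell, virtualenv) can load ecCodes at all, before feeding it real files, is to import the bindings in isolation. This is a minimal sketch, assuming the eccodes Python package, whose import raises the RuntimeError quoted above when the shared library cannot be found; check_eccodes is a hypothetical helper name, not part of uas2bufr.

```python
import importlib


def check_eccodes():
    """Try to load the eccodes Python bindings; return (ok, detail).

    detail is the ecCodes library version on success, or the error text on failure.
    """
    try:
        eccodes = importlib.import_module("eccodes")
    except (ImportError, RuntimeError) as exc:
        # ImportError: the eccodes Python package itself is not installed.
        # RuntimeError: the package is installed but the ecCodes shared library was not found.
        return False, f"{type(exc).__name__}: {exc}"
    # codes_get_api_version() reports the version of the loaded ecCodes library.
    return True, str(eccodes.codes_get_api_version())
```

Printing check_eccodes() inside the container as a first step separates "library not found" problems from genuine conversion bugs in the script.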

default.nix

```nix
{ pkgs ? (import { }), poetry2nix ? (import { }), ... }:
poetry2nix.mkPoetryApplication {
  projectDir = ./.;
  preferWheels = true;
  extras = [ ];
}
```

It would be really helpful if you provided a reproducible, platform-independent way of running the uas2bufr script.

Debugging via the __error.txt files that land in the S3 bucket after a failed upload is obviously not viable.

A Docker image, an Apptainer image, a working Nix environment: anything, really, that allows running the uas2bufr script reliably.

Problem 2: Broken uas2bufr script

Now that I can run the script locally, I was able to reproduce the error from the generated __error.txt in the S3 bucket and added some debug prints:

```
❯ apptainer exec ./python310.sif poetry run uas2bufr ../../2024/08/19/flik/flik-0091-vital1-Jülich/UASDC_027_flik_20240819093659Z.nc
INFO:    gocryptfs not found, will not be able to use gocryptfs
/home/yann/.cache/pypoetry/virtualenvs/uas2bufr-VUBRx9p0-py3.10/lib/python3.10/site-packages/gribapi/__init__.py:23: UserWarning: ecCodes 2.31.0 or higher is recommended. You are running version 2.30.0
  warnings.warn(
units_time = 'nanoseconds since 2024-08-19 09:36:59.200000048' (units of the netcdf time axis?)
mytime = ['nanoseconds', 'since', '2024-08-19', '09:36:59.200000048'] (words in the units of the netcdf time axis?)
mytime1 = ['2024', '08', '19'] (parts of the date, year month day)
mytime2 = ['19'] (why would splitting the day by colons - there are none - yield anything? Anything below fails.)
Traceback (most recent call last):
  File "/home/yann/code/vital/uas-data/scripts/uas2bufr/uas2bufr/uas2bufr.py", line 359, in main
    uas2bufr(nc_filename)
  File "/home/yann/code/vital/uas-data/scripts/uas2bufr/uas2bufr/uas2bufr.py", line 185, in uas2bufr
    uas2Dict_read = read_netcdf(nc_filename)
  File "/home/yann/code/vital/uas-data/scripts/uas2bufr/uas2bufr/uas2bufr.py", line 162, in read_netcdf
    smin = mytime2[1]
IndexError: list index out of range
```

Apparently, the time-unit parsing relies on hard-coded assumptions about the string format. In this case, the day part 19 gets split on colons (:), as if it also contained the time of day, and indexing into the result fails. Instead of fragile character splitting, I recommend a robust time library such as the standard-library datetime module or simply pandas.to_datetime.
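To illustrate the suggested direction, here is a minimal standard-library sketch (not the uas2bufr code) that parses CF-style units such as the one in the log above, accepting either a space or a T between date and time. Assumptions, stated plainly: parse_time_units and to_datetimes are hypothetical names, only the units listed in UNIT_SECONDS are supported, the reference time is taken as UTC (an optional trailing Z or UTC is accepted, other offsets are rejected with a clear error), and fractional seconds are truncated to microseconds because Python datetime objects cannot hold nanoseconds.

```python
import re
from datetime import datetime, timedelta, timezone

# Seconds per unit. Assumption: only these CF-style units need to be supported.
UNIT_SECONDS = {
    "nanoseconds": 1e-9,
    "microseconds": 1e-6,
    "milliseconds": 1e-3,
    "seconds": 1.0,
    "minutes": 60.0,
    "hours": 3600.0,
    "days": 86400.0,
}

# "<unit> since YYYY-MM-DD[( |T)hh:mm[:ss[.fraction]]][Z|UTC]"
UNITS_RE = re.compile(
    r"^\s*(?P<unit>\w+)\s+since\s+"
    r"(?P<date>\d{4}-\d{1,2}-\d{1,2})"
    r"(?:[T ](?P<time>\d{1,2}:\d{2}(?::\d{2}(?:\.\d+)?)?))?"
    r"\s*(?:Z|UTC)?\s*$"
)


def parse_time_units(units):
    """Return (unit, reference datetime in UTC) for a CF-style time units string."""
    m = UNITS_RE.match(units)
    if m is None:
        raise ValueError(f"unsupported time units string: {units!r}")
    unit = m["unit"].lower()
    if unit not in UNIT_SECONDS:
        raise ValueError(f"unsupported time unit {unit!r} in {units!r}")
    year, month, day = (int(p) for p in m["date"].split("-"))
    hour = minute = second = microsecond = 0
    if m["time"]:
        hms, _, frac = m["time"].partition(".")
        fields = [int(p) for p in hms.split(":")]
        hour, minute = fields[0], fields[1]
        second = fields[2] if len(fields) == 3 else 0
        # datetime has microsecond resolution: pad or truncate the fraction to 6 digits.
        microsecond = int(frac[:6].ljust(6, "0")) if frac else 0
    ref = datetime(year, month, day, hour, minute, second, microsecond, tzinfo=timezone.utc)
    return unit, ref


def to_datetimes(values, units):
    """Convert numeric offsets (e.g. a NetCDF time variable) to UTC datetimes."""
    unit, ref = parse_time_units(units)
    scale = UNIT_SECONDS[unit]
    # Float scaling; timedelta rounds the result to the nearest microsecond.
    return [ref + timedelta(seconds=float(v) * scale) for v in values]
```

With units of 'nanoseconds since 2024-08-19 09:36:59.200000048', parse_time_units yields the reference time 2024-08-19 09:36:59.200000 UTC instead of an IndexError, and an unsupported string fails with a message that quotes it.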

Our campaign ends this week, so unless I find a solution, our data can't land in the S3 bucket right after the flights.

jmaxmarno commented 3 weeks ago

hi @nobodyinperson - looking into this, we'll get back to you as soon as possible

nobodyinperson commented 3 weeks ago

Hi Max, thank you. While you're at it: the script also dies when the values of, e.g., latitude actually have a mask, which for some reason happens with xarray. In your example, mask=False, but if the variable actually has a mask (a boolean array, however xarray produces that), uas2bufr fails because it doesn't filter out the masked values.
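A minimal sketch of the filtering described here, assuming the variables are read with netCDF4-python, which by default returns numpy.ma masked arrays (mask may be the scalar False or a full boolean array). drop_masked_rows is a hypothetical helper, not part of uas2bufr, and dropping every row where any variable is invalid is only one possible policy.

```python
import numpy as np


def drop_masked_rows(**columns):
    """Keep only the rows (indices) where every column holds a valid value.

    A row is dropped if any column is masked there or, for floating-point
    columns, holds NaN or +/-inf. Works both when a column's mask is the
    scalar False (np.ma.nomask) and when it is a full boolean array.
    Returns plain (unmasked) NumPy arrays keyed by the same names.
    """
    arrays = {name: np.ma.asarray(col) for name, col in columns.items()}
    shapes = {arr.shape for arr in arrays.values()}
    if len(shapes) != 1:
        raise ValueError(f"expected at least one column, all of one shape; got {shapes}")
    valid = np.ones(shapes.pop(), dtype=bool)
    for arr in arrays.values():
        # getmaskarray always yields a boolean array, even for np.ma.nomask.
        valid &= ~np.ma.getmaskarray(arr)
        if np.issubdtype(arr.dtype, np.floating):
            valid &= np.isfinite(np.ma.getdata(arr))
    return {name: np.ma.getdata(arr)[valid] for name, arr in arrays.items()}
```

For example, drop_masked_rows(time=t, latitude=lat, longitude=lon, altitude=alt) (variable names illustrative) returns aligned arrays with no masked or NaN entries, whatever form the original masks had.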

On Wednesday, 21 August 2024, Max Marno wrote: hi @nobodyinperson - looking into this, we'll get back to you as soon as possible


nobodyinperson commented 1 week ago

After a lot of debugging, I finally got uas2bufr to accept my NetCDF file. It would be great if uas2bufr were more robust and explained errors better. Besides the points above, simply bubbling up the gribapi error OutOfRangeError: Value out of coding range, without any indication of the variable name or the offending value, makes it quite hard to find the actual problem.
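One way to get that context, sketched under the assumption that the script sets BUFR keys one at a time through eccodes calls such as codes_set(handle, key, value) or codes_set_array(handle, key, values): wrap each call so a failure is re-raised with the key name and a summary of the value. set_with_context, describe and BufrEncodingError are hypothetical names; the setter is passed in so the wrapper does not depend on how uas2bufr is organized internally.

```python
import numpy as np


class BufrEncodingError(ValueError):
    """Raised when one key/value pair cannot be encoded; the message names both."""


def describe(value):
    """Short human-readable summary of a scalar or array value."""
    arr = np.asarray(value)
    if arr.ndim == 0:
        return f"value={arr.item()!r}"
    if arr.size and np.issubdtype(arr.dtype, np.number):
        # min/max show out-of-range or NaN values at a glance.
        return f"array of {arr.size} values, min={arr.min()}, max={arr.max()}"
    return f"array of {arr.size} {arr.dtype} values"


def set_with_context(setter, handle, key, value):
    """Call setter(handle, key, value); on failure, re-raise naming key and value."""
    try:
        setter(handle, key, value)
    except Exception as exc:
        raise BufrEncodingError(
            f"cannot encode {key!r} ({describe(value)}): {type(exc).__name__}: {exc}"
        ) from exc
```

A call like set_with_context(eccodes.codes_set_array, bufr, "latitude", lat) (key name illustrative) would then fail with a message naming latitude and its min/max, while the original gribapi exception stays attached as the cause.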

But at least I have a working state I can build on now. 👍

jmaxmarno commented 1 week ago

@nobodyinperson you're right, it's very procedural code. I'm glad you've got a working state now. For your reference, uas2bufr is chiefly based on this repo: https://github.com/marijanacrepulja/uas2bufr. We can consider adding more verbose logging and calling out specific variables where applicable, but at the moment this work isn't planned. PRs welcome!

nobodyinperson commented 1 week ago

Understood. No need for this issue then anymore, I guess.