tum-esm / em27-retrieval-pipeline

automated EM27 data processing
https://em27-retrieval-pipeline.netlify.app
GNU General Public License v3.0

Parsing pressure files fails with v1.4.0 #107

Closed cfleur closed 1 month ago

cfleur commented 1 month ago

Current behaviour: After switching from v1.3.2 to v1.4.0, parsing the pressure files produces the following warning and processing is interrupted:

20241004 10:18:00 - DEBUG - Found 119 files in total and 1 matching files
20241004 10:18:00 - DEBUG - Parsing file datalogger-wetland-20240802.csv
20241004 08:58:27 - WARNING - Inputs incomplete: float() argument must be a string or a real number, not 'NoneType'

This could be coming from the polars library. I read the CSV with polars separately and could not reproduce the warning.
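For reference, the text of the warning is what Python itself raises when `None` is passed to `float()`, so a single null cell in one of the parsed columns would be enough to trigger it. A minimal sketch, assuming the parsed cell arrives as `None` (the name `cell` is hypothetical and this is not the pipeline's code):

```python
# Hypothetical value standing in for an empty cell after CSV parsing.
cell = None

try:
    float(cell)
except TypeError as e:
    # The exact wording differs slightly between Python versions,
    # but it always names 'NoneType' as the offending argument type.
    error_message = str(e)
    print(error_message)
```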

Expected behaviour: Switching from v1.3.x to v1.4.x, after updating the config schema and using the same raw input data, does not cause an error when running retrievals. It should be possible to specify pressure metadata in v1.4.0 for the same files used in previous versions.

Steps to reproduce:

  1. Run cli.py retrieval start on v1.3.2 using the same pressure file and see that it succeeds.
  2. Update pipeline to v1.4.0.
  3. Update config file to match v1.4.0 schema:
{
  "version": "1.4",
  "general": {
    "data": {
      "ground_pressure": {
        "path": <path-to-pressure-folder>,
        "file_regex": "^datalogger-[a-z]+-$(YYYY)$(MM)$(DD).csv$",
        "separator": ",",
        "datetime_column": null,
        "datetime_column_format": null,
        "date_column": "UTCdate_____",
        "date_column_format": "%d.%m.%Y",
        "time_column": "UTCtime___",
        "time_column_format": "%H:%M:%S",
        "unix_timestamp_column": null,
        "unix_timestamp_column_format": null,
        "pressure_column": "BaroYoung",
        "pressure_column_format": "hPa"
      },
      "atmospheric_profiles": <path>,
      "interferograms": <path>,
      "results": <path>
    },
...
  "bundles" : [...]
  4. Run cli.py retrieval start on v1.4.0 using the same pressure file
  5. Find the warning message in the container logs

dostuffthatmatters commented 1 month ago

Hi @cfleur

thanks for the report!

We don't have NaN or None values in our pressure files or in any of the pressure files we tested. That's why I never saw this before.

I will fix this asap!

Best, Moritz

dostuffthatmatters commented 1 month ago

I could reproduce the exact message. It is fixed now. This code drops all rows with null values in any of the required columns.

https://github.com/tum-esm/em27-retrieval-pipeline/blob/main/src/retrieval/utils/pressure_loading.py#L74-L86
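As a rough illustration of the idea (not the code at the link above), the sketch below drops every row that has an empty value in a required column before converting the pressure to a float. It uses Python's standard `csv` module rather than polars, and the sample data and the helper `drop_incomplete_rows` are hypothetical; only the column names are taken from the config in this issue.

```python
import csv
import io

# Hypothetical sample using the column names from the config in this issue;
# the second data row has an empty pressure cell.
SAMPLE = (
    "UTCdate_____,UTCtime___,BaroYoung\n"
    "02.08.2024,00:00:00,1013.2\n"
    "02.08.2024,00:01:00,\n"
    "02.08.2024,00:02:00,1013.4\n"
)

REQUIRED_COLUMNS = ["UTCdate_____", "UTCtime___", "BaroYoung"]


def drop_incomplete_rows(rows, required):
    """Keep only rows where every required column has a non-empty value."""
    return [
        row for row in rows
        if all(row.get(col) not in (None, "") for col in required)
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
clean_rows = drop_incomplete_rows(rows, REQUIRED_COLUMNS)

# Safe now: no empty cell reaches float().
pressures = [float(row["BaroYoung"]) for row in clean_rows]
print(pressures)  # [1013.2, 1013.4]
```

Filtering before the conversion means a gap in the datalogger output costs one sample instead of interrupting the whole retrieval.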

There is a lot more testing on the pressure loading in v1.4.1.

Best, Moritz