ruisdael-observatory / disdroDL

disdrometer data logger to NetCDF
GNU General Public License v3.0
0 stars 0 forks source link

assess the quality of generated NetCDFs (thies and parsivel) #2

Closed andrecastro0o closed 2 months ago

andrecastro0o commented 3 months ago

assess the quality of generated NetCDFs (thies and parsivel) since the CS2000 development process has ended.

write down any issues that might be present in NetCDF and evaluate fixes(now, or future release)

andrecastro0o commented 3 months ago

Data Files on PAR008

thies data

/data/disdroDL/thies/202406/

20240617_Green_Village-GV_THIES006.nc
20240619_Green_Village-GV_THIES006.nc
20240621_Green_Village-GV_THIES006.nc
20240623_Green_Village-GV_THIES006.nc

parsivel data

ls /data/disdroDL/202406/

20240601_Green_Village-GV_PAR008.nc
20240601_Green_Village-GV_PAR008.nc.md5
20240602_Green_Village-GV_PAR008.nc
20240602_Green_Village-GV_PAR008.nc.md5
20240603_Green_Village-GV_PAR008.nc
20240603_Green_Village-GV_PAR008.nc.md5
20240604_Green_Village-GV_PAR008.nc
20240604_Green_Village-GV_PAR008.nc.md5
20240617_Green_Village-GV_PAR008.nc
20240619_Green_Village-GV_PAR008.nc
20240621_Green_Village-GV_PAR008.nc

Services

disdrodlv3_THIES.service

/etc/systemd/system/disdrodlv3_THIES.service

[Unit]
Description=disdrodlv3 Thies
After=multi-user.target

[Service]
ExecStart=/usr/local/src/venv/python-logging-software/bin/python3  /usr/local/src/python-logging-software/main.py -c /usr/local/src/python-logging-software/configs_netcdf/config_006_GV_THIES.yml
ExecReload=/usr/local/src/venv/python-logging-software/bin/python3  /usr/local/src/python-logging-software/main.py -c /usr/local/src/python-logging-software/configs_netcdf/config_006_GV_THIES.yml
TimeoutStopSec=10
Restart=always
RestartSec=30

[Install]
WantedBy=default.target

status:

● disdrodlv3_THIES.service - disdrodlv3 Thies
     Loaded: loaded (/etc/systemd/system/disdrodlv3_THIES.service; enabled; vendor preset: en>
     Active: activating (auto-restart) (Result: exit-code) since Mon 2024-07-22 10:38:15 UTC;>
    Process: 936 ExecStart=/usr/local/src/venv/python-logging-software/bin/python3 /usr/local>
   Main PID: 936 (code=exited, status=203/EXEC)
        CPU: 5ms

Jul 22 10:38:45 parsivel008pi systemd[1]: disdrodlv3_THIES.service: Scheduled restart job, re>
Jul 22 10:38:45 parsivel008pi systemd[1]: Stopped disdrodlv3 Thies.
Jul 22 10:38:45 parsivel008pi systemd[1]: Started disdrodlv3 Thies.
Jul 22 10:38:45 parsivel008pi systemd[981]: disdrodlv3_THIES.service: Failed to locate execut>
Jul 22 10:38:45 parsivel008pi systemd[981]: disdrodlv3_THIES.service: Failed at step EXEC spa>
Jul 22 10:38:45 parsivel008pi systemd[1]: disdrodlv3_THIES.service: Main process exited, code>
Jul 22 10:38:45 parsivel008pi systemd[1]: disdrodlv3_THIES.service: Failed with result 'exit->
~
~

disdrodlv3_PARSIVEL.service

/etc/systemd/system/disdrodlv3_PARSIVEL.service

[Unit]
Description=disdrodlv3 Parsivel
After=multi-user.target

[Service]
ExecStart=/usr/local/src/venv/python-logging-software/bin/python3  /usr/local/src/python-logging-software/main.py -c /usr/local/src/python-logging-software/configs_netcdf/config_008_GV.yml
ExecReload=/usr/local/src/venv/python-logging-software/bin/python3  /usr/local/src/python-logging-software/main.py -c /usr/local/src/python-logging-software/configs_netcdf/config_008_GV.yml
TimeoutStopSec=10
Restart=always
RestartSec=30

[Install]
WantedBy=default.target

status:

● disdrodlv3_PARSIVEL.service - disdrodlv3 Parsivel
     Loaded: loaded (/etc/systemd/system/disdrodlv3_PARSIVEL.service; enabled; vendor preset:>
     Active: activating (auto-restart) (Result: exit-code) since Mon 2024-07-22 10:39:46 UTC;>
    Process: 987 ExecStart=/usr/local/src/venv/python-logging-software/bin/python3 /usr/local>
   Main PID: 987 (code=exited, status=203/EXEC)
        CPU: 3ms
andrecastro0o commented 3 months ago

2024.07.22 Parsive and Thies NetCDFs

@mschleiss The following files were captured during 2024.07.22 (less than 24h as I started the loggers around 2pm)

At first glance they look good. I will try to pass them through NCQC checks.

If you have feedback on the files, just add it to the issue

andrecastro0o commented 2 months ago

@mschleiss On 2024.08.03 there was a bit of rain. I am including here the NetCDFs for both the Parsivel and the Thies from the GV, which results from DisdroDL v0.3.0 (this repo)

When you have the time have a look at them and report here if you find any issues with the files. Cheers!

andrecastro0o commented 2 months ago

More days with rain:

mschleiss commented 2 months ago

I checked the data and metadata for 2024-08-14 and noticed 6 issues, in decreasing order of importance:

Issue 1: There are some backslashes in the metadata description of variable "weather_code_metar_4678_5min" that need to be removed:

string weather_code_metar_4678_5min\ (unitless)(time) ; string weather_code_metar_4678_5min\ (unitless):_FillValue = "" ; weather_code_metar_4678_5min\ (unitless):long_name = "5-minute METAR Tab.4678 code" ; weather_code_metar_4678_5min\ (unitless):standard_name = "weather_code_metar_4678_5min (unitless)" ; weather_code_metar_4678_5min\ (unitless):comment = "See Table 7: Code SYNOP/METAR in the documentation for possible values" ;

Issue 2: In the datetime time series, I noticed some strange characters "+00:00" at the end of each date/time. The purpose of these characters is unclear to me and I suggest to remove them.

datetime = "2024-08-14T00:00:00.717944+00:00", "2024-08-14T00:01:00.222662+00:00", "2024-08-14T00:02:00.702306+00:00", ..

Issue 3: Variable: total_volume_gross_of_class_1:long_name = "Total volume (gross) of class 1 (unitless)" is given as uniteless. However, volume should have a unit. Same comment for all the other total_volume_gross variables. Please check the unit of all these variables and include it in the metadata description.

Issue 4: Some variables could use data type int instead of float. This is a minor issue but could save some storage space during compression. For example all variables with names "number_of_particles_xxx" could be of type int. Also, the interior temperature variable could be int, as well as many of the control output variables for voltage, heating, power supply etc..

Issue 5: I noticed an inconsistency in the way the long_names are given. For some variables, the long names include the units. For example, long_name = "Interior temperature [°C]", while for others, it doesn't (e.g., long_name = "1-minute radar reflectivity"). Since there does not appear to be any clear rule for how to handle this, I suggest to provide the units in the long names whenever possible. This may be redundant given that there is also a unit attribute but could be useful if people use the long name as input to plot the data.

Issue 6: There is a small inconsistency in the way the comments for the weather codes are given. For variables "weather_code_synop_4680" and "weather_code_metar_4678", the comment field states: "Possible values in Thies documentation: Table 7: Code table SYNOP/METAR". I suggest to change this text to the default sentence used for all the other weather codes: "See Table 7: Code SYNOP/METAR in the documentation for possible values"

andrecastro0o commented 2 months ago

@mschleiss I will create separate issues for each of the first 4 items and will ask you to review the pull request

andrecastro0o commented 2 months ago

For the BS variables, with no relevant data, let's remove them from the config file so that they are not included in the NetCDF. But keep them in the DB. I will create an issue for that and do the changes

mschleiss commented 2 months ago

The variables to remove are:

andrecastro0o commented 2 months ago

I will tackle the changes requested by @mschleiss in separate issues and Pull Requests, merged on to to branch 0.3.2.

Once all changes have been merged on branch 0.3.2, that branch shall be merged to main and a new release created

andrecastro0o commented 2 months ago
andrecastro0o commented 2 months ago

@mschleiss after these changes were implemented, the resulting NetCDFs are coming in. Here is the link for yesterday NetCDF https://ruisdael.citg.tudelft.nl/thies/Thies006_Green_Village/2024/202408/20240825_Green_Village-GV_THIES006.nc

I fell confident to merge the changes from https://github.com/ruisdael-observatory/disdroDL/pull/25

mschleiss commented 2 months ago

Looks good to me. The only (minor) issue I noticed concerns the fact that some of the variables could be encoded as integers instead of floats.

Variables that could be encoded as an unsigned 32 bit integer instead of a float:

Variables that could be encoded as unsigned 16 bit integer instead of a float:

Variables that could be encoded as an unsigned 8-bit integer:

Variables that could be encoded as a signed 8-bit integer:

Not sure how much control you have over the type of integer. At least, we could use 32 bit integers for most of the variables above and 8 bit integers for the variables whose values range between 0-100.

andrecastro0o commented 2 months ago

@mschleiss concerning the previous comment, the issue with doing this is that if these variables' values, are not encoded as integers by the Thies telegram, we will need to get into the source code and perform conversions to each one of them. It will be a substantial effort, for not much gain. The Thies NetCDF files are relatively small, hence this does not seem to be a priority

I will need to look further into the values of each field, see next comment

andrecastro0o commented 2 months ago

Variables that could be encoded as an unsigned 32 bit integer instead of a float:

Variables that could be encoded as unsigned 16 bit integer instead of a float:

Variables that could be encoded as an unsigned 8-bit integer:

Variables that could be encoded as a signed 8-bit integer:

Alright @mschleiss you did your home work :) all of these vars are encoded as ints in the telegram. Hence tey shall be easy to store as ints in the NetCDF. I will give it a go

mschleiss commented 2 months ago

@andrecastro0o You scared me for a while. I thought that we were doing some rounding on floats during the logging. I understand that logging everything as a float is the easier and faster solution. But the general philosophy should be: if the variable is given as an integer in the telegram, it should be logged as an integer in the NetCDF as well. The type of integer is not super important and won't make a huge difference in terms of storage space. But hey, I grew up in a time where memory was expensive. If something only varies between 0-100, a 8-bit integer shall be used :-)

andrecastro0o commented 2 months ago

@andrecastro0o You scared me for a while. I thought that we were doing some rounding on floats during the logging. I understand that logging everything as a float is the easier and faster solution. But the general philosophy should be: if the variable is given as an integer in the telegram, it should be logged as an integer in the NetCDF as well. The type of integer is not super important and won't make a huge difference in terms of storage space. But hey, I grew up in a time where memory was expensive. If something only varies between 0-100, a 8-bit integer shall be used :-)

@mschleiss. Sorry for the scare. I agree with only, not so much in terms of storage saving, but in relation to consistency - the vars values shouldn't be stored as floats if in their source they are ints.