howweirdistheweather / weather_app


expver data column is showing up in some of the cds downloads #8

Closed jamcinnes2 closed 11 months ago

jamcinnes2 commented 12 months ago

Showed up in these files. Need to fix this in cdstool or see what has changed at CDS.

/data/HWITW/input/cds_era5/2022/total_precipitation/global-2022-335-total_precipitation.nc
/data/HWITW/input/cds_era5/2022/cloud_base_height/global-2022-335-cloud_base_height.nc
/data/HWITW/input/cds_era5/2022/precipitation_type/global-2022-335-precipitation_type.nc

jamcinnes2 commented 12 months ago

A workaround for now is to just try redownloading these files.

mbjones commented 11 months ago

I removed those three files, and restarted the job. Downloads finished, and then tiletool.py ran. I hoped this would resolve the issue at the end of the thread in #2, but alas we seem to be getting the same error:

download_dataset finished.
cleaning dataset cds_era5...
done.
** HWITW data processing tool v0.9.3 **

debug: start_week 52 num_weeks 52 total_num_hours 8736
Output hwglobal-temperature_and_humidity-2022.nc
Traceback (most recent call last):
  File "/cdstotile/tiletool.py", line 617, in <module>
    main()
  File "/cdstotile/tiletool.py", line 605, in main
    load_netcdfs( flag_args, input_path, output_path, start_year, end_year )
  File "/cdstotile/tiletool.py", line 413, in load_netcdfs
    process_data_group( flag_args, inp_path, out_path, dir_name, year, dg_name, dg );
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/cdstotile/tiletool.py", line 309, in process_data_group
    if wk_var.size-1 < week_i:
       ^^^^^^
UnboundLocalError: cannot access local variable 'wk_var' where it is not associated with a value
jamcinnes2 commented 11 months ago

Aw shoot, looks like I have some indentation mistakes in tiletool (tabs to spaces or vice versa). I'll fix that tonight.
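The failure mode behind that UnboundLocalError is easy to reproduce. A minimal, hypothetical sketch (not the actual tiletool.py code; the function and names are stand-ins): if re-indentation accidentally moves the `wk_var` assignment inside a conditional branch, any code path that skips the branch reads the name before it is bound.

```python
def process_data_group(week_i, have_input):
    if have_input:
        wk_var = list(range(52))   # stand-in for the netCDF week variable
    # If re-indentation left the assignment inside the branch above,
    # this read raises UnboundLocalError whenever have_input is False:
    if len(wk_var) - 1 < week_i:
        raise IndexError("week out of range")
    return wk_var[week_i]

print(process_data_group(10, True))        # works: wk_var was assigned
try:
    process_data_group(10, False)          # wk_var never assigned
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)
```

Python decides at compile time that `wk_var` is a local of the function (because it is assigned somewhere in the body), so the unbound read fails with UnboundLocalError rather than NameError, exactly as in the traceback above.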


mbjones commented 11 months ago

So, I tried redownloading all of the 2023 total_precipitation data, but I am still getting this error, and only for day 182. Here's the log:

HWITW Copernicus data download tool v1.0

global-2023-182-total_precipitation.nc requested.
download_dataset finished.
cleaning dataset cds_era5...
done.
** HWITW data processing tool v0.9.4 **

debug: start_week 52 num_weeks 52 total_num_hours 8736
Output hwglobal-temperature_and_humidity-2022.nc
debug: start_week 52 num_weeks 52 total_num_hours 8736
Output hwglobal-wind-2022.nc
debug: start_week 52 num_weeks 52 total_num_hours 8736
Output hwglobal-precipitation-2022.nc
debug: start_week 52 num_weeks 52 total_num_hours 8736
Output hwglobal-cloud_cover-2022.nc
debug: start_week 37 num_weeks 37 total_num_hours 6283
Output hwglobal-temperature_and_humidity-2023.nc
debug: start_week 37 num_weeks 37 total_num_hours 6281
Output hwglobal-wind-2023.nc
daily input global-2023-*-total_precipitation.nc could not be opened.
variable tp : dimensions mismatch between master /var/data/hwitw/input/cds_era5/2023/total_precipitation/global-2023-001-total_precipitation.nc (('time', 'latitude', 'longitude')) and extension /var/data/hwitw/input/cds_era5/2023/total_precipitation/global-2023-182-total_precipitation.nc (('time', 'expver', 'latitude', 'longitude'))
Trying yearly.
global-2023-total_precipitation.nc could not be opened! [Errno 2] No such file or directory: '/var/data/hwitw/input/cds_era5/2023/global-2023-total_precipitation.nc'

When I look at the headers in the netcdf files for total_precipitation, all of them have the same dimensions:

dimensions:
    longitude = 1440 ;
    latitude = 721 ;
    time = 24 ;

except for day 182, which has the extra expver dimension.

dimensions:
    longitude = 1440 ;
    latitude = 721 ;
    expver = 2 ;
    time = 24 ;

I looked at every total_precip file for 2023, and 182 is the only one that has this extra dimension. I have also deleted the 182 file several times, triggering a re-download, and it always has the extra dimension.

@jamcinnes2 I am not sure what to make of this. We seem really close to success.
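For context on where the extra dimension comes from: when a CDS request spans the boundary between final ERA5 (expver=1) and the preliminary ERA5T stream (expver=5), the returned file carries an expver dimension of size 2, with each timestep populated in exactly one of the two slices and NaN in the other. A sketch with hypothetical shapes, assuming numpy, of how such an array could be collapsed back to the (time, latitude, longitude) layout the rest of the pipeline expects:

```python
import numpy as np

# (time, expver, lat, lon): each timestep has data in exactly one
# expver slice, NaN in the other
tp = np.full((4, 2, 3, 3), np.nan)
tp[:2, 0] = 1.0   # early hours: final ERA5 (expver=1)
tp[2:, 1] = 2.0   # recent hours: preliminary ERA5T (expver=5)

# Collapse the expver axis; nansum keeps whichever slice holds data
merged = np.nansum(tp, axis=1)
print(merged.shape)   # (4, 3, 3)
```

This is only illustrative of the data layout, not a patch for tiletool; in this case re-downloading once the day had fully migrated into final ERA5 made the dimension disappear on its own.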

mbjones commented 11 months ago

After reviewing this with @jamcinnes2, we found that deleting the data files containing expver and rerunning the download got rid of the extra expver info, so this must have been specific to the day I originally ran the download. I was now able to complete a full tiletool run for 2023, which generated both the netcdf files and the wxdb file. Here's the output:

hwitw@hwitw-test-66cb954798-rv4hx:/cdstotile$ python3 ${CDS_TOOL_DIR}/tiletool.py --start ${CDS_START_YEAR} --update --input "$CDS_DOWNLOAD_DIR" --output "$DATA_OUTPUT_DIR"
** HWITW data processing tool v0.9.5 **

debug: start_week 38 num_weeks 38 total_num_hours 6427
Output hwglobal-temperature_and_humidity-2023.nc
debug: start_week 38 num_weeks 38 total_num_hours 6425
Output hwglobal-wind-2023.nc
debug: start_week 38 num_weeks 38 total_num_hours 6427
Output hwglobal-precipitation-2023.nc
debug: start_week 38 num_weeks 38 total_num_hours 6429
Output hwglobal-cloud_cover-2023.nc
debug: update_wxdb()
Output WXDB
debug: wxdb_num_vars 29
debug: num_wx_vars 29
debug: opened /var/data/hwitw/output/tt_output/2023/hwglobal-temperature_and_humidity-2023.nc
debug: opened /var/data/hwitw/output/tt_output/2023/hwglobal-wind-2023.nc
debug: opened /var/data/hwitw/output/tt_output/2023/hwglobal-precipitation-2023.nc
debug: opened /var/data/hwitw/output/tt_output/2023/hwglobal-cloud_cover-2023.nc
debug: write wxdb
debug: num_wk 38

And this single 2023 run made it so that the webapp started working (but showing only 2023 data):

(screenshot: webapp displaying 2023 data)

Closing this issue as completed -- I'll open other issues if other problems arise when handling the rest of the years of data.

mbjones commented 11 months ago

Note for future self. NetCDF files with this expver dimension can be identified quickly in batch with a command like:

find . -type f -name '*.nc' -print0 | xargs -0 -n1 -P14 ncdump -h | grep -B 5 expver | grep netcdf
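The same scan can be sketched in Python using only the standard library (it still assumes the `ncdump` binary from the netCDF utilities is on PATH; `header_has_expver` and `files_with_expver` are hypothetical helpers, not part of tiletool):

```python
import re
import subprocess
from pathlib import Path

def header_has_expver(header: str) -> bool:
    """True if an `ncdump -h` header declares an expver dimension."""
    return re.search(r"^\s*expver\s*=\s*\d+\s*;", header, re.MULTILINE) is not None

def files_with_expver(root="."):
    """Yield .nc files under root whose header has an expver dimension."""
    for nc in sorted(Path(root).rglob("*.nc")):
        out = subprocess.run(["ncdump", "-h", str(nc)],
                             capture_output=True, text=True).stdout
        if header_has_expver(out):
            yield nc
```

Matching on the dimension declaration line (e.g. `expver = 2 ;`) rather than a bare `grep expver` avoids false positives from variables or attributes that merely mention the name.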