pacificclimate / climate-explorer-data-prep


NetCDF history attribute indicates when generate_climos started and ended #117

Closed sum1lim closed 4 years ago

sum1lim commented 4 years ago

As described in issue #82, generate_climos did not record its own command in the history attribute (although the cdo commands run inside generate_climos were still recorded). For example:

:history = "Fri May 22 15:21:55 2020: cdo -O -replace /tmp/cdoPyy5na14qm /tmp/cdoPy4pzh5rv7 /tmp/cdoPy6exys6hk\n",
    "Fri May 22 15:21:54 2020: cdo -O -timmean /tmp/cdoPysbe3hq0e /tmp/cdoPyy5na14qm\n",
    "Fri May 22 15:21:54 2020: cdo -O -seldate,1961-01-01,1990-12-31 tests/data/tiny_daily_pr.nc /tmp/cdoPysbe3hq0e\n",
    "Thu Mar 21 14:49:01 2019: cdo sellonlatbox,216.5625,219.375,40.4636506825932,48.8352434707287 /storage/data/climate/downscale/BCCAQ2/CMIP5/CanESM2/pr_day_CanESM2_historical+rcp26_r1i1p1_19500101-21001231.nc /storage/home/nrados/tiny_daily_pr.nc\n",
    "Thu Sep  1 14:34:03 2016: ncrcat -O /storage/data/climate/downscale/BCCAQ2/CMIP5/timetmp/time_subset_pr_day_CanESM2_historical_r1i1p1_19500101-20051231.nc /storage/data/climate/downscale/BCCAQ2/CMIP5/timetmp/time_subset_pr_day_CanESM2_rcp26_r1i1p1_20060101-21001231.nc /storage/data/climate/downscale/BCCAQ2/CMIP5/CanESM2/pr_day_CanESM2_historical+rcp26_r1i1p1_19500101-21001231.nc\n",
    "Thu Sep 01 14:33:15 2016: cdo -O seldate,1950-01-01T00:00,2005-12-31T23:59 /storage/data/climate/downscale/BCCAQ2/CMIP5/spacetmp/space_subset_pr_day_CanESM2_historical_r1i1p1_18500101-20051231.nc /storage/data/climate/downscale/BCCAQ2/CMIP5/timetmp/time_subset_pr_day_CanESM2_historical_r1i1p1_19500101-20051231.nc\n",
    "Thu Sep  1 14:32:42 2016: ncks -O -d lon,215.,310. -d lat,40.,85. /storage/data/climate/downscale/BCCAQ2/CMIP5/grouptmp/pr_day_CanESM2_historical_r1i1p1_18500101-20051231.nc /storage/data/climate/downscale/BCCAQ2/CMIP5/spacetmp/space_subset_pr_day_CanESM2_historical_r1i1p1_18500101-20051231.nc\n",
    "2011-04-13T23:04:41Z CMOR rewrote data to comply with CF standards and CMIP5 requirements.\n",
    "" ;

The three most recent cdo commands ran as part of a generate_climos command, but it is almost impossible to tell when generate_climos was run or with which arguments. Additional lines describing the `generate_climos` command were therefore wanted in the history attribute. This commit adds them, producing the following result:

:history = "Fri May 22 15:21:55 2020: end : generate_climos -otests/output tests/data/tiny_daily_pr.nc -p mean\n",
    "Fri May 22 15:21:55 2020: cdo -O -replace /tmp/cdoPyy5na14qm /tmp/cdoPy4pzh5rv7 /tmp/cdoPy6exys6hk\n",
    "Fri May 22 15:21:54 2020: cdo -O -timmean /tmp/cdoPysbe3hq0e /tmp/cdoPyy5na14qm\n",
    "Fri May 22 15:21:54 2020: cdo -O -seldate,1961-01-01,1990-12-31 tests/data/tiny_daily_pr.nc /tmp/cdoPysbe3hq0e\n",
    "Fri May 22 15:21:54 2020: start : generate_climos -otests/output tests/data/tiny_daily_pr.nc -p mean\n",
    "Thu Mar 21 14:49:01 2019: cdo sellonlatbox,216.5625,219.375,40.4636506825932,48.8352434707287 /storage/data/climate/downscale/BCCAQ2/CMIP5/CanESM2/pr_day_CanESM2_historical+rcp26_r1i1p1_19500101-21001231.nc /storage/home/nrados/tiny_daily_pr.nc\n",
    "Thu Sep  1 14:34:03 2016: ncrcat -O /storage/data/climate/downscale/BCCAQ2/CMIP5/timetmp/time_subset_pr_day_CanESM2_historical_r1i1p1_19500101-20051231.nc /storage/data/climate/downscale/BCCAQ2/CMIP5/timetmp/time_subset_pr_day_CanESM2_rcp26_r1i1p1_20060101-21001231.nc /storage/data/climate/downscale/BCCAQ2/CMIP5/CanESM2/pr_day_CanESM2_historical+rcp26_r1i1p1_19500101-21001231.nc\n",
    "Thu Sep 01 14:33:15 2016: cdo -O seldate,1950-01-01T00:00,2005-12-31T23:59 /storage/data/climate/downscale/BCCAQ2/CMIP5/spacetmp/space_subset_pr_day_CanESM2_historical_r1i1p1_18500101-20051231.nc /storage/data/climate/downscale/BCCAQ2/CMIP5/timetmp/time_subset_pr_day_CanESM2_historical_r1i1p1_19500101-20051231.nc\n",
    "Thu Sep  1 14:32:42 2016: ncks -O -d lon,215.,310. -d lat,40.,85. /storage/data/climate/downscale/BCCAQ2/CMIP5/grouptmp/pr_day_CanESM2_historical_r1i1p1_18500101-20051231.nc /storage/data/climate/downscale/BCCAQ2/CMIP5/spacetmp/space_subset_pr_day_CanESM2_historical_r1i1p1_18500101-20051231.nc\n",
    "2011-04-13T23:04:41Z CMOR rewrote data to comply with CF standards and CMIP5 requirements.\n",
    "" ;
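
For illustration, start and end entries like the ones above could be produced along the following lines. This is a minimal sketch, not the code in this commit: the helper names `history_entry` and `prepend_entry` are hypothetical, the "earlier entry" string is a placeholder, and the sketch only manipulates the history string, leaving the NetCDF I/O aside.

```python
from datetime import datetime


def history_entry(when, label, command):
    """Format one entry as '<timestamp>: <label> : <command>'."""
    # datetime.ctime() yields e.g. 'Fri May 22 15:21:54 2020'
    return f"{when.ctime()}: {label} : {command}\n"


def prepend_entry(history, entry):
    """Put the newest entry first, matching the order seen in the history above."""
    return entry + history


command = "generate_climos -otests/output tests/data/tiny_daily_pr.nc -p mean"
history = "earlier entry\n"  # placeholder for whatever the input file already had

history = prepend_entry(
    history, history_entry(datetime(2020, 5, 22, 15, 21, 54), "start", command)
)
# ... the cdo steps would add their own entries at this point ...
history = prepend_entry(
    history, history_entry(datetime(2020, 5, 22, 15, 21, 55), "end", command)
)
print(history)
```

Prepending rather than appending keeps the newest entry at the top, so the start and end lines bracket the cdo entries written in between.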

However, the command-line arguments shown in the history include only -o (--outdir) and the file paths. Adding more arguments would require passing args, or additional parameters, to create_climo_files when it is called in the generate_climos script. I would like to hear any suggestion if there are more necessary arguments to be shown in history attribute.

For the record, black has not yet been run on the changed files.

corviday commented 4 years ago

Overall, looks very good. I ran generate_climos on a bunch of files with weird history attributes, like date-last lines, histories with no dates at all, every kind of weirdness I can remember seeing in our data. It worked well on all of them.

However, if I run generate_climos on a file with no history attribute at all, the start and end lines do not appear. We hope any data files we process will have a history attribute, but they don't always, and it would be nice to support that case.

(If you would like to make a data file with no history attribute for testing, you can use the update_metadata script in this project to remove the history attribute from an existing file.)
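
One way the missing-attribute case might be handled is to fall back to an empty history before prepending. The sketch below is an assumption about the approach, not the project's code: `add_history_entry` is a hypothetical helper, the entry text is a placeholder, and the function assumes only that the dataset exposes global attributes as Python attributes (which, as far as I know, `netCDF4.Dataset` does). A plain namespace object stands in for an open file here.

```python
from types import SimpleNamespace


def add_history_entry(dataset, entry):
    """Prepend `entry` to dataset.history, creating the attribute if absent."""
    # Fall back to "" when the file has no history attribute at all.
    existing = getattr(dataset, "history", "")
    dataset.history = entry.rstrip("\n") + "\n" + existing


# Stand-ins for an open dataset with and without a history attribute.
with_history = SimpleNamespace(history="older entry\n")
without_history = SimpleNamespace()

entry = "Mon May 25 11:49:10 2020: start : generate_climos"  # placeholder text
for ds in (with_history, without_history):
    add_history_entry(ds, entry)

print(repr(with_history.history))
print(repr(without_history.history))
```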

Also, one very small formatting fix: there's no space between -o and the output directory, e.g. Mon May 25 11:49:10 2020: end : generate_climos -ooutput/ streamflow_peace_CanESM2_rcp85_r1i1p1.nc -p mean

> I would like to hear any suggestion if there are more necessary arguments to be shown in history attribute.

It would be nice to have the -g argument if the user has specified one. The longitude normalization code possibly generates files that some tools can't read, though investigating that issue has been a low priority. It might be useful to know which files are affected by checking their history.
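
Both points, the space after -o and recording -g only when it was given, could be covered by rebuilding the command from its parts. The following is only a sketch: `build_command` and its parameters are hypothetical, and because this thread does not say whether -g takes a value, it is treated here as a plain switch.

```python
import shlex


def build_command(outdir, filepaths, operation, use_g=False):
    """Reassemble a generate_climos command line for the history attribute."""
    parts = ["generate_climos", "-o", outdir, *filepaths, "-p", operation]
    if use_g:
        # Assumption: -g is a switch with no value; only record it when set.
        parts.append("-g")
    # shlex.join (Python 3.8+) puts a space between tokens and quotes as needed.
    return shlex.join(parts)


command = build_command(
    "output/", ["streamflow_peace_CanESM2_rcp85_r1i1p1.nc"], "mean", use_g=True
)
print(command)
```

Joining a list of tokens avoids the `-ooutput/` spacing problem by construction, and any path containing spaces would come out quoted.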