First was to fix the assignment of nodata values in the computed indicators. After a marathon huddle with @kyleredilla, we solved this by just moving the `.compute()` in `indicators.py`. There was no obvious error here, but the nodata values (-9999 and `np.nan`) simply were not being assigned as we assumed they were.
We also figured out how to change the `ftc` indicator datatype from timedelta to integer, just using a basic `astype(int)`.
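For reviewers who want to see the conversion in isolation, here is a minimal NumPy sketch. It assumes the `ftc` values arrive as nanosecond-resolution timedeltas that represent whole days, which may not match what `indicators.py` actually sees; the sample values and the `timedelta_days_to_int` helper are made up for illustration. A bare `astype(int)` on nanosecond data returns nanosecond counts, so the sketch divides by one day first. The PR itself may not need that step.

```python
import numpy as np


def timedelta_days_to_int(values):
    """Convert a timedelta64 array to whole-day integer counts (hypothetical helper)."""
    # Dividing by one day yields a float count of days regardless of the
    # underlying resolution (ns, s, D, ...); astype(int) then truncates it.
    return (values / np.timedelta64(1, "D")).astype(int)


# Made-up ftc values stored as nanosecond-resolution timedeltas.
ftc = np.array([3, 0, 12], dtype="timedelta64[D]").astype("timedelta64[ns]")
ftc_int = timedelta_days_to_int(ftc)
```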
As for the QC workflow, there are some minor changes to the nesting of QC tasks so that the remaining tasks do not run if the file does not exist or does not open. This cuts down on redundant error messages.
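The nesting idea can be sketched roughly as follows. This is not the code in `qc.py`; `run_qc`, `always_flag`, and the error strings are hypothetical names chosen for the example, and the opener is passed in so the sketch stays independent of any particular file library.

```python
from pathlib import Path


def run_qc(path, open_file, checks):
    """Run QC checks on one file, stopping early if it is missing or unreadable."""
    path = Path(path)
    if not path.exists():
        # One error is enough; every later check would fail for the same reason.
        return [f"ERROR: {path} does not exist"]
    try:
        data = open_file(path)
    except Exception as exc:
        return [f"ERROR: {path} could not be opened: {exc}"]
    # Only reached when the file exists and opened cleanly.
    errors = []
    for check in checks:
        errors.extend(check(data))
    return errors


def always_flag(data):
    """Stand-in check that always reports a problem."""
    return ["ERROR: flagged"]


# A missing file yields a single error and the check never runs.
missing = run_qc("no_such_indicator_file.nc", open, [always_flag])

# Opening a directory raises, standing in for a file that exists but does not open.
unreadable = run_qc(".", open, [always_flag])
```

With this shape, a missing file produces one message instead of one per downstream check.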
I also added a new `indicators/qc.check_nodata_against_inputs()` function that works backwards from each indicator filename and uses lookup tables to build the filepaths of input data. Where there are nodata values in the indicators (-9999 or `np.nan`, depending on the dtype) there should also be nodata values in the input data. Any discrepancies will write an error to `qc_error.txt`. In order to accomplish this, I also had to revise the Prefect QC task to accept the input data directory as an argument. (There is a companion branch `qc_edit` and PR #7 in the Prefect repo.)
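To make the comparison concrete, here is a small NumPy sketch of the mask logic only. It is not the actual `check_nodata_against_inputs()`: the function names, the array shapes, and the rule that an input cell counts as nodata only when it is nodata at every time step are all assumptions for illustration, and the filename lookup step is left out.

```python
import numpy as np

INT_NODATA = -9999  # nodata value for integer indicators, per the description


def nodata_mask(arr):
    """Boolean mask of nodata cells: NaN for floats, -9999 for integers."""
    arr = np.asarray(arr)
    if np.issubdtype(arr.dtype, np.floating):
        return np.isnan(arr)
    return arr == INT_NODATA


def count_nodata_mismatches(indicator, inputs):
    """Count indicator nodata cells whose input cells hold valid data.

    `indicator` has shape (y, x); `inputs` has shape (time, y, x).
    """
    indicator_nodata = nodata_mask(indicator)
    # Assumption: an input cell is nodata only if it is nodata at every time step.
    input_nodata = nodata_mask(inputs).all(axis=0)
    return int(np.count_nonzero(indicator_nodata & ~input_nodata))


# Made-up input data: two time steps on a 2x2 grid, with one always-NaN cell.
inputs = np.array([[[1.0, np.nan], [2.0, 3.0]],
                   [[1.5, np.nan], [2.5, 3.5]]])

good = np.array([[4, -9999], [5, 6]])     # nodata only where inputs are nodata
bad = np.array([[4, -9999], [-9999, 6]])  # extra nodata at a valid input cell

n_good = count_nodata_mismatches(good, inputs)
n_bad = count_nodata_mismatches(bad, inputs)
```

A nonzero count is the situation that would be reported as a discrepancy.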
TO TEST:
Pull the `qc_edit` branch into your Prefect repo. Start the server and run `indicators/generate_indicators.py` from that branch.
Run the flow using the following parameters, substituting your ssh username / ssh key path / working directory:
Observe the QC output in the Prefect log and in the `qc_error.txt` file. There should be 3 errors found while computing the `dw`, `su`, and `ftc` variables saying that the CESM2 historical data does not exist. (That's a known issue, and is confirmed in the QC here.)
Open a copy of the nodata_test.ipynb notebook provided by @Joshdpaul and use it to bugger up some indicator files. Here we are intentionally messing with the nodata values to introduce errors.
Run `qc.py` from the command line as described at the end of the notebook, and check out `qc_errors.txt`. In addition to the 3 errors mentioned above, you should see errors related to the files you just buggered up. Note that this output is pretty verbose! We get an error for every year that was computed for the indicator. I think this allows better error tracing, but it could be trimmed down too depending on our preference.
Open up at least one file from each computed indicator to check the datatypes: `rx1day` should be float, and `dw`, `su`, and `ftc` should all be integers.
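If you would rather script that last check than eyeball it, something like the following could work. It is only a sketch: `EXPECTED_KINDS` and `dtype_matches` are hypothetical names, and it compares NumPy dtype kind codes rather than using `np.issubdtype`, because `timedelta64` counts as a signed integer in NumPy's type hierarchy and would slip through.

```python
import numpy as np

# Expected dtype kinds from the testing notes: "f" = float, "i"/"u" = integer.
EXPECTED_KINDS = {"rx1day": "f", "dw": "iu", "su": "iu", "ftc": "iu"}


def dtype_matches(indicator, arr):
    """Return True if the array's dtype kind is the one expected for the indicator."""
    return np.asarray(arr).dtype.kind in EXPECTED_KINDS[indicator]


# Made-up arrays standing in for data read from the indicator files.
rx1day_ok = dtype_matches("rx1day", np.array([12.5], dtype="float32"))
ftc_ok = dtype_matches("ftc", np.array([3], dtype="int64"))
ftc_still_timedelta = dtype_matches("ftc", np.array([3], dtype="timedelta64[D]"))
```

Here `ftc_still_timedelta` comes out `False`, which is the case the `astype(int)` change is meant to eliminate.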
PS) I think there are going to be conflicts here, because I merged @BobTorgerson's `fix_variable_inputs` branch into this one while it was in progress... seemed like a good idea at the time :) Hopefully not too difficult to resolve!
This PR closes #11 and closes #16.