Joshdpaul closed this 3 months ago
@kyleredilla I have made all the changes you suggested above, except one: I'd like to keep the processing error messages in the slurm output so we can more easily debug any files that did not make it through `regrid.py`. This is how I found some of the seemingly random file reading errors, and I think it will be important if we get similar errors in the future.
There is now a corresponding PR in the prefect repo. To test, you will need the new `no_clobber` branch from that repo. Otherwise the testing is the same as described in the PR instructions above.
This PR closes #42. The `no_clobber` parameter provided in the Prefect flow is now evaluated in the main block of `regrid.py`, and any files that have already been regridded will be skipped. _(Note that there is no parallel branch in the Prefect repo for this PR, since the `no_clobber` parameter has already been included in the `main` branch of that repo.)_

Since some of the regridded datasets are split into yearly files just before writing, I had to do a little trick in lines 505-515 of `regrid.py`: I use just the part of the filename before the date string to search through the existing files. The assumption here is that if ANY of the existing files begin with the no-date prefix, those files have already been regridded, and the multi-year file that was fed into the `regrid.py` script will be skipped.

The alternative was to search for the exact yearly file names, but that involves either a) regridding the file before evaluation, which is a waste of time, or b) rewriting the code extensively so as to parse all filenames before any regridding happens, which would be more work. I think this method is a compromise that will do the job. If you see problems with this approach, please let me know!
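The prefix-matching skip described above could be sketched roughly like this. This is a minimal illustration, not the actual code from lines 505-515 of `regrid.py`; the function name `should_skip` and the assumption that filenames end in an underscore-delimited date chunk (e.g. `tas_CCSM4_1950-2000.nc`) are mine:

```python
from pathlib import Path

def should_skip(src_file: Path, out_dir: Path) -> bool:
    """Return True if regridded output for src_file appears to exist already.

    Hypothetical sketch: assumes the filename ends in an underscore-delimited
    date string, so everything before the final chunk is the "no-date" prefix.
    """
    parts = src_file.stem.split("_")
    nodate_prefix = "_".join(parts[:-1])  # drop the trailing date chunk
    # if ANY existing output starts with the no-date prefix, assume the
    # source file has already been regridded and can be skipped
    return any(p.stem.startswith(nodate_prefix) for p in out_dir.glob("*.nc"))
```

The trade-off is exactly the one noted above: a prefix match is cheap and avoids regridding before the check, at the cost of assuming no two source files share a no-date prefix.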
TO TEST:
Similar to this PR, you can start an initial flow with the `generate_batch_files` parameter set to "true", then subset a small number of batch files, and run the flow again with the `generate_batch_files` parameter set to "false". This will generate some regridded output. (Be sure to specify the `no_clobber` branch of this repo!)

Then run the flow again with the `generate_batch_files` parameter set to "false" and the `no_clobber` parameter set to "true". This should finish quickly because it will skip all regridding, and your slurm output files should look like this:

The lines that begin with "OVERWRITE_ERROR" (or "PROCESSING ERROR" if the regrid fails for some reason) can now be easily scraped by a forthcoming QC script that will collect the slurm job outputs and provide summarized info about the Prefect flow run.
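The forthcoming QC script isn't in this PR, but the scraping step it describes could look something like the sketch below. The directory layout, `*.out` glob, and function name are assumptions; only the "OVERWRITE_ERROR" / "PROCESSING ERROR" line prefixes come from this PR:

```python
import re
from pathlib import Path

# line prefixes emitted by regrid.py, per the PR description
ERROR_RE = re.compile(r"^(OVERWRITE_ERROR|PROCESSING ERROR)")

def scrape_slurm_errors(slurm_dir):
    """Collect tagged error lines from slurm output files.

    Hypothetical sketch: assumes one *.out file per slurm job.
    Returns {filename: [error lines]} for files with at least one hit.
    """
    errors = {}
    for out_file in sorted(Path(slurm_dir).glob("*.out")):
        hits = [line.rstrip() for line in out_file.open()
                if ERROR_RE.match(line)]
        if hits:
            errors[out_file.name] = hits
    return errors
```

A summary step could then count hits per job to give the "summarized info about the Prefect flow run" mentioned above.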