Revised 3/7/24: The standalone longitude correction / CRS script has been rolled into the main `regrid.py` script instead!
This PR closes #25 and closes #30
The `regrid.py` script now includes:

- a reindexing of longitude coordinates as part of the `init_regridder()` function (see the first sketch after this list).
- a new `apply_wgs84()` function that checks for an existing `"spatial_ref"` coordinate in the dataset and, if it is not found, attempts to write CF-compliant CRS info to the file (second sketch below).
- a new `write_retry_batch_file()` function that writes any filepaths that were not successfully regridded to a separate text file, to be retried in a new slurm job. Combined with the new try/except routine in the main block, this lets a batch attempt every file in the list regardless of whether errors occur for some filepaths (third sketch below).
- an additional query in `generate_batch_files.py` that excludes subdaily frequencies (i.e., data transferred specifically for WRF downscaling but not wanted for regridding).
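A minimal sketch of the longitude reindexing idea, assuming the source files carry 0–360 longitudes on a coordinate named `lon`; the function name and dataset layout here are illustrative, not the exact implementation inside `init_regridder()`:

```python
import xarray as xr

def shift_longitudes(ds: xr.Dataset) -> xr.Dataset:
    """Reindex 0-360 longitudes to -180..180 and sort so the axis stays monotonic."""
    ds = ds.assign_coords(lon=(((ds.lon + 180) % 360) - 180))
    return ds.sortby("lon")
```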
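And a sketch of the `apply_wgs84()` check, assuming `lat`/`lon` dimension names; the real function may differ in the details:

```python
import rioxarray  # noqa: F401 -- registers the .rio accessor on xarray objects
import xarray as xr

def apply_wgs84(ds: xr.Dataset) -> xr.Dataset:
    """Attach CF-compliant WGS84 CRS info if no 'spatial_ref' coordinate exists."""
    if "spatial_ref" not in ds.coords:
        ds = ds.rio.set_spatial_dims(x_dim="lon", y_dim="lat")
        # write_crs adds a 'spatial_ref' coordinate carrying the CF grid-mapping attributes
        ds = ds.rio.write_crs("EPSG:4326")
    return ds
```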
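Finally, the retry bookkeeping, roughly; `regrid_file()`, `batch_fps`, and `batch_dir` are hypothetical stand-ins for whatever the main block actually uses:

```python
from pathlib import Path

def write_retry_batch_file(retry_fps: list[Path], batch_dir: Path) -> None:
    """Write filepaths that failed to regrid to a text file for a new slurm job."""
    if retry_fps:
        (batch_dir / "batch_retry.txt").write_text(
            "\n".join(str(fp) for fp in retry_fps) + "\n"
        )

# in the main block, something like:
retry_fps = []
for fp in batch_fps:
    try:
        regrid_file(fp)
    except Exception as exc:
        # standardized error messaging keeps the slurm output searchable
        print(f"ERROR: {fp} was not regridded ({exc})")
        retry_fps.append(fp)
write_retry_batch_file(retry_fps, batch_dir)
```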
TO TEST:
1. Start the regridding pipeline as usual by generating the batch files in your scratch directory.
2. Delete most of the batch files, leaving a small subset for testing the actual regridding.
3. In one of these batch files, include some bogus filenames to generate errors.
4. As usual, use the `regrid_cmip6.ipynb` notebook to submit the slurm jobs.
5. All slurm jobs, even for the batch file with bogus filenames, should complete without a "FAILED" state.
6. Check the slurm job outputs to confirm that the batch with bogus filenames still completed and that error messages were written into the output file. There should be messaging indicating that some files were not regridded.
7. Check the directory with the batch files. You should see a new `batch_retry.txt` file containing the bogus filenames that were not regridded.
8. Open one of the regridded files in QGIS against a basemap. Check the properties of the layer and confirm that QGIS recognizes the CRS as WGS84 and that the image is rendered in the correct location. Some basemaps do not actually extend to 90° latitude, and our current target grid does not actually extend to the meridian due to weird half-sized pixels. Keep that in mind when viewing against a basemap, and instead check that the features in the interior of the image generally align with land masses, etc.
9. Open one of the regridded files using `xarray` and check the CRS info via the `rioxarray` accessor (you will need an environment that has `rioxarray`):
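For example (the file path here is a placeholder):

```python
import rioxarray  # noqa: F401 -- registers the .rio accessor
import xarray as xr

ds = xr.open_dataset("path/to/regridded_file.nc", decode_coords="all")
print(ds.rio.crs)  # expect EPSG:4326 (WGS84)
```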
Things to note:
The longitude attributes in the regridded .nc files may still reference values 0–360, since this branch does not include the attribute fixes yet. You may also see some warnings when opening the regridded files with `xarray.open_dataset(decode_coords='all')` that stem from non-standard attributes.
Future work:
Now that we have slurm outputs that are searchable (i.e., have standardized error messaging), we can include them in a QC process similar to the indicators Prefect flow. After the jobs complete in the Prefect flow, we can look for the "retry" file and try any bad files a second time. That's probably also the point in the flow where we can address this issue about stuck jobs, maybe setting a time limit for a batch to complete and adding the files to the "retry" batch if it gets stuck.
Exploratory notebooks:
These were updated in this branch; @kyleredilla and I were messing with NaN values, grids, extrapolation, etc. That work is going to be committed here but is really part of a different grid selection problem that will be solved in other branches.