noaa-ocs-modeling / EnsemblePerturbation

perturbation of coupled model input over a space of input variables
https://ensembleperturbation.readthedocs.io
Creative Commons Zero v1.0 Universal

Feature/schism #85

Closed SorooshMani-NOAA closed 2 years ago

SorooshMani-NOAA commented 2 years ago

Add support for SCHISM ensemble generation using the package's perturb_tracks CLI

SorooshMani-NOAA commented 2 years ago

@zacharyburnett do you think this minor change deserves its own pull request, or should I wait until all the output parsing for SCHISM is added too? I don't know when I would get to that part in terms of timing.

SorooshMani-NOAA commented 2 years ago

@WPringle for the initial analysis of SCHISM results, do we only need the subset_dataset, combine_outputs and extrapolate_water_elevation_to_dry_areas functions to be defined so that the example scripts work? Or are you directly using any of the other classes and functions from parsing.adcirc for the analysis? I don't see any ADCIRC-specific code used anywhere else in the code; am I right?

As I'm going through the output parser classes in the parsing/adcirc.py file to replicate them in parsing/schism.py, I see that some of them cannot be readily retrofitted for SCHISM. So now I'm thinking of just providing the main functionality of the 3 functions I mentioned above, so that we can do the analysis, and then cleaning up the backend classes for SCHISM outputs later. The main issue is how SCHISM outputs are organized compared to ADCIRC. Some SCHISM outputs are text rather than .nc, and beyond that, some of the variables (the 2D ones) all live in the same file, out2d_*.nc. Most of this could be worked around, but the bigger problem I see is that SCHISM outputs are sometimes divided in time, so I may have out2d_1.nc, out2d_2.nc, ... for a given simulation. Handling this is not easy (at least at first sight) to implement within the framework of classes that exists for ADCIRC.

@zacharyburnett do you have any comments?
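On the time-split outputs: one extra wrinkle is that lexicographic sorting puts out2d_10.nc before out2d_2.nc, so the chunks need a numeric sort before any concatenation. A minimal sketch (the file-name pattern comes from the comment above; everything else is illustrative):

```python
import re

def sorted_schism_outputs(paths):
    """Order time-split SCHISM output files (out2d_1.nc, out2d_2.nc, ...)
    by their numeric suffix instead of lexicographically."""
    def chunk_index(path):
        match = re.search(r'_(\d+)\.nc$', path)
        return int(match.group(1)) if match else -1
    return sorted(paths, key=chunk_index)

print(sorted_schism_outputs(['out2d_10.nc', 'out2d_2.nc', 'out2d_1.nc']))
# ['out2d_1.nc', 'out2d_2.nc', 'out2d_10.nc']
```

Once ordered, the chunks could plausibly be opened as a single dataset with xarray.open_mfdataset(..., combine='nested', concat_dim='time'); whether that fits inside the existing class framework is exactly the open question here.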

zacharyburnett commented 2 years ago

Yes, it seems that ADCIRC and SCHISM outputs are too different to be encompassed by a single abstraction. It might be that the joining abstraction between them would just be a shim class (with only pass and maybe a file_glob abstract property, or something to that effect). Other than that, I don't think it's necessary to exactly replicate the ADCIRC parser API; it would be more effective for the drive for abstraction to come from common usages in the client code, like in examples/.
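A sketch of what that shim could look like (class names and the SCHISM glob are illustrative; only the file_glob abstract property comes from the suggestion above):

```python
from abc import ABC, abstractmethod

class ModelOutput(ABC):
    """Joining abstraction between the ADCIRC and SCHISM parsers,
    holding only what the two models genuinely share."""

    @property
    @abstractmethod
    def file_glob(self) -> str:
        """Glob pattern matching this model's output files."""

class AdcircMaxElevation(ModelOutput):
    @property
    def file_glob(self) -> str:
        return 'maxele.63.nc'

class SchismSurfaceOutput(ModelOutput):
    @property
    def file_glob(self) -> str:
        return 'out2d_*.nc'

print(AdcircMaxElevation().file_glob)  # maxele.63.nc
```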

WPringle commented 2 years ago

@SorooshMani-NOAA Yes, I would advise against replication in the parser if it doesn't fit, but ideally we would like to produce the same output file (e.g., a combined maxele.63.nc with dimensions of the number of ensembles and the number of mesh points). So as a first step, just look at how to convert the global elevation time series to maximum elevation and then combine all the ensemble runs together into a maxele.63.nc. Then the subsetting and other routines will work as before on the maxele.63.nc because it has the same format. The SCHISM model should have the same attributes, like elements, depth, lon, lat, etc., that we can put into a maxele.63.nc file.
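That first step can be sketched with plain NumPy (shapes and values are made up; the real combine would also carry the mesh attributes into the maxele.63.nc):

```python
import numpy as np

def max_elevation(elevation):
    """Reduce an elevation time series of shape (time, node) to the
    per-node maximum, ignoring NaNs at dry time steps."""
    return np.nanmax(elevation, axis=0)

# three hypothetical ensemble runs, each 4 time steps x 6 mesh nodes
rng = np.random.default_rng(0)
runs = [rng.normal(size=(4, 6)) for _ in range(3)]

# stack per-run maxima into the maxele-style layout: (run, node)
maxele = np.stack([max_elevation(run) for run in runs])
print(maxele.shape)  # (3, 6)
```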

SorooshMani-NOAA commented 2 years ago

@WPringle what are the ADCIRC outputs that have element output instead of nodal? In SCHISM it seems all the quantities of interest are output on nodes.

WPringle commented 2 years ago

@SorooshMani-NOAA All the ADCIRC outputs are also nodal. The element table is kept just for plotting the surface.

SorooshMani-NOAA commented 2 years ago

OK I see, thanks. I think in the case of SCHISM we need to separately hold onto the grid file instead, just like SCHISM station outputs, which require station.in to know which station is where.

zacharyburnett commented 2 years ago

@SorooshMani-NOAA speaking of stations.in, you could also add a feature to pyschism that constructs a stations.in from an arbitrary region or set of CO-OPS IDs; I added that functionality to adcircpy (for fort.15) here: https://github.com/noaa-ocs-modeling/adcircpy/blob/a8b56e39c64eca3c949b19217c7d3cee741d03ff/adcircpy/fort15.py#L34-L149

codecov-commenter commented 2 years ago

Codecov Report

Merging #85 (8a66f3a) into main (89c3807) will decrease coverage by 5.65%. The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main      #85      +/-   ##
==========================================
- Coverage   30.73%   25.07%   -5.66%     
==========================================
  Files          25       26       +1     
  Lines        2922     3581     +659     
==========================================
  Hits          898      898              
- Misses       2024     2683     +659     
Impacted Files                                     Coverage Δ
ensembleperturbation/client/combine_results.py     0.00% <0.00%> (ø)
ensembleperturbation/client/perturb_tracks.py      0.00% <0.00%> (ø)
ensembleperturbation/parsing/adcirc.py             46.88% <0.00%> (-0.66%) ↓
ensembleperturbation/parsing/schism.py             0.00% <0.00%> (ø)


SorooshMani-NOAA commented 2 years ago

@WPringle does ADCIRC or your ensemble analysis code support a mix of trias and quads?

WPringle commented 2 years ago

@WPringle does ADCIRC or your ensemble analysis code support a mix of trias and quads?

@SorooshMani-NOAA No, just trias. I was thinking about that yesterday, about what we would do when we have quads; I'm not familiar with how to visualize those in matplotlib. One option is to try to convert each quad into two triangles, I suppose.
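Splitting each quad into two triangles is indeed straightforward; a sketch, assuming triangle rows in the element table are padded with a negative fill value (that padding convention is an assumption here):

```python
import numpy as np

def split_quads(elements):
    """Split each quadrilateral element into two triangles so a mixed
    tri/quad mesh can be drawn with matplotlib's Triangulation, which
    only accepts triangles."""
    triangles = []
    for element in elements:
        nodes = [node for node in element if node >= 0]
        if len(nodes) == 3:
            triangles.append(nodes)
        elif len(nodes) == 4:
            a, b, c, d = nodes
            # split the quad a-b-c-d along the a-c diagonal
            triangles.append([a, b, c])
            triangles.append([a, c, d])
    return np.array(triangles)

# one padded triangle and one quad
mixed = [[0, 1, 2, -1], [1, 2, 3, 4]]
print(split_quads(mixed).tolist())  # [[0, 1, 2], [1, 2, 3], [1, 3, 4]]
```

The resulting (n, 3) array is the shape matplotlib.tri.Triangulation expects for its triangles argument.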

SorooshMani-NOAA commented 2 years ago

try and convert each quad

That should be easy to do. I'll try this when I add the table. Thanks!

SorooshMani-NOAA commented 2 years ago

@WPringle the reason the swath breaks at https://github.com/noaa-ocs-modeling/EnsemblePerturbation/blob/89c38072506103eb024f31a7cc61d026b9b40785/ensembleperturbation/parsing/adcirc.py#L371 is that stormevents now returns a dictionary for the wind swath, where the key is the advisory name and the value is another dictionary with the advisory date as the key and the swath shape as the value:

{'BEST': {'20180908T000000': <shapely.geometry.polygon.Polygon object at 0x2b29a3190160>, '20180908T060000': <shapely.geometry.polygon.Polygon object at 0x2b29987722e0>, '20180908T120000': <shapely.geometry.polygon.Polygon object at 0x2b29988d8d00>, '20180908T180000': <shapely.geometry.polygon.Polygon object at 0x2b29988d8c40>, '20180909T000000': <shapely.geometry.polygon.Polygon object at 0x2b29988eed60>, '20180909T060000': <shapely.geometry.polygon.Polygon object at 0x2b29983d40d0>, '20180909T120000': <shapely.geometry.polygon.Polygon object at 0x2b29988df6a0>, '20180909T180000': <shapely.geometry.polygon.Polygon object at 0x2b29988d8dc0>, '20180910T000000': <shapely.geometry.polygon.Polygon object at 0x2b29a3166ca0>, '20180910T060000': <shapely.geometry.polygon.Polygon object at 0x2b29988eed90>, '20180910T120000': <shapely.geometry.polygon.Polygon object at 0x2b29a3190be0>, '20180910T180000': <shapely.geometry.polygon.Polygon object at 0x2b29988dfaf0>, '20180911T000000': <shapely.geometry.polygon.Polygon object at 0x2b29988df760>, '20180911T060000': <shapely.geometry.polygon.Polygon object at 0x2b29a3166dc0>, '20180911T120000': <shapely.geometry.polygon.Polygon object at 0x2b29988eea90>, '20180911T180000': <shapely.geometry.polygon.Polygon object at 0x2b29a3179130>, '20180912T000000': <shapely.geometry.polygon.Polygon object at 0x2b29988dfdc0>, '20180912T060000': <shapely.geometry.polygon.Polygon object at 0x2b29988eee80>, '20180912T120000': <shapely.geometry.polygon.Polygon object at 0x2b29988eeee0>, '20180912T180000': <shapely.geometry.polygon.Polygon object at 0x2b29988df5b0>, '20180913T000000': <shapely.geometry.polygon.Polygon object at 0x2b29988dfbb0>, '20180913T060000': <shapely.geometry.polygon.Polygon object at 0x2b29988ee8e0>, '20180913T120000': <shapely.geometry.polygon.Polygon object at 0x2b29988eefd0>, '20180913T180000': <shapely.geometry.polygon.Polygon object at 0x2b29988df5e0>, '20180914T000000': <shapely.geometry.polygon.Polygon object at 
0x2b29988ee1c0>, '20180914T060000': <shapely.geometry.polygon.Polygon object at 0x2b29988ee3a0>, '20180914T111500': <shapely.geometry.polygon.Polygon object at 0x2b2993c07e20>, '20180914T120000': <shapely.geometry.polygon.Polygon object at 0x2b29988ee580>, '20180914T180000': <shapely.geometry.polygon.Polygon object at 0x2b29988ee700>, '20180915T000000': <shapely.geometry.polygon.Polygon object at 0x2b29988eefa0>, '20180915T060000': <shapely.geometry.polygon.Polygon object at 0x2b29988ee820>, '20180915T120000': <shapely.geometry.polygon.Polygon object at 0x2b29988eeca0>, '20180915T180000': <shapely.geometry.polygon.Polygon object at 0x2b29988eef40>, '20180916T000000': <shapely.geometry.polygon.Polygon object at 0x2b29988ee310>, '20180916T060000': <shapely.geometry.polygon.Polygon object at 0x2b29988ee610>, '20180916T120000': <shapely.geometry.polygon.Polygon object at 0x2b29988ee9d0>}}

For this use case, should I modify the code to pick BEST and, if not available, pick OFCL, with the last existing timestamp? Or should I return a list of shapes for all timestamps?
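The "BEST, else OFCL, last timestamp" rule could look like this (the preference order and the sample dict are illustrative, and the helper name is made up):

```python
def latest_swath(windswaths, preference=('BEST', 'OFCL')):
    """Pick the preferred advisory from stormevents' nested
    {advisory: {timestamp: polygon}} mapping and return the swath for
    its last (latest) timestamp. ISO-style timestamps such as
    '20180916T120000' sort correctly as plain strings."""
    for advisory in preference:
        if advisory in windswaths:
            latest = max(windswaths[advisory])
            return windswaths[advisory][latest]
    raise ValueError(f'none of {preference} present in {sorted(windswaths)}')

swaths = {'OFCL': {'20180908T000000': 'early polygon', '20180916T120000': 'final polygon'}}
print(latest_swath(swaths))  # final polygon
```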

SorooshMani-NOAA commented 2 years ago

@WPringle I'm not sure what is going on here, but the issue with the missing 'depth' variable seems to be this: https://github.com/noaa-ocs-modeling/EnsemblePerturbation/blob/89c38072506103eb024f31a7cc61d026b9b40785/ensembleperturbation/parsing/adcirc.py#L790-L798 because we're just passing the node DataArray to the subset algorithm, while it is expecting a Dataset, to my understanding.

WPringle commented 2 years ago

@WPringle the reason swath breaks at

https://github.com/noaa-ocs-modeling/EnsemblePerturbation/blob/89c38072506103eb024f31a7cc61d026b9b40785/ensembleperturbation/parsing/adcirc.py#L371

is that stormevents now returns a dictionary for the wind swath, where the key is the advisory name and the value is another dictionary with the advisory date as the key and the swath shape as the value:

{'BEST': {'20180908T000000': <shapely.geometry.polygon.Polygon object at 0x2b29a3190160>, '20180908T060000': <shapely.geometry.polygon.Polygon object at 0x2b29987722e0>, ... '20180916T120000': <shapely.geometry.polygon.Polygon object at 0x2b29988ee9d0>}}

For this use case, should I modify the code to pick BEST and, if not available, pick OFCL, with the last existing timestamp? Or should I return a list of shapes for all timestamps?

I think your solution is a good one... I just want to have the overall wind swath as a GeoDataFrame polygon. I would need to play around with this myself to know exactly what to do, but let's see if that works.

WPringle commented 2 years ago

@WPringle I'm not sure what is going on here, but the issue with missing 'depth' variable seems to be this:

https://github.com/noaa-ocs-modeling/EnsemblePerturbation/blob/89c38072506103eb024f31a7cc61d026b9b40785/ensembleperturbation/parsing/adcirc.py#L790-L798

because we're just passing the node DataArray to the subset algorithm, while it is expecting a Dataset to my understanding.

I see, so is it a Dataset that is being passed to subset_dataset in the main code?

SorooshMani-NOAA commented 2 years ago

I see, so is it a Dataset that is being passed to subset_dataset in the main code?

Depending on which category it falls under, subset_dataset can call either of the subset methods. Some of them are OK with a DataArray as well, as long as x and y are coordinates. Also, if we don't pass maximum_depth or minimum_depth, it won't look for a depth variable in the input. I think since this code is generic (any type can be passed), passing a DataArray can cause more problems than it solves in terms of maintenance. For now I can limit the types of data passed to the subset methods based on what variables are available, which would be a modification to the adcirc.py code.
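One way to do that limiting is to promote any DataArray to a one-variable Dataset at the boundary (the helper name is hypothetical):

```python
import numpy as np
import xarray as xr

def ensure_dataset(data):
    """Promote a DataArray to a one-variable Dataset so the downstream
    subsetting code only ever sees Datasets."""
    if isinstance(data, xr.DataArray):
        return data.to_dataset(name=data.name or 'data')
    return data

da = xr.DataArray(np.zeros(4), dims='node', name='zeta')
ds = ensure_dataset(da)
print(type(ds).__name__, list(ds.data_vars))  # Dataset ['zeta']
```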

@zacharyburnett is there anything I missed about these subset methods and how to use them?

zacharyburnett commented 2 years ago

@zacharyburnett is there anything I missed about these subset methods and how to use them?

It looks like you covered the relevant points here. The DataArray may share some of the same interface with Dataset, but IMO it's better to be safe.

SorooshMani-NOAA commented 2 years ago

@WPringle with the latest code in this branch I was able to run your script for generating plots. I've shared the results with you on Google Drive. If you want, please let me know which snapshots I should include in this ticket for reference. When you have time, can you please review the results and let me know if we can merge this into main now or if it requires further work? Thanks

WPringle commented 2 years ago

@SorooshMani-NOAA Interesting, it looks like all the plots and everything went through well, but there is some problem with the surrogate prediction. The sensitivities are giving us some answer, so I guess the surrogate prediction is perhaps being converted to NaNs, which is set to occur when the water level goes below the topography + minimum allowable depth. Are these points really wet points in the SCHISM mesh or are they overland? I think you will need to debug what happens when it makes a surrogate prediction for a point/ensemble and how the depths etc. are being treated. Note that depth is negative over land in ADCIRC.
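The masking rule described above can be sketched as follows (H0 and the sample arrays are made-up values, not the project's actual threshold; the sign convention is ADCIRC's, with depth positive underwater and negative over land):

```python
import numpy as np

H0 = 0.05  # hypothetical minimum allowable depth

def mask_dry(water_level, depth):
    """NaN-out points whose water level is below the ground surface plus
    the minimum allowable depth. With ADCIRC's convention (depth positive
    underwater, negative over land), the ground elevation is -depth."""
    dry = water_level < -depth + H0
    return np.where(dry, np.nan, water_level)

water_level = np.array([1.0, 0.2, -0.5])
depth = np.array([5.0, -0.5, -1.0])  # last two points are over land
print(mask_dry(water_level, depth))  # first point stays wet, the rest become NaN
```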

SorooshMani-NOAA commented 2 years ago

@WPringle thanks for your feedback. I just looked into the code again and noticed a very bad mistake in my dry/wet node parsing logic. I'll update again and share the results when done.

SorooshMani-NOAA commented 2 years ago

I'm just adding this comment for future reference:

I'm a little bit worried about how we're implicitly passing around Dataset and DataArray interchangeably in the parsing and kl scripts. Some of the issues I see while running my test cases are because getting a specific DataArray out of a Dataset can remove attributes or unused dimensions/coordinates. It would be a good idea to stick with Dataset as much as we can in future refactors. Right now I'm modifying the SCHISM parsing code to almost match the ADCIRC parsing outputs, but in general I'm not very comfortable with some of the things I need to do.

zacharyburnett commented 2 years ago

I'm a little bit worried about how we're implicitly passing around Dataset and DataArray interchangeably in the parsing and kl scripts. Some of the issues I see while running my test cases are because getting a specific DataArray out of a Dataset can remove attributes or unused dimensions/coordinates. It would be a good idea to stick with Dataset as much as we can in future refactors. Right now I'm modifying the SCHISM parsing code to almost match the ADCIRC parsing outputs, but in general I'm not very comfortable with some of the things I need to do.

I apologize for that, at the time of writing I was not as familiar with xarray as I am now. It would probably be wise to use Dataset everywhere, as it includes all of the functionality of DataArray (to my knowledge).
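For future reference, the attribute loss being described is easy to reproduce (variable and attribute names here are made up):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {'zeta': ('node', np.zeros(4)), 'depth': ('node', np.ones(4))},
    attrs={'model': 'SCHISM'},
)

# pulling out one DataArray drops the Dataset-level attrs and the
# sibling 'depth' variable that later code may expect to find
da = ds['zeta']
print(da.attrs)   # {}

# selecting with a list keeps a Dataset, and the attrs survive
sub = ds[['zeta']]
print(sub.attrs)  # {'model': 'SCHISM'}
```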

SorooshMani-NOAA commented 2 years ago

No problem @zacharyburnett; no need to apologize! I just wrote this here for future ref. I didn't mean to criticize or anything; I didn't even tag you or William to avoid giving that impression! Since I'm writing everything related to my dev here, I just wanted to have this as a side note so that I or anyone working on the parser later knows that my code needs clean up too because of what I described.

SorooshMani-NOAA commented 2 years ago

@WPringle I tried both the _wetonly and _inundation_extrapolation scripts, and both work and generate the plots. I also tried generating plots with elements (point spacing equal to None). To make the _extrapolation script work, I only had to modify a small piece of the second script so that it sends None for node selection: https://github.com/noaa-ocs-modeling/EnsemblePerturbation/blob/97ca6a85955e7faaf368df8ca183760331aa8025/examples/karhunenloeve_polynomialchaos/klpc_inundation_extrapolation.py#L181-L191

I also added the snippet below to both of the scripts so they work in case we only have one set of results:

https://github.com/noaa-ocs-modeling/EnsemblePerturbation/blob/97ca6a85955e7faaf368df8ca183760331aa8025/examples/karhunenloeve_polynomialchaos/klpc_inundation_extrapolation.py#L147-L151

I will upload the results to Google Drive again; if everything looks good, I think we can go ahead and merge this as well. What do you think?

SorooshMani-NOAA commented 2 years ago

@WPringle I'm finalizing and testing the schism ensemble support branch on my workflow system and I ran into an error like this:

2022-08-25T21:51:17.652842+00:00  INFO     File "/home/ondemand-user/icogsc/lib/python3.9/site-packages/ensembleperturbation/uncertainty_quantification/surrogate.py", line 238, in sensitivities_from_surrogate
2022-08-25T21:51:17.652893+00:00  INFO     chaospy.Sens_m(surrogate_model, distribution),
2022-08-25T21:51:17.652944+00:00  INFO     File "/home/ondemand-user/icogsc/lib/python3.9/site-packages/chaospy/descriptives/sensitivity/main.py", line 43, in Sens_m
2022-08-25T21:51:17.652994+00:00  INFO     conditional = E_cond(poly[valids], unit_vec, dist, **kws)
2022-08-25T21:51:17.653045+00:00  INFO     File "/home/ondemand-user/icogsc/lib/python3.9/site-packages/chaospy/descriptives/conditional.py", line 70, in E_cond
2022-08-25T21:51:17.653095+00:00  INFO     frozen = poly(**{("q%d" % idx): 1 for idx, keep in enumerate(freeze) if not keep})
2022-08-25T21:51:17.653146+00:00  INFO     File "/home/ondemand-user/icogsc/lib/python3.9/site-packages/numpoly/baseclass.py", line 596, in __call__
2022-08-25T21:51:17.653197+00:00  INFO     return numpoly.call(self, args, kwargs)
2022-08-25T21:51:17.653247+00:00  INFO     File "/home/ondemand-user/icogsc/lib/python3.9/site-packages/numpoly/poly_function/call.py", line 77, in call
2022-08-25T21:51:17.653297+00:00  INFO     raise TypeError(f"unexpected keyword argument '{extra_args[0]}'")
2022-08-25T21:51:17.653347+00:00  INFO     TypeError: unexpected keyword argument 'q3'

I know that this is not an issue with ensembleperturbation's SCHISM implementation, since when I try a standalone script (as opposed to using Docker in ECS on my system) it works; but in any case I thought you might have seen this issue before and might be able to help me figure out what is going on in my Dockerfile.

And by the way whenever you give me an OK I can merge this pull request as well. Thanks!

WPringle commented 2 years ago

@WPringle I'm finalizing and testing the schism ensemble support branch on my workflow system and I ran into an error like this:

2022-08-25T21:51:17.652842+00:00  INFO     File "/home/ondemand-user/icogsc/lib/python3.9/site-packages/ensembleperturbation/uncertainty_quantification/surrogate.py", line 238, in sensitivities_from_surrogate
[...]
2022-08-25T21:51:17.653347+00:00  INFO     TypeError: unexpected keyword argument 'q3'

I know that this is not an issue with ensembleperturbation's SCHISM implementation, since when I try a standalone script (as opposed to using Docker in ECS on my system) it works, but in any case I thought you might have seen this issue before and might be able to help me figure out what is going on in my Dockerfile

It kind of looks like chaospy is having trouble calling a numpoly function. You may need to make sure the versions of those packages are compatible.
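One way to compare the Docker and standalone environments is to print the suspect package versions in each; a small diagnostic sketch (the helper name is made up, and the package list just follows the traceback):

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages=('chaospy', 'numpoly')):
    """Map each suspect package to its installed version (None if absent),
    so the Docker image and the standalone environment can be diffed."""
    versions = {}
    for package in packages:
        try:
            versions[package] = version(package)
        except PackageNotFoundError:
            versions[package] = None
    return versions

print(installed_versions())
```

Running this once inside the container and once in the standalone environment should show whether the two resolve different chaospy/numpoly versions.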

And by the way whenever you give me an OK I can merge this pull request as well. Thanks!

I took a look at the google drive just now and looks good! So please merge the PR.

SorooshMani-NOAA commented 2 years ago

@WPringle, thanks for catching the typo.

About the package compatibility: the strange thing is that if I use the exact same Docker image on the AWS ECS service versus running it manually, I get two different results! So I don't think it's the packages; I should probably spend some more time on that issue.