noaa-ocs-modeling / EnsemblePerturbation

perturbation of coupled model input over a space of input variables
https://ensembleperturbation.readthedocs.io
Creative Commons Zero v1.0 Universal
7 stars 3 forks source link

Add the ability to select forecast time for forecast advisories in deck `a` #98

Closed SorooshMani-NOAA closed 1 year ago

SorooshMani-NOAA commented 1 year ago

The following issue will be addressed with this pull request:

image

image

SorooshMani-NOAA commented 1 year ago

This is the result after the fix:

image

image

SorooshMani-NOAA commented 1 year ago

Note the difference with the previous plots. Here the start_date and forecast_time are the same. start_date specifies the time to start perturbing, and forecast_time specifies what advisory to look at.

image image

WPringle commented 1 year ago

@SorooshMani-NOAA I was going to ask about that difference between start_date and forecast_time. I don't understand what is meant by the time to start perturbing at, especially if start_date is backwards in time from forecast_time.

SorooshMani-NOAA commented 1 year ago

@WPringle I think we need to discuss how we want it to work. start_date and end_date are filters in stormevents. The same thing is true for forecast_time. The difference is that the first two filters based on forecasted time and the last one based on track start time. Now I just added forecast_time to ensembleperturbation by treating them like the other two, just as I did for stormevents, but I didn't really realize the difference it can make in perturbations. I need to look deeper into the use of start_date for the perturbation, maybe we just need to ensure that start_date is always equal to forecast_time whereever specified. What do you think?

Let me also play a little bit with the update and see if I find any new aspect of it.

SorooshMani-NOAA commented 1 year ago

@WPringle I took another look at the code, and I can summarize the impact of each date/time as follows.

As discussed previous comments, in stormevents.nhc.VortexTrack:

In ensembleperturbation these date/times are passed to the stormevents.nhc.VortexTrack object as is, so that the unperturbed track is filtered as described above. However, in addition to that the start_date is used to calculate the validation_times, taken to be self.track.data['datetime'] - self.track.start_date, in other words the difference between datetime column and the specified input start_time.

With this description maybe from ensembleperturbation point of view it might make sense to just get the start_date as input and then use it as both start_date and forecast_time when passing to VortexTrack from stormevents to get the unperturbed track. What do you think?

Update I also see a usecase for a perturb_time input so that the perturbation doesn't start at the beginning of track. But that is not relevant directly to this current fix, just something to keep in mind as we think about adding arguments.

WPringle commented 1 year ago

@SorooshMani-NOAA Thanks for the thorough investigation! I think that it would make sense that the validation_times are computed from forecast_time not start_date, as that is the intended meaning. This would mean start_date and end_date could be used to filter a portion of the forecast from forecast_time but would not be a necessary input if we just want the whole forecast.

Although I could see a use case of end_date if we don't want to use all 168 forecast hours I can hardly think of a good use case of start_date not being equal to forecast_time, so I agree that the suggestion to just use start_date equal to forecast_time could be a good option. Then I suppose we don't actually need the forecast_time kwarg.

I think these two paragraphs are a little in conflict, so I just agree that keep start_date and end_date and use start_date to compute validation_times and then all we need to do is just set forecast_time = start_date when getting the unperturbed track.

SorooshMani-NOAA commented 1 year ago

@WPringle I made the updates we discussed above, just one last thing. Should we still filter if no start_date is passed? I think we should, but I just wanted to check with you to see if you can think of a test case where we shouldn't filter if it's not passed.

Not filtering results in the same issue we originally opened this ticket for

WPringle commented 1 year ago

@SorooshMani-NOAA if no start_date is passed then how would it know which advisory (forecast_time) to use?

WPringle commented 1 year ago

@SorooshMani-NOAA if no start_date is passed then how would it know which advisory (forecast_time) to use?

I see, so I see you implemented the first possible start_date. I guess this is fine, yeah. Or raise an error saying cannot do advisory a deck without start_date specified.

SorooshMani-NOAA commented 1 year ago

@WPringle I just noticed two major questions/issues with advisories here:

  1. How should modified advisory files be handled in stormevents and ensembleperturbation? In the test_advisory_ensemble.py test we use a modified advisory file (florence_advisory.22) where the date tags for each forecast hour is different:

    AL, 06, 2018091218,   , OFCL,   0, 304N,  719W, 110,  943, HU,  34, NEQ,  170,  140,  100,  140, 1013,    0,  15,     ,    ,    ,    ,    ,  0,   0,            ,   1
    AL, 06, 2018091218,   , OFCL,   0, 304N,  719W, 110,  943, HU,  50, NEQ,  100,   80,   60,   80, 1013,    0,  15,     ,    ,    ,    ,    ,  0,   0,            ,   1
    AL, 06, 2018091218,   , OFCL,   0, 304N,  719W, 110,  943, HU,  64, NEQ,   60,   60,   40,   60, 1013,    0,  15,     ,    ,    ,    ,    ,  0,   0,            ,   1
    AL, 06, 2018091221,   , OFCL,   3, 309N,  725W, 105,  949, HU,  34, NEQ,  170,  140,  100,  140, 1013,    0,  15,     ,    ,    ,    ,    ,314,   7,            ,   2
    AL, 06, 2018091221,   , OFCL,   3, 309N,  725W, 105,  949, HU,  50, NEQ,  100,   80,   60,   80, 1013,    0,  15,     ,    ,    ,    ,    ,314,   7,            ,   2
    AL, 06, 2018091221,   , OFCL,   3, 309N,  725W, 105,  949, HU,  64, NEQ,   60,   60,   40,   60, 1013,    0,  15,     ,    ,    ,    ,    ,314,   7,            ,   2
    AL, 06, 2018091306,   , OFCL,  12, 321N,  741W, 110,  943, HU,  34, NEQ,  170,  140,  100,  140, 1013,    0,  15,     ,    ,    ,    ,    ,312,   6,            ,   3
    AL, 06, 2018091306,   , OFCL,  12, 321N,  741W, 110,  943, HU,  50, NEQ,  100,   80,   60,   80, 1013,    0,  15,     ,    ,    ,    ,    ,312,   6,            ,   3
    AL, 06, 2018091306,   , OFCL,  12, 321N,  741W, 110,  943, HU,  64, NEQ,   60,   60,   40,   60, 1013,    0,  15,     ,    ,    ,    ,    ,312,   6,            ,   3
    AL, 06, 2018091318,   , OFCL,  24, 334N,  759W, 110,  943, HU,  34, NEQ,  170,  140,  100,  140, 1013,    0,  15,     ,    ,    ,    ,    ,311,   5,            ,   4
    AL, 06, 2018091318,   , OFCL,  24, 334N,  759W, 110,  943, HU,  50, NEQ,  100,   80,   60,   80, 1013,    0,  15,     ,    ,    ,    ,    ,311,   5,            ,   4
    AL, 06, 2018091318,   , OFCL,  24, 334N,  759W, 110,  943, HU,  64, NEQ,   60,   60,   40,   60, 1013,    0,  15,     ,    ,    ,    ,    ,311,   5,            ,   4
    ...

    where as In the original ATCF file, all of them have the same datetag (i.e. track_start_time) for each advisory issued. Most of the logic in stormevents revolves around handling data downloaded from NHC, now the question is should we handle the modified ones (like the one above)? Ideally we want to be able to handle the modified files, so how about assuming .22 is modified and .dat is unmodified?

  2. When we output from an original (non-modified) advisory file (e.g. OFCL from Florence downloaded from NHC server) to a .22 format, we don't modify the date tags. For both SCHISM and ADCIRC, the date tags should be track_start_time + forecast_hours, just like what we see for BEST. This basically means we need to generate the modified advisory discussed in item 1. So we can follow the same suggested logic above and say for .22 files we output date tags that are modified (to comply with SCHISM and ADCIRC), but for .dat we output datetags based on NHC convention.

This is a bit of a larger undertaking that I originally thought, but it is required in order to be able to use stormevents and ensembleperturbation for setting up non-BEST track simulations.

As a side note, what I did in ondemand-storm-workflow was to fake the tracks as being BEST so that I don't run into the issues described above.

WPringle commented 1 year ago

@WPringle I just noticed two major questions/issues with advisories here:

  1. How should modified advisory files be handled in stormevents and ensembleperturbation? In the test_advisory_ensemble.py test we use a modified advisory file (florence_advisory.22) where the date tags for each forecast hour is different:
AL, 06, 2018091218,   , OFCL,   0, 304N,  719W, 110,  943, HU,  34, NEQ,  170,  140,  100,  140, 1013,    0,  15,     ,    ,    ,    ,    ,  0,   0,            ,   1
AL, 06, 2018091218,   , OFCL,   0, 304N,  719W, 110,  943, HU,  50, NEQ,  100,   80,   60,   80, 1013,    0,  15,     ,    ,    ,    ,    ,  0,   0,            ,   1
AL, 06, 2018091218,   , OFCL,   0, 304N,  719W, 110,  943, HU,  64, NEQ,   60,   60,   40,   60, 1013,    0,  15,     ,    ,    ,    ,    ,  0,   0,            ,   1
AL, 06, 2018091221,   , OFCL,   3, 309N,  725W, 105,  949, HU,  34, NEQ,  170,  140,  100,  140, 1013,    0,  15,     ,    ,    ,    ,    ,314,   7,            ,   2
AL, 06, 2018091221,   , OFCL,   3, 309N,  725W, 105,  949, HU,  50, NEQ,  100,   80,   60,   80, 1013,    0,  15,     ,    ,    ,    ,    ,314,   7,            ,   2
AL, 06, 2018091221,   , OFCL,   3, 309N,  725W, 105,  949, HU,  64, NEQ,   60,   60,   40,   60, 1013,    0,  15,     ,    ,    ,    ,    ,314,   7,            ,   2
AL, 06, 2018091306,   , OFCL,  12, 321N,  741W, 110,  943, HU,  34, NEQ,  170,  140,  100,  140, 1013,    0,  15,     ,    ,    ,    ,    ,312,   6,            ,   3
AL, 06, 2018091306,   , OFCL,  12, 321N,  741W, 110,  943, HU,  50, NEQ,  100,   80,   60,   80, 1013,    0,  15,     ,    ,    ,    ,    ,312,   6,            ,   3
AL, 06, 2018091306,   , OFCL,  12, 321N,  741W, 110,  943, HU,  64, NEQ,   60,   60,   40,   60, 1013,    0,  15,     ,    ,    ,    ,    ,312,   6,            ,   3
AL, 06, 2018091318,   , OFCL,  24, 334N,  759W, 110,  943, HU,  34, NEQ,  170,  140,  100,  140, 1013,    0,  15,     ,    ,    ,    ,    ,311,   5,            ,   4
AL, 06, 2018091318,   , OFCL,  24, 334N,  759W, 110,  943, HU,  50, NEQ,  100,   80,   60,   80, 1013,    0,  15,     ,    ,    ,    ,    ,311,   5,            ,   4
AL, 06, 2018091318,   , OFCL,  24, 334N,  759W, 110,  943, HU,  64, NEQ,   60,   60,   40,   60, 1013,    0,  15,     ,    ,    ,    ,    ,311,   5,            ,   4
...

where as In the original ATCF file, all of them have the same datetag (i.e. track_start_time) for each advisory issued. Most of the logic in stormevents revolves around handling data downloaded from NHC, now the question is should we handle the modified ones (like the one above)? Ideally we want to be able to handle the modified files, so how about assuming .22 is modified and .dat is unmodified?

  1. When we output from an original (non-modified) advisory file (e.g. OFCL from Florence downloaded from NHC server) to a .22 format, we don't modify the date tags. For both SCHISM and ADCIRC, the date tags should be track_start_time + forecast_hours, just like what we see for BEST. This basically means we need to generate the modified advisory discussed in item 1. So we can follow the same suggested logic above and say for .22 files we output date tags that are modified (to comply with SCHISM and ADCIRC), but for .dat we output datetags based on NHC convention.

This is a bit of a larger undertaking that I originally thought, but it is required in order to be able to use stormevents and ensembleperturbation for setting up non-BEST track simulations.

As a side note, what I did in ondemand-storm-workflow was to fake the tracks as being BEST so that I don't run into the issues described above.

If both SCHISM and ADCIRC require the dates to be updated then I guess it does make sense to have the fort.22 be the modified versions for compliance and the .dat for the original style. Actually originally there is code inside ADCIRC Holland that should handle either BEST or OFCL in these different ways. Then with new holland model GAHM it only reads the forecast_hours which is why I made that other PR for the best-track.

SorooshMani-NOAA commented 1 year ago

@WPringle thanks for testing. About the modified vs original format:

if both SCHISM and ADCIRC require the dates to be updated then I guess it does make sense to have the fort.22

I cannot speak for ADCIRC, but in SCHISM (PaHM) it seems that it doesn't care about either forecast_hours or OFCL vs BEST it just reads the date tags. That's why I want to make the distinction to make it manageable. So in ADCIRC, if we have OFCL but we also have differing date tags for different forecast hours within the same advisory (like BEST) is it going to be a problem?

If it's a problem, we might need to make the formatting more explicit, e.g. argument for what to output, or something like that.

WPringle commented 1 year ago

@SorooshMani-NOAA It may make sense to have a .22 option being for direct use in ADCIRC and .pahm file or similar to be compliant with PaHM? Then we can be clear about which is compliant with which.

SorooshMani-NOAA commented 1 year ago

That makes sense. So should I leave .22 format as is and just update thing for .pahm output?

WPringle commented 1 year ago

@SorooshMani-NOAA Yeah if you think so. Does PaHM require a certain file suffix or can be any such as .pahm?

SorooshMani-NOAA commented 1 year ago

No, the input file for PaHM needs to be specified in a separate file, in the PaHM-SCHISM case it is hardcoded to be hurricane-track.dat. But .dat is our original format which we don't want to touch.

By the way, I still need to modify the code to handle reading of .dat vs .22 differently. So let's take a step back. In ADCIRC, when we have a modified track file, should the date tags be increasing (like BEST) or it doesn't matter? Because without changing anything, right now we're running into issues (the currently failing test) with a modified .22 track file. So I have to add some code to handle different .22 cases as well (in terms of date tags, etc.)

WPringle commented 1 year ago

@SorooshMani-NOAA The date tags should be increasing for BEST and the date tags should be constant for OFCL but with the forecast_hours column filled out.

SorooshMani-NOAA commented 1 year ago

OK, so the advisory track that we have for this test is invalid then, right? https://github.com/noaa-ocs-modeling/EnsemblePerturbation/blob/13b58bb49af2c9b5a7de4775438695b9402cd656/tests/data/input/test_existing_advisory/florence_advisory.22#L1-L22

because we have both the increasing date/time tag as well as the forecast hours.

WPringle commented 1 year ago

@SorooshMani-NOAA That's true it is invalid.

SorooshMani-NOAA commented 1 year ago

OK great! I think I can fix the file here and then merge this PR, then move back to stormevents to add formatting needed for .pahm, finally I'll modify ensembleperturbation in a separate PR to replace .22 output for perturbations with .pahm or something like that

codecov-commenter commented 1 year ago

Codecov Report

Attention: Patch coverage is 96.49123% with 2 lines in your changes missing coverage. Please review.

Project coverage is 21.22%. Comparing base (13b58bb) to head (c67d045). Report is 21 commits behind head on main.

Files Patch % Lines
ensembleperturbation/plotting/perturbation.py 0.00% 2 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #98 +/- ## ========================================== + Coverage 19.57% 21.22% +1.65% ========================================== Files 27 28 +1 Lines 3703 3759 +56 ========================================== + Hits 725 798 +73 + Misses 2978 2961 -17 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.