ktindiana / sphinxval

SPHINX validation code for solar energetic particle models
MIT License

Annotate forecast recent events for post-process filtering #58

Open rickyegeland opened 4 months ago

rickyegeland commented 4 months ago

The UMASEP model is essentially a triggered nowcasting model: it produces a positive forecast only in response to a sufficiently strong solar flare or a detected rise in proton flux. Nonetheless, it issues a forecast at a regular cadence (e.g. every 3 minutes), which naturally results in an enormous number of true negatives (TN) in the (inverted) All Clear contingency table. This renders every metric that includes TN mostly useless, because the statistic is dominated by forecasts in which the model has only determined that conditions are quiet.

A more useful TN count could be obtained by considering only the forecasts that should have resulted from a model trigger, i.e. selecting only the UMASEP forecasts that follow a solar flare of sufficient magnitude or the onset of an SEP event (or a sub-threshold non-event). This would allow metrics such as Specificity {TN/(TN + FP), P(Forecast:No|Event:No)}, False Alarm Rate {FP/(TN + FP), P(Forecast:Yes|Event:No)}, Accuracy (TP / N), and Rate Correct ((TP + TN) / N) to have their intended meaning.
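A minimal Python sketch of how a huge TN count saturates these scores, using hypothetical counts (not real UMASEP results): a handful of non-quiet cases plus roughly one year of quiet-time forecasts at a 3-minute cadence. The `Contingency` class is an illustrative helper, not part of SPHINX.

```python
from dataclasses import dataclass


@dataclass
class Contingency:
    """Counts for a 2x2 yes/no contingency table (illustrative helper)."""

    tp: int  # forecast yes, event yes
    fp: int  # forecast yes, event no
    fn: int  # forecast no, event yes
    tn: int  # forecast no, event no

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def specificity(self) -> float:
        # P(Forecast:No | Event:No) = TN / (TN + FP)
        return self.tn / (self.tn + self.fp)

    def false_alarm_rate(self) -> float:
        # P(Forecast:Yes | Event:No) = FP / (TN + FP)
        return self.fp / (self.tn + self.fp)

    def rate_correct(self) -> float:
        # (TP + TN) / N
        return (self.tp + self.tn) / self.n


# Hypothetical counts: 175,200 TNs is one year of forecasts at a 3-minute cadence.
quiet_dominated = Contingency(tp=5, fp=3, fn=2, tn=175_200)
print(f"{quiet_dominated.specificity():.5f}")       # 0.99998
print(f"{quiet_dominated.false_alarm_rate():.5f}")  # 0.00002
print(f"{quiet_dominated.rate_correct():.5f}")      # 0.99997
```

With TN this large, all three scores sit essentially at 0 or 1 no matter how the 10 non-quiet cases were handled; with a hypothetical tn=10 instead, specificity falls to 10/13 ≈ 0.77 and starts to discriminate.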

Since the forecasts do not record whether they were triggered or issued under quiet conditions, we can only select the "should have triggered" forecasts by pre-processing the X-ray and proton time series and extracting event timestamps. Many groups have already done this, so we should leverage existing flare lists.
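Where a ready-made list is not available, one way to extract proton event timestamps is a simple threshold-crossing pass. This minimal sketch assumes the conventional >10 MeV, 10 pfu threshold and declares an event start at the first of 3 consecutive samples at or above it, similar to the NOAA SWPC convention; `find_event_starts` is a hypothetical helper and the flux values are made up.

```python
from datetime import datetime, timedelta
from typing import List, Sequence


def find_event_starts(
    times: Sequence[datetime],
    flux: Sequence[float],
    threshold: float = 10.0,
    n_consecutive: int = 3,
) -> List[datetime]:
    """Return the start time of each run of at least `n_consecutive`
    samples with flux >= threshold (hypothetical helper).

    A run ends as soon as one sample drops below the threshold; real event
    definitions handle data gaps and end criteria more carefully.
    """
    starts: List[datetime] = []
    run_start = None
    run_len = 0
    recorded = False  # whether the current run has already been reported
    for t, f in zip(times, flux):
        if f >= threshold:
            if run_len == 0:
                run_start = t
            run_len += 1
            if run_len >= n_consecutive and not recorded:
                starts.append(run_start)
                recorded = True
        else:
            run_len = 0
            recorded = False
    return starts


# Hypothetical 5-minute >10 MeV fluxes (pfu).
t0 = datetime(2023, 8, 28, 0, 0)
times = [t0 + timedelta(minutes=5 * i) for i in range(11)]
flux = [1, 2, 12, 15, 20, 5, 11, 12, 13, 14, 3]
print([t.strftime("%H:%M") for t in find_event_starts(times, flux)])  # ['00:10', '00:30']
```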

To incorporate these lists into the analysis, SPHINX outputs should include a per-forecast "recent events" record. For every forecast issue time, look at the 2-hour window preceding the issue time and record:

  1. pre_max_flare_class : GOES class of the largest flare (by peak flux)
  2. pre_max_flare_time : peak time of that largest flare
  3. pre_max_gt10_flux : maximum proton flux (one value per integral flux channel: >10, >30, >50, >100 MeV)
  4. in_onset : (boolean) True if issue_time is after the SEP event start time and before the onset max time
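A minimal sketch of how the four annotations could be computed for one issue time. It assumes flares arrive as (peak time, GOES class) pairs from an existing list, proton data as per-channel (time, flux) series, and SEP events as (start, onset max) pairs; `goes_flux`, `annotate`, and this input layout are hypothetical. The window is treated as half-open, [issue_time - 2 h, issue_time), so nothing at or after the issue time leaks in.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Peak 1-8 Angstrom flux (W/m^2) for a class multiplier of 1.0.
GOES_BASE = {"A": 1e-8, "B": 1e-7, "C": 1e-6, "M": 1e-5, "X": 1e-4}


def goes_flux(flare_class: str) -> float:
    """Convert a GOES class string such as 'M2.5' to peak flux in W/m^2."""
    return GOES_BASE[flare_class[0].upper()] * float(flare_class[1:])


def annotate(
    issue_time: datetime,
    flares: List[Tuple[datetime, str]],                # (peak time, GOES class)
    protons: Dict[int, List[Tuple[datetime, float]]],  # channel (MeV) -> (time, flux)
    sep_events: List[Tuple[datetime, datetime]],       # (start, onset max)
    window: timedelta = timedelta(hours=2),
) -> Dict[str, object]:
    """Build the per-forecast "recent events" record (hypothetical layout)."""
    lo = issue_time - window
    record: Dict[str, object] = {"pre_max_flare_class": None, "pre_max_flare_time": None}

    # Largest flare (by peak flux) whose peak falls inside the window.
    recent = [(t, c) for t, c in flares if lo <= t < issue_time]
    if recent:
        peak_time, peak_class = max(recent, key=lambda tc: goes_flux(tc[1]))
        record["pre_max_flare_class"] = peak_class
        record["pre_max_flare_time"] = peak_time

    # Maximum proton flux per integral channel inside the window.
    for channel, series in protons.items():
        values = [f for t, f in series if lo <= t < issue_time]
        record[f"pre_max_gt{channel}_flux"] = max(values) if values else None

    # True if the issue time lies between an SEP start and its onset max.
    record["in_onset"] = any(start < issue_time < onset_max for start, onset_max in sep_events)
    return record


issue = datetime(2023, 8, 28, 12, 0)
flares = [(datetime(2023, 8, 28, 9, 0), "X1.0"),   # outside the 2 h window
          (datetime(2023, 8, 28, 11, 0), "C3.0"),
          (datetime(2023, 8, 28, 11, 30), "M1.2")]
protons = {10: [(datetime(2023, 8, 28, 10, 30), 0.5),
                (datetime(2023, 8, 28, 11, 45), 2.0),
                (datetime(2023, 8, 28, 12, 30), 50.0)]}  # after the issue time
sep_events = [(datetime(2023, 8, 28, 11, 50), datetime(2023, 8, 28, 13, 0))]
print(annotate(issue, flares, protons, sep_events))
```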

These annotations can then drive post-processing selections (e.g. pre_max_flare_class >= C5) that pare down the contingency tables, so that the TN count reflects only forecasts the model was actually prompted to make.
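A minimal sketch of such a selection, assuming each annotated forecast also carries hypothetical boolean `forecast_yes` and `event_yes` fields (the real SPHINX output columns may differ). It keeps forecasts preceded by a flare of class C5 or larger, or issued during an SEP onset, and then recounts the table. The six records are made up for illustration.

```python
from typing import Dict, Iterable, List, Optional

# Peak 1-8 Angstrom flux (W/m^2) for a class multiplier of 1.0.
GOES_BASE = {"A": 1e-8, "B": 1e-7, "C": 1e-6, "M": 1e-5, "X": 1e-4}


def goes_flux(flare_class: str) -> float:
    """Peak flux (W/m^2) for a GOES class string such as 'C5.0'."""
    return GOES_BASE[flare_class[0].upper()] * float(flare_class[1:])


def should_have_triggered(rec: Dict, min_class: str = "C5.0") -> bool:
    """Keep forecasts preceded by a flare >= min_class or issued during an SEP onset."""
    cls: Optional[str] = rec["pre_max_flare_class"]
    big_flare = cls is not None and goes_flux(cls) >= goes_flux(min_class)
    return big_flare or rec["in_onset"]


def contingency(records: Iterable[Dict]) -> Dict[str, int]:
    """Tally TP/FP/FN/TN from the hypothetical 'forecast_yes'/'event_yes' fields."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for r in records:
        correct = "T" if r["forecast_yes"] == r["event_yes"] else "F"
        counts[correct + ("P" if r["forecast_yes"] else "N")] += 1
    return counts


def make_rec(cls, onset, forecast_yes, event_yes) -> Dict:
    return {"pre_max_flare_class": cls, "in_onset": onset,
            "forecast_yes": forecast_yes, "event_yes": event_yes}


records: List[Dict] = [
    make_rec(None, False, False, False),    # quiet time
    make_rec("C1.0", False, False, False),  # small flare only
    make_rec("M2.0", False, True, True),
    make_rec("C6.0", False, False, False),
    make_rec(None, True, False, True),      # issued during an SEP onset
    make_rec("X1.0", False, True, False),
]
print(contingency(records))  # {'TP': 1, 'FP': 1, 'FN': 1, 'TN': 3}
print(contingency(r for r in records if should_have_triggered(r)))  # {'TP': 1, 'FP': 1, 'FN': 1, 'TN': 1}
```

In this toy example only the quiet-time negatives drop out, so TN shrinks from 3 to 1 while TP, FP, and FN are unchanged.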

rickyegeland commented 4 months ago

This feature request is meant to implement the analysis Steve requested in his email of Aug 28, 2023, which I copy below for reference.

With all the push to be ready for SEPVAL, I almost hate to mention this because I don’t want to detract from the effort. But if I don’t write it down while I’m thinking about it, I’ll forget. Forgive me if I have this wrong.

I was troubled by the discussion of scoring UMASEP a couple of weeks back. If I understood correctly, the number of forecast events was set by the cadence at which the model is polled or posts a forecast, which I believe is every 3 minutes. This creates an incredible number of forecast cases (about 175k forecasts/year at one per 3 minutes). Positive forecasts are very few. This makes the forecast matrix:

                  Actual Yes    Actual No
  Forecast Yes    5
  Forecast No                   175k × several years


It would seem better to score it based on when it should be triggered, which is principally by flares of class C5 or larger. This would make the number of forecast events:

Total events = (number of C5-or-larger flares) + (number of non-C5 instances in which it should trigger) + (number of non-C5 triggers that produced a forecast)

This may still be a large number, but it would give a better representation of performance.

In general, grading on every forecast posted, (X times a day) × (days), seems excessive when a model’s forecast is triggered only under specific circumstances. A framework more like the following might make more sense:

the number of times it should have triggered + the number of times it triggered when it shouldn’t have

rather than every forecast that a model posts (especially for models that post forecasts at high cadence). For daily forecasts (e.g. MAG4), the count would be the number of forecast days.

I think this is a little different from what we were discussing, so I thought I would bring it up for consideration.

Thanks

SJ
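A quick check of the counts in the email, alongside the trigger-based count it proposes. The function names and the trigger counts in the last line are hypothetical placeholders, not UMASEP statistics.

```python
def cadence_count(days: int, cadence_minutes: int) -> int:
    """Forecasts posted in `days` by a model issuing one every `cadence_minutes`."""
    return days * 24 * 60 // cadence_minutes


def trigger_count(should_have_triggered: int, triggered_when_should_not: int) -> int:
    """Trigger-based count: times it should have triggered plus false triggers."""
    return should_have_triggered + triggered_when_should_not


print(cadence_count(365, 3))     # 175200: 3-minute cadence for one year
print(cadence_count(365, 1440))  # 365: one forecast per day (e.g. MAG4)
# Hypothetical yearly counts, for illustration only.
print(trigger_count(should_have_triggered=150, triggered_when_should_not=20))  # 170
```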