ktindiana / sphinxval

SPHINX validation code for solar energetic particle models
MIT License

SPHINX_run log file: invalid forecast reason and filename on same line #144

Open claya71 opened 2 weeks ago

claya71 commented 2 weeks ago

Some feedback on the log files, to make them easier to read and to make useful information easier to find:

  1. When invalid forecasts are removed, the statement naming the file being removed and the statement giving the reason for its removal appear on separate lines. It would be better to have them on the same line. Example:
       2024-09-27 12:23:19 - [WARNING] classes.valid_forecast: Trigger and input timing data not available.
       2024-09-27 12:23:19 - [WARNING] match.setup_match_all_forecasts: FORECAST NOT VALID: Removing /home/m_sphinx/data/forecasts/iswa_data_tree/model/heliosphere/sep_scoreboard/SAWS_ASPECS/1.X/Forecasts/Intensity/2024/02/SAWS_ASPECS_0_06_hrs_50.2024-02-22T120500Z.2024-02-22T123002Z.json from validation analysis.
  2. Of all the forecasts that are read in (i.e., the total number of SPHINX objects created across all energy channels), we want to know what percentage or fraction were correctly matched to an observation. Because a single JSON file can contain multiple forecasts (one per energy channel in that file), the number of JSON files read in does not equal the total number of SPHINX objects created. We would need a way to print the number of energy channels in each JSON file and how many of those are matched to an observation.
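A minimal sketch of item 1, assuming the validity check can return its reason instead of logging it, so that the caller (which knows the filename) emits a single WARNING line. The names `check_forecast`, `removal_message`, and `remove_invalid`, and the dictionary keys `filename`, `trigger_time`, and `input_time`, are hypothetical; they are not sphinxval's actual functions or fields.

```python
import logging
from typing import List, Optional

logger = logging.getLogger(__name__)


def check_forecast(forecast: dict) -> Optional[str]:
    """Hypothetical validity check: return a reason string if invalid, else None."""
    # The field names here are illustrative assumptions, not sphinxval's schema.
    if forecast.get("trigger_time") is None and forecast.get("input_time") is None:
        return "Trigger and input timing data not available."
    return None


def removal_message(filename: str, reason: str) -> str:
    """Combine the removed file and the reason into a single log message."""
    return f"FORECAST NOT VALID: {reason} Removing {filename} from validation analysis."


def remove_invalid(forecasts: List[dict]) -> List[dict]:
    """Keep valid forecasts; log exactly one WARNING line per removed forecast."""
    valid = []
    for fcast in forecasts:
        reason = check_forecast(fcast)
        if reason is None:
            valid.append(fcast)
        else:
            logger.warning(removal_message(fcast["filename"], reason))
    return valid


if __name__ == "__main__":
    # Format string chosen to resemble the log excerpts; it is an assumption.
    logging.basicConfig(
        format="%(asctime)s - [%(levelname)s] %(module)s.%(funcName)s: %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )
    remove_invalid([{"filename": "example.json"}])
```

The design choice is that the check function reports *why* and the caller reports *which file*, so both pieces end up in one message without the check needing to know the filename.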
rickyegeland commented 1 week ago

Note that item 2 can already be computed, either across all models and energy channels or per energy channel, from the following log lines:

(example from the 202408 monthly processing)

2024-10-02 00:43:09 - [INFO] validation_json_handler.load_objects_from_json: STATS: Forecast objects created for : min.10.0.max.-1.0.units.MeV, 30809
2024-10-02 00:43:09 - [INFO] validation_json_handler.load_objects_from_json: STATS: Forecast objects created for : min.100.0.max.-1.0.units.MeV, 25060
2024-10-02 00:43:09 - [INFO] validation_json_handler.load_objects_from_json: STATS: Forecast objects created for : min.5.0.max.-1.0.units.MeV, 0
2024-10-02 00:43:09 - [INFO] validation_json_handler.load_objects_from_json: STATS: Forecast objects created for : min.30.0.max.-1.0.units.MeV, 24893
2024-10-02 00:43:09 - [INFO] validation_json_handler.load_objects_from_json: STATS: Forecast objects created for : min.50.0.max.-1.0.units.MeV, 17044
2024-10-02 00:43:09 - [INFO] validation_json_handler.load_objects_from_json: STATS: Forecast objects created for : min.10.0.max.-1.0.units.MeV_min.5.0.max.-1.0.units.MeV, 0
2024-10-02 01:25:13 - [INFO] match.setup_match_all_forecasts: STATS: TOTAL SPHINX OBJECTS: 83571

In this example:

>>> N_fcast = 30809 + 25060 + 24893 + 17044
>>> N_fcast
97806
>>> N_sphinx = 83571
>>> N_sphinx / N_fcast
0.854456781792528
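The arithmetic above can be automated. A minimal sketch, assuming the log is plain text with one message per line and covers a single run; the regular expressions are written against the excerpt above, and the function names are hypothetical, not part of sphinxval.

```python
import re
from typing import Dict, Iterable, List, Optional

# Regexes for the STATS lines quoted above; they assume the message text is
# exactly as it appears in the excerpt (including the space before ":").
FORECAST_RE = re.compile(
    r"STATS: Forecast objects created for : (?P<channel>[^,\s]+), (?P<n>\d+)\s*$"
)
TOTAL_RE = re.compile(r"STATS: TOTAL SPHINX OBJECTS: (?P<n>\d+)\s*$")


def forecast_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Map each energy channel key to its 'Forecast objects created' count."""
    counts: Dict[str, int] = {}
    for line in lines:
        m = FORECAST_RE.search(line)
        if m:
            counts[m["channel"]] = counts.get(m["channel"], 0) + int(m["n"])
    return counts


def total_sphinx(lines: Iterable[str]) -> Optional[int]:
    """Return the last 'TOTAL SPHINX OBJECTS' value in the log, or None."""
    total = None
    for line in lines:
        m = TOTAL_RE.search(line)
        if m:
            total = int(m["n"])
    return total


def overall_fraction(lines: List[str]) -> float:
    """Fraction of forecast objects that became SPHINX objects (single run assumed)."""
    n_fcast = sum(forecast_counts(lines).values())
    n_sphinx = total_sphinx(lines)
    if n_sphinx is None or n_fcast == 0:
        raise ValueError("log does not contain the expected STATS lines")
    return n_sphinx / n_fcast
```

Zero-count channels (such as the 5 MeV lines above) add nothing to the sum, so running `overall_fraction` over the 202408 log lines should reproduce the hand computation.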

Similar computations can be made per energy channel by summing lines like this one for each of the energy channels:

2024-10-02 01:25:13 - [INFO] match.setup_match_all_forecasts: STATS: SPHINX objects created for GSU All Clear, min.10.0.max.-1.0.units.MeV: 704

However, there is currently no way to obtain a fraction per model. I suggest that we not solve this by adding more log lines; instead, let's create a summary data file that contains exactly what we want in an easy-to-use data format.
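The per-channel computation can be sketched the same way: sum the per-model "SPHINX objects created" counts within each channel and divide by that channel's "Forecast objects created" count. This is a sketch under stated assumptions: the regexes target the excerpts above, channel keys are assumed to contain no spaces or commas, and the function names, CSV columns, and choice of CSV are illustrative only, not the summary format that sphinxval will adopt.

```python
import csv
import re
from typing import Dict, Iterable, List, Optional

# Written against the log excerpts above; model names may contain spaces,
# channel keys are assumed to contain no spaces or commas.
FORECAST_RE = re.compile(
    r"STATS: Forecast objects created for : (?P<channel>[^,\s]+), (?P<n>\d+)\s*$"
)
SPHINX_RE = re.compile(
    r"STATS: SPHINX objects created for (?P<model>.+), (?P<channel>[^,\s]+): (?P<n>\d+)\s*$"
)

FIELDS = ["channel", "forecast_objects", "sphinx_objects", "fraction"]


def channel_summary(lines: Iterable[str]) -> List[dict]:
    """One row per energy channel: forecast objects, SPHINX objects, and their ratio."""
    forecasts: Dict[str, int] = {}
    sphinx: Dict[str, int] = {}
    for line in lines:
        m = FORECAST_RE.search(line)
        if m:
            forecasts[m["channel"]] = forecasts.get(m["channel"], 0) + int(m["n"])
            continue
        m = SPHINX_RE.search(line)
        if m:
            # Per-model counts are summed into a per-channel total.
            sphinx[m["channel"]] = sphinx.get(m["channel"], 0) + int(m["n"])
    rows = []
    for channel, n_fcast in forecasts.items():
        n_sphinx = sphinx.get(channel, 0)
        # None marks channels with no forecast objects (fraction undefined).
        fraction: Optional[float] = n_sphinx / n_fcast if n_fcast else None
        rows.append({"channel": channel, "forecast_objects": n_fcast,
                     "sphinx_objects": n_sphinx, "fraction": fraction})
    return rows


def write_summary_csv(rows: List[dict], path: str) -> None:
    """Write the rows to a CSV file (an illustrative format only)."""
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

A per-model fraction would additionally need the number of forecast objects created per model, which the excerpted log lines do not report.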

rickyegeland commented 1 week ago

Created new issue #145 to encapsulate what is desired for item 2.