reichlab / zoltr

http://reichlab.io/zoltr/
GNU General Public License v3.0
2 stars 4 forks source link

False error message: non-decreasing values as quantiles increase #49

Closed lshandross closed 1 year ago

lshandross commented 1 year ago

When uploading forecasts in which quantile values are small (e.g. 1.75e-05), zoltar throws the error message "Entries in value must be non-decreasing as quantiles increase" even when this is not true.

For example, see the 2022-11-14 forecast for the VTSanghani-Transformer model, original forecast CSV attached. Attempting to upload a transformed JSON file of the forecasts to zoltar (see code below) results in a job output listed here. Both the original CSV file and transformed JSON file have the correct values which are non-decreasing as the quantiles increase, in spite of the error message thrown by zoltar.

# note that the wd for this code is a local copy of the Flusight-forecast-data repo
source("../cdc.R") # this script is attached as a .txt file since RScripts aren't allowed
model_name <- "VTSanghani-Transformer"
forecast_date <- "2022-11-14"
raw_csv_path <- paste("data-forecasts/", model_name, "/", forecast_date, "-", model_name, ".csv", sep="")
raw_forecasts <- read_csv(raw_csv_path)
forecast_data <- forecast_data_from_flusight_csv_file(raw_csv_path)

# upload forecasts
if (model_name %in% the_models$model_abbr) {
  model_url <- the_models[the_models$model_abbr == model_name, "url"]
  upload_forecast(
    zoltar_connection,
    model_url,
    timezero_date = forecast_date,
    forecast_data = forecast_data,
    notes = ""
  )
} else { stop("Please provide the name of an existing model.")}

Note that in spite of the error message shown above this forecast passed a sanity check validation on the original GitHub repo here, which I would think includes ensuring there are no decreasing values for increasing quantiles

2022-11-14-VTSanghani-Transformer.csv cdc.txt

matthewcornell commented 1 year ago

The validation program _validate_quantile_prediction_dict() uses the helper function _le_with_tolerance() to compare a zipped offset of the incoming values:

def _validate_quantile_prediction_dict(prediction_dict, target):
    ...
    is_le_values = [_le_with_tolerance(a, b) for a, b in zip(pred_data_values, pred_data_values[1:])]
    if not all(is_le_values):
        raise RuntimeError(f"Entries in `value` must be non-decreasing as quantiles increase. "
                           f"value column={pred_data_values}, is_le_values={is_le_values}, "
                           f"prediction_dict={prediction_dict}")
def _le_with_tolerance(a, b):  # a <= b ?
    if type(a) in {int, float}:
        return True if math.isclose(a, b, rel_tol=1e-05) else a <= b  # default: rel_tol=1e-09
    else:  # date
        return a <= b

I don't recall the rationale here, but it might have been motivated by the bin validation done by _validate_bin_prediction_dict():

BIN_SUM_REL_TOL = 0.001  # hard-coded magic number for prediction probability sums

def _validate_bin_prediction_dict(is_validate_cats, prediction_dict, target):
    ...
    prob_sum = sum(prediction_data['prob'])
    if not math.isclose(1.0, prob_sum, rel_tol=BIN_SUM_REL_TOL):
        raise RuntimeError(f"For one prediction element, the values within prob must sum to 1.0. "
                           f"prob_sum={prob_sum}, delta={abs(1 - prob_sum)}, rel_tol={BIN_SUM_REL_TOL}, "
                           f"prediction_dict={prediction_dict}")
nickreich commented 1 year ago

@matthewcornell I do think that having the tolerance for the sum of binned probabilities is important.

That said, I think we could remove it from the quantile validation. The quantiles, if sorted by the quantile level, should be strictly increasing.

matthewcornell commented 1 year ago

Here's the JSON dict that caused the error. Note that the value 4.3765e-06 is bracketed by zeros. I could not find this case in the csv file attached above.

{'unit': '02', 
 'target': '2 wk ahead inc flu hosp', 
 'class': 'quantile',
 'prediction': {
  'quantile': [0.01, 0.025, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 0.975, 0.99], 
  'value': [0, 0, 0, 0, 0, 0, 0, 0, 4.3765e-06, 0, 0, 0, 1.0429, 2.0857, 3.1285, 6.7331, 11.6186, 16.504, 21.4286, 26.4316, 31.4345, 33.936, 35.4368]}
}
lshandross commented 1 year ago

It seems like zoltar's quantile validation changed some of the values (like 1.75e-05) to zero, which is what caused the error.

I'm also in support of removing this bit of code from the quantile validation since it's redundant because the GitHub actions in the original CDC repo where we're pulling forecasts from already checks to make sure there are no decreasing values for increasing quantiles.

matthewcornell commented 1 year ago

Zoltar does not explicitly change any input values - they're parsed in Python via JSON.

matthewcornell commented 1 year ago

Here's the csv subset that contains 4.376486685941912e-06 , which I agree has been transformed by the upload process:

2022-11-14,2 wk ahead inc flu hosp,2022-11-26,02,point,NA,8.524941444396973
2022-11-14,2 wk ahead inc flu hosp,2022-11-26,02,quantile,0.010,0.0
2022-11-14,2 wk ahead inc flu hosp,2022-11-26,02,quantile,0.025,0.0
2022-11-14,2 wk ahead inc flu hosp,2022-11-26,02,quantile,0.050,0.0
2022-11-14,2 wk ahead inc flu hosp,2022-11-26,02,quantile,0.100,0.0
2022-11-14,2 wk ahead inc flu hosp,2022-11-26,02,quantile,0.150,0.0
2022-11-14,2 wk ahead inc flu hosp,2022-11-26,02,quantile,0.200,0.0
2022-11-14,2 wk ahead inc flu hosp,2022-11-26,02,quantile,0.250,0.0
2022-11-14,2 wk ahead inc flu hosp,2022-11-26,02,quantile,0.300,0.0
2022-11-14,2 wk ahead inc flu hosp,2022-11-26,02,quantile,0.350,4.376486685941912e-06
2022-11-14,2 wk ahead inc flu hosp,2022-11-26,02,quantile,0.400,1.7505946743767725e-05
2022-11-14,2 wk ahead inc flu hosp,2022-11-26,02,quantile,0.450,3.06354068015935e-05
2022-11-14,2 wk ahead inc flu hosp,2022-11-26,02,quantile,0.500,4.376486685941927e-05
2022-11-14,2 wk ahead inc flu hosp,2022-11-26,02,quantile,0.550,1.0428736837282495
2022-11-14,2 wk ahead inc flu hosp,2022-11-26,02,quantile,0.600,2.0857037067413318
2022-11-14,2 wk ahead inc flu hosp,2022-11-26,02,quantile,0.650,3.1285336256027234
2022-11-14,2 wk ahead inc flu hosp,2022-11-26,02,quantile,0.700,6.733109617233264
2022-11-14,2 wk ahead inc flu hosp,2022-11-26,02,quantile,0.750,11.618557929992676
2022-11-14,2 wk ahead inc flu hosp,2022-11-26,02,quantile,0.800,16.504006958007825
2022-11-14,2 wk ahead inc flu hosp,2022-11-26,02,quantile,0.850,21.42861881256103
2022-11-14,2 wk ahead inc flu hosp,2022-11-26,02,quantile,0.900,26.431556320190435
2022-11-14,2 wk ahead inc flu hosp,2022-11-26,02,quantile,0.950,31.434493827819814
2022-11-14,2 wk ahead inc flu hosp,2022-11-26,02,quantile,0.975,33.93596258163451
2022-11-14,2 wk ahead inc flu hosp,2022-11-26,02,quantile,0.990,35.436843833923334

I converted this by hand into a Python dict:

{"unit": "loc2", "target": "pct next week", "class": "quantile",
 "prediction": {
     "quantile": [0.010, 0.025, 0.050, 0.100, 0.150, 0.200, 0.250, 0.300, 0.350, 0.400, 0.450,
                  0.500, 0.550, 0.600, 0.650, 0.700, 0.750, 0.800, 0.850, 0.900, 0.950, 0.975,
                  0.990],
     "value": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.376486685941912e-06,
               1.7505946743767725e-05, 3.06354068015935e-05, 4.376486685941927e-05,
               1.0428736837282495, 2.0857037067413318, 3.1285336256027234, 6.733109617233264,
               11.618557929992676, 16.504006958007825, 21.42861881256103, 26.431556320190435,
               31.434493827819814, 33.93596258163451, 35.436843833923334]}
 }

When I run this through the validation code, it does not error.

nickreich commented 1 year ago

Ok, so is the conclusion that there is an error in the (zoltr or zoltpy?) code that uploads? Clearly somewhere raw data is being transformed in an undesirable way, right?

matthewcornell commented 1 year ago

I figured out the problem. It's on the zoltr (client) side, and results from a "feature" of jsonlite::write_json() where it zeros out some small numbers: [Inconsistent treatment of digits across 1e-05 reichlab/forecast-repository#184] . Adding digits = NA to zoltr::upload_forecast() fixes it.