nickreich commented 1 year ago

E.g. instead of the mode that predtimechart uses now to show forecasts from a single timezero or forecast_date, we could code up a different chart where you select a target (or "outcome"?), a time frame, and (I guess) a location, and then all predictions that have been made over time for those selections are shown.

nickreich commented 1 year ago

E.g. here is a figure showing predictions (points) for different models (colors) from different timezeros. It is from the NBA predictions project in Zoltar, plotted in R. In some ways, this is a simpler kind of chart than the existing main plotting feature in predtimechart, because none of the date inference that the current plot feature needs to get x-axes aligned has to happen. I'm imagining that just one of these plots would be shown in resolving this issue, but the faceting (multiple embedded plots) is an example of what is requested in #16.

matthewcornell commented 1 year ago

I think this would require a new API and a new chart type (e.g., a tab or separate URL).

Question: What would the API look like for downloading necessary data? (Specifically the function call arguments and the returned data format.)

matthewcornell commented 1 year ago

@nickreich Q: The example image is for a particular project target, correct? The targets from the NBA Zoltar project are: "make playoffs", "season wins", and "win finals", with only the middle one being non-binary.

nickreich commented 1 year ago

@matthewcornell Yes, the idea is that you would pick one particular target. Predictions for that target are shown on y-axis, with timezero where the forecast was made on the x-axis.

matthewcornell commented 1 year ago

@nickreich : Q: I'm trying to understand how you've gone from zoltar data to the example output you gave. I tried this query:

{'units': ['bos', 'den'],
 'models': ['538-Elo', '538-RAPTOR'],
 'targets': ['make playoffs']}

which returned 1309 rows of bin data, e.g.,

model    timezero    unit  target         class  cat    prob
538-Elo  2022-10-13  bos   make playoffs  bin    TRUE   0.959
538-Elo  2022-10-13  bos   make playoffs  bin    FALSE  0.041
538-Elo  2022-10-14  bos   make playoffs  bin    TRUE   0.96
538-Elo  2022-10-14  bos   make playoffs  bin    FALSE  0.04
...

Q: Do you have time to explain this to me?

Note: I'm using a simplified query while trying to understand this. Ultimately we'll need to specify a list of timezeros from the relevant period (the entire 2022–23 NBA season, i.e., October 18, 2022 to April 9, 2023?) as well as types (?).

nickreich commented 1 year ago

Basically, this looks like output that is nearly ready to be plotted. The one thing is that for a prediction element with bin class, we only need to plot one row (generally the cat=TRUE row), as the prob value for the cat=FALSE row will be 1-prob in the TRUE row. For the rows above, I'd imagine a plot that used timezero as the x-axis and prob as the y-axis, after subsetting to only include the cat=TRUE rows.
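A minimal sketch of that subsetting step, assuming the query rows are available as dicts keyed by the column names shown above. The function name `bin_rows_to_xy` is hypothetical, and the sketch assumes the rows are already limited to one model, unit, and target whose bin categories are TRUE and FALSE:

```python
# Rows copied from the query output above (one model, one unit, one target).
rows = [
    {"model": "538-Elo", "timezero": "2022-10-13", "unit": "bos",
     "target": "make playoffs", "class": "bin", "cat": "TRUE", "prob": 0.959},
    {"model": "538-Elo", "timezero": "2022-10-13", "unit": "bos",
     "target": "make playoffs", "class": "bin", "cat": "FALSE", "prob": 0.041},
    {"model": "538-Elo", "timezero": "2022-10-14", "unit": "bos",
     "target": "make playoffs", "class": "bin", "cat": "TRUE", "prob": 0.96},
    {"model": "538-Elo", "timezero": "2022-10-14", "unit": "bos",
     "target": "make playoffs", "class": "bin", "cat": "FALSE", "prob": 0.04},
]


def bin_rows_to_xy(rows, keep_cat="TRUE"):
    """Keep the bin rows for one category; return (timezeros, probs) for plotting."""
    kept = [r for r in rows if r["class"] == "bin" and r["cat"] == keep_cat]
    kept.sort(key=lambda r: r["timezero"])  # ISO dates sort chronologically as strings
    x = [r["timezero"] for r in kept]
    y = [r["prob"] for r in kept]
    return x, y


x, y = bin_rows_to_xy(rows)
```

Here `x` would go on the horizontal axis (timezero) and `y` on the vertical axis (prob).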

matthewcornell commented 1 year ago

Thanks, Nick. So we will need rules for translating between data classes (bin, named, point, sample, and quantile) to y-axis data for the charts.

nickreich commented 1 year ago

Yes. For point it's straightforward (just the value), and for bin I've explained it above. Could we start by implementing just those two?

matthewcornell commented 1 year ago

Yep!

nickreich commented 1 year ago

Do we have a sample --> point conversion function? This is something we've talked about before, right?

matthewcornell commented 1 year ago

According to our Automatic prediction type conversion docs:

Currently, only these combinations are implemented:

target types: continuous, discrete

conversions: point <- sample (uses either statistics.mean() or statistics.median() depending on the convert.point option)

conversions: quantile <- sample (uses numpy.quantile())
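As a rough illustration of the point <- sample conversion described in those docs, here is a sketch using only the standard library. The function name `samples_to_point` and its `convert_point` argument are hypothetical stand-ins for the `convert.point` option, not Zoltar's actual API, and the sample values are invented:

```python
import statistics


def samples_to_point(samples, convert_point="mean"):
    """Collapse a list of numeric samples to one point value.

    `convert_point` mirrors the convert.point option mentioned above
    (assumed here to take the values "mean" or "median").
    """
    if convert_point == "mean":
        return statistics.mean(samples)
    if convert_point == "median":
        return statistics.median(samples)
    raise ValueError(f"unsupported convert_point: {convert_point!r}")


season_wins_samples = [50, 54, 58, 70]  # illustrative values only
mean_point = samples_to_point(season_wins_samples, "mean")
median_point = samples_to_point(season_wins_samples, "median")
```

The quantile <- sample case would follow the same pattern with `numpy.quantile()` in place of the `statistics` call.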

matthewcornell commented 1 year ago

Multiple forecasts over time: API

Following is example API input, intermediate Zoltar query, and API output for the Zoltar NBA project.

workflow

UI: user specifies inputs and submits to server
Zoltar: translates request input to forecast query, executes (waiting), and then converts resulting csv rows (applying rules [1]) to JSON to return
UI: plots returned data

[1] Rules to convert rows to plottable data:

for a prediction element with bin class, we only need to plot one row (generally the cat=TRUE row)
point is straightforward (just the value)

input: a JSON object from the UI specifying what data is to be plotted

The only difference between this and a Zoltar forecast query is the dates start/end range, rather than a list of timezeros.

{
 "project_id": 316,
 "units":     ["bos", "den"],
 "models":    ["538-Elo", "538-RAPTOR"],
 "dates":     ["2022-10-18", "2023-04-09"],
 "target":    "win finals"
}

corresponding intermediate Zoltar query

Here I'm simulating the expansion of dates to a small set of timezeros. We might want to restrict to bin and point prediction types.

{"units": ["bos", "den"],
 "targets": ["win finals"],
 "timezeros": ["2022-10-19", "2022-10-21", "2022-10-24", "2022-10-28", "2022-10-31", "2022-11-01", "2022-11-02", "2022-11-03", "2022-11-04", "2022-11-05", "2022-11-06", "2022-11-07", "2022-11-13", "2022-11-14", "2022-11-16", "2022-11-17", "2022-11-18", "2022-11-19", "2022-11-20", "2022-11-21"],
 "models": ["538-Elo", "538-RAPTOR"]
}

Behind the scenes this results in num_rows=32.
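The expansion of the dates range into timezeros could look something like the sketch below. It assumes the server already has the project's full list of timezeros as ISO date strings. The function name `expand_dates_to_timezeros` is hypothetical, and the last date in the sample list is invented to show the upper bound being applied:

```python
from datetime import date


def expand_dates_to_timezeros(project_timezeros, start, end):
    """Return the project's timezeros that fall within [start, end], inclusive."""
    start_d = date.fromisoformat(start)
    end_d = date.fromisoformat(end)
    return [tz for tz in sorted(project_timezeros)
            if start_d <= date.fromisoformat(tz) <= end_d]


# A few of the project's timezeros, plus one invented date past the range.
project_timezeros = ["2022-10-13", "2022-10-14", "2022-10-19", "2022-10-21", "2023-04-10"]
selected = expand_dates_to_timezeros(project_timezeros, "2022-10-18", "2023-04-09")
```

The resulting list would then be used as the "timezeros" value of the intermediate query.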

output: a JSON object containing the data to plot

This format is similar to that of predtimechart's forecast and truth data formats. The object has one key per unit, where each unit becomes its own subplot/facet. Each key's value is an object that has one key per model. Each model has x/y data (date and y) that's used to plot one trace on the subplot.

{
    "bos": {
        "538-Elo": {
            "date": ["2022-10-21", "2022-10-24", "2022-10-28", "2022-10-31", "2022-11-04", "2022-11-07", "2022-11-14", "2022-11-21"],
            "y":    [0.18, 0.2118, 0.16, 0.1332, 0.13, 0.1436, 0.2198, 0.3082]
        },
        "538-RAPTOR": {
            "date": ["2022-10-19", "2022-10-24", "2022-10-28", "2022-10-31", "2022-11-04", "2022-11-07", "2022-11-14", "2022-11-21"],
            "y":    [0.22, 0.2376, 0.24, 0.2029, 0.18, 0.1621, 0.2644, 0.2931]
        }
    },
    "den": {
        "538-Elo": {
            "date": ["2022-10-21", "2022-10-24", "2022-10-28", "2022-10-31", "2022-11-04", "2022-11-07", "2022-11-14", "2022-11-21"],
            "y":    [0.01, 0.0205, 0.01, 0.0151, 0.02, 0.0226, 0.0456, 0.0327]
        },
        "538-RAPTOR": {
            "date": ["2022-10-19", "2022-10-24", "2022-10-28", "2022-10-31", "2022-11-04", "2022-11-07", "2022-11-14", "2022-11-21"],
            "y":    [0.12, 0.1205, 0.07, 0.0583, 0.07, 0.0691, 0.0661, 0.063]
        }
    }
}
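One possible way to build that nested object from query rows is sketched below. It assumes each row has already been reduced to a single plottable number stored under a `value` field. That field name and the function name `rows_to_plot_json` are assumptions, not the actual Zoltar CSV layout. The sample numbers are copied from the example output above:

```python
from collections import defaultdict


def rows_to_plot_json(rows, value_key="y"):
    """Group rows into {unit: {model: {"date": [...], value_key: [...]}}}."""
    out = defaultdict(lambda: defaultdict(lambda: {"date": [], value_key: []}))
    for row in sorted(rows, key=lambda r: r["timezero"]):  # chronological order
        trace = out[row["unit"]][row["model"]]
        trace["date"].append(row["timezero"])
        trace[value_key].append(row["value"])
    # Convert the nested defaultdicts to plain dicts so the result serializes cleanly.
    return {unit: dict(models) for unit, models in out.items()}


rows = [
    {"unit": "bos", "model": "538-Elo", "timezero": "2022-10-24", "value": 0.2118},
    {"unit": "bos", "model": "538-Elo", "timezero": "2022-10-21", "value": 0.18},
    {"unit": "den", "model": "538-RAPTOR", "timezero": "2022-10-19", "value": 0.12},
]
plot_json = rows_to_plot_json(rows)
```

Passing a different `value_key` changes only the name of the array holding the plotted values.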

nickreich commented 1 year ago

This is a great start on this.

I am realizing that I made a conceptual error in my earlier descriptions. I was confusing "bin" prediction types and "binary" target types.

In general, not all "bin" prediction types will have a TRUE row as this one target in the NBA project has. So this example might not be a good one.

My suggestion above needs to be changed a bit as follows:

for point prediction types: the values can be directly queried and transferred to y values as in the JSON above.
for bin prediction types:
- the general solution I think will involve retrieving and storing the bin labels (in the NBA example above these are TRUE and FALSE) and their associated values for each date
- then the value for each unique bin label could be plotted separately, as its own "line" or "set of points" over time
- maybe ideally we would have a way to select one bin and plot that over time.
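To make the bin idea concrete, here is a sketch that stores one series per bin label, so each label can be drawn as its own line or a single label can be picked out. The function name `bin_rows_to_series` is hypothetical; the rows reuse the "make playoffs" values quoted earlier in the thread and are assumed to be for one model and one unit:

```python
from collections import defaultdict


def bin_rows_to_series(rows):
    """Return {bin_label: {"date": [...], "prob": [...]}} ordered by timezero."""
    series = defaultdict(lambda: {"date": [], "prob": []})
    for row in sorted(rows, key=lambda r: r["timezero"]):
        series[row["cat"]]["date"].append(row["timezero"])
        series[row["cat"]]["prob"].append(row["prob"])
    return dict(series)


rows = [
    {"timezero": "2022-10-13", "cat": "TRUE", "prob": 0.959},
    {"timezero": "2022-10-13", "cat": "FALSE", "prob": 0.041},
    {"timezero": "2022-10-14", "cat": "TRUE", "prob": 0.96},
    {"timezero": "2022-10-14", "cat": "FALSE", "prob": 0.04},
]
series = bin_rows_to_series(rows)
selected = series["TRUE"]  # "select one bin and plot that over time"
```

Nothing here depends on the labels being TRUE and FALSE, so the same grouping would apply to bin predictions with other labels.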

Suggestion for moving forward:

implement with the NBA project using the "season wins" target, which should have point predictions.
don't implement for bin predictions just yet.
for possible better flexibility down the road, consider re-labeling the "y" array in the JSON above as "point", as I could see us later wanting objects that might be associated with specific prediction types.

matthewcornell commented 1 year ago

Thanks, Nick - good plan. Here's the updated spec in case @elray1 wants to chime in.

Multiple forecasts over time: API

Following is example API input, intermediate Zoltar query, and API output for the Zoltar NBA project.

workflow

UI: user specifies inputs and submits to server
Zoltar: translates request input to forecast query, executes (waiting), and then converts resulting csv rows (applying rules [1]) to JSON to return
UI: plots returned data

[1] Rules to convert rows to plottable data:

for point prediction types: the values can be directly queried and transferred to the "point" values in the output JSON.
for bin prediction types:
- the general solution I think will involve retrieving and storing the bin labels (in the NBA example above these are TRUE and FALSE) and their associated values for each date
- then the value for each unique bin label could be plotted separately, as its own "line" or "set of points" over time
- maybe ideally we would have a way to select one bin and plot that over time.

input: a JSON object from the UI specifying what data is to be plotted

The only difference between this and a Zoltar forecast query is the dates start/end range, rather than a list of timezeros.

{
 "project_id": 316,
 "units":     ["bos", "den"],
 "models":    ["538-Elo", "538-RAPTOR"],
 "dates":     ["2022-10-18", "2023-04-09"],
 "target":    "season wins"
}

corresponding intermediate Zoltar query

Here I'm simulating the expansion of dates to a small set of timezeros. We might want to restrict to bin and point prediction types.

{"units": ["bos", "den"],
 "targets": ["season wins"],
 "timezeros": ["2022-10-19", "2022-10-21", "2022-10-24", "2022-10-28", "2022-10-31", "2022-11-01", "2022-11-02", "2022-11-03", "2022-11-04", "2022-11-05", "2022-11-06", "2022-11-07", "2022-11-13", "2022-11-14", "2022-11-16", "2022-11-17", "2022-11-18", "2022-11-19", "2022-11-20", "2022-11-21"],
 "models": ["538-Elo", "538-RAPTOR"]
}

Behind the scenes this results in num_rows=32.

output: a JSON object containing the data to plot

This format is similar to that of predtimechart's forecast and truth data formats. The object has one key per unit, where each unit becomes its own subplot/facet. Each key's value is an object that has one key per model. Each model has x/y data (date and point) that's used to plot one trace on the subplot.

{
  "bos": {
    "538-Elo": {
      "date": ["2022-10-21", "2022-10-24", "2022-10-28", "2022-10-31", "2022-11-04", "2022-11-07", "2022-11-14", "2022-11-21"],
      "point": [57, 59, 56, 54, 53, 55, 58, 61]
    },
    "538-RAPTOR": {
      "date": ["2022-10-19", "2022-10-24", "2022-10-28", "2022-10-31", "2022-11-04", "2022-11-07", "2022-11-14", "2022-11-21"],
      "point": [57, 58, 57, 55, 54, 55, 58, 60]
    }
  },
  "den": {
    "538-Elo": {
      "date": ["2022-10-21", "2022-10-24", "2022-10-28", "2022-10-31", "2022-11-04", "2022-11-07", "2022-11-14", "2022-11-21"],
      "point": [42, 45, 42, 42, 44, 45, 49, 47]
    },
    "538-RAPTOR": {
      "date": ["2022-10-19", "2022-10-24", "2022-10-28", "2022-10-31", "2022-11-04", "2022-11-07", "2022-11-14", "2022-11-21"],
      "point": [53, 53, 51, 50, 51, 51, 52, 51]
    }
  }
}
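For the final "UI: plots returned data" step, the sketch below turns this output into one list of traces per unit (one facet each). The trace dicts use Plotly-style scatter keys (`type`, `mode`, `name`, `x`, `y`), which is an assumption about the plotting layer. The real UI code would be JavaScript, and the function name `output_to_traces` is hypothetical:

```python
def output_to_traces(output, value_key="point"):
    """Map {unit: {model: {"date", value_key}}} to {unit: [trace, ...]}."""
    facets = {}
    for unit, models in output.items():
        facets[unit] = [
            {
                "type": "scatter",
                "mode": "lines+markers",
                "name": model,           # one trace (color) per model
                "x": data["date"],       # timezeros on the horizontal axis
                "y": data[value_key],    # predicted values on the vertical axis
            }
            for model, data in models.items()
        ]
    return facets


# A trimmed copy of the example output above.
output = {
    "bos": {
        "538-Elo": {"date": ["2022-10-21", "2022-10-24"], "point": [57, 59]},
        "538-RAPTOR": {"date": ["2022-10-19", "2022-10-24"], "point": [57, 58]},
    },
    "den": {
        "538-Elo": {"date": ["2022-10-21", "2022-10-24"], "point": [42, 45]},
    },
}
facets = output_to_traces(output)
```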

matthewcornell commented 1 year ago

FYI slack thread: https://reichlab.slack.com/archives/C04CYGMDFS4/p1689686394483149

matthewcornell commented 1 year ago

FYI slack thread from @elray1 re: differences b/w "classic" (current predtimechart) and "new" (this) versions: https://reichlab.slack.com/archives/C04CYGMDFS4/p1691593308450669?thread_ts=1691592032.148409&cid=C04CYGMDFS4

elray1 commented 1 year ago

Here are some notes based on conversation with Matt just now:

I think the key difference between the two plots has to do with how they deal with reference dates, horizons, and target end dates:

The "classic" plot has the forecasts' target dates along the horizontal axis. For a given reference date, the plot displays and connects step-ahead forecasts for multiple horizons/target dates.
- In the current UI, arrow keys or the calendar widget are used to navigate across reference dates, selecting one reference date at a time.
Roughly, the "new" proposed plot has reference dates/timezeros along the horizontal axis, and displays forecasts for a "single target". However, it is not clear precisely what we mean here by a "single target".
- In Zoltar, there is a "target key (group)", which roughly matches up with the concept of an "outcome variable" like "incident cases".
  - If the target group is for step-ahead forecasts, the target group may contain multiple specific targets, e.g., "1 wk ahead incident cases", "2 wk ahead incident cases", ...
  - If the target group is not for step-ahead forecasts, the target group will contain exactly one specific target.
- In hubverse stuff, we have said that it may be helpful to split the outcome variable and the horizon into two columns because they contain distinct information.
- Additionally, a reminder that there is redundancy between the reference_date, horizon, and target_date, in that target_date = reference_date + horizon. A hub might choose to organize things in terms of the combination of variables like (outcome_variable, location, reference_date, horizon) or in terms of the combination of (outcome_variable, location, reference_date, target_date). The latest flusight challenge is even using both: (outcome_variable, location, reference_date, horizon, target_date). Any of these choices is valid in the sense that it serves to uniquely identify/specify the forecasting problem.
- Note that Nick's example above is for a target that is not a step-ahead target, so it eliminates all of this complexity about horizons and the possibility of multiple Zoltar targets within a Zoltar target group.
- To enable functionality that works for hubs in general, I suggest that we should try to avoid tying predtimechart into the Zoltar terminology where for step ahead targets, a target necessarily corresponds to a combination of outcome variable and horizon. For instance, if a hub collects forecasts organized by (outcome_variable, location, reference_date, target_date), (a) we might not want to force them to map this to a horizon variable in order to use predtimechart, and (b) in principle, they might want to plot forecasts for a single value of the target_date with multiple reference dates on the horizontal axis, showing how forecasts got closer to the prediction target as the reference date approached the target_date. This kind of plot is possible if they can identify a "target" with a combination of outcome variable and target_date, but it is not directly possible if they have to identify a "target" with a combination of outcome variable and horizon.
- I therefore think that predtimechart should not be aware of the Zoltar distinction between a target and a targetKey, and I would prefer not to bake that kind of terminology into the fetchData API. Ideally, I think that in situations where we are using predtimechart as the visualization library for a Zoltar project, we should be able to provide a fetchData function that accepts whatever task id variables were used in a project and looks up the right target to use.
- It seems like for the new plot, we would want a validation that is something like "only a single forecast was returned from each fetchData call for each combination of model and reference date".
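The validation in the last bullet could be sketched as follows. It assumes the data returned for one plot is a flat list of rows that each carry a `model` and a `reference_date` field. Those field names and the function name `find_duplicate_forecasts` are assumptions for illustration, not part of any existing fetchData contract:

```python
from collections import Counter


def find_duplicate_forecasts(rows):
    """Return the (model, reference_date) pairs that occur more than once."""
    counts = Counter((r["model"], r["reference_date"]) for r in rows)
    return sorted(key for key, n in counts.items() if n > 1)


rows = [
    {"model": "538-Elo", "reference_date": "2022-10-21", "value": 57},
    {"model": "538-Elo", "reference_date": "2022-10-24", "value": 59},
    {"model": "538-Elo", "reference_date": "2022-10-21", "value": 58},  # duplicate pair
    {"model": "538-RAPTOR", "reference_date": "2022-10-21", "value": 57},
]
duplicates = find_duplicate_forecasts(rows)
```

An empty result means each model has at most one forecast per reference date. A non-empty result would suggest that the selection still spans more than one target, e.g. several horizons.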

There are some things that are not differences between the two plot types:

As noted by Nick, faceting could apply both to the proposed plot here and the "classic" plot.
Showing multiple forecasts over time could apply both to the proposed plot here and the "classic" plot (discussed in one of the slack threads linked to above).
We might want to show truth data for both plots
- but the form of display for truth data would be different. In the classic plot, we have a different truth data value for each target date along the horizontal axis, so truth data is represented with a "time series plot". In the new plot, we may have a single truth value that is shared for all dates, so a horizontal line would be used.
We might want to show confidence intervals for both plots

Given that, I think most of the UI would not change between the two plot types. However, there is a piece here that I still haven't fully thought through related to how UI navigations across "as of dates" or reference times would work. In the classic plot, navigating across "as of dates" updates truth as well as the reference dates for which forecasts are displayed. But now we are saying that forecasts at all reference dates will be displayed no matter what the reference date is...

reichlab / predtimechart

add feature to show multiple forecasts over time #15
