reichlab / predtimechart

MIT License

add feature to show multiple forecasts over time #15

Open nickreich opened 1 year ago

nickreich commented 1 year ago

E.g., instead of the mode that predtimechart uses now, which shows forecasts from a single timezero or forecast_date, we could code up a different chart where you select a target (or "outcome"?), a time frame, and I guess a location, and then all predictions that have been made over time for those selections are shown.

nickreich commented 1 year ago

E.g., here is a figure showing predictions (points) for different models (colors) from different timezeros. This is from the NBA predictions project in Zoltar, plotted in R. In some ways, this is a simpler kind of chart than the existing main plotting feature in predtimechart, because it doesn't need any of the date inference that the current plot feature does to get x-axes aligned. I'm imagining that just one of these plots would be shown in resolving this issue, but the faceting (multiple embedded plots) is an example of what is requested in #16.

image

matthewcornell commented 1 year ago

I think this would require a new API and a new chart type (e.g., a tab or separate URL).

Question: What would the API look like for downloading necessary data? (Specifically the function call arguments and the returned data format.)

matthewcornell commented 1 year ago

@nickreich Q: The example image is for a particular project target, correct? The targets from the NBA Zoltar project are: "make playoffs", "season wins", and "win finals", with only the middle one being non-binary.

nickreich commented 1 year ago

@matthewcornell Yes, the idea is that you would pick one particular target. Predictions for that target are shown on y-axis, with timezero where the forecast was made on the x-axis.

matthewcornell commented 1 year ago

@nickreich Q: I'm trying to understand how you got from the Zoltar data to the example output you gave. I tried this query:

{'units': ['bos', 'den'],
 'models': ['538-Elo', '538-RAPTOR'],
 'targets': ['make playoffs']}

which returned 1309 rows of bin data, e.g.,

model    timezero    unit  target         class  cat    prob
538-Elo  2022-10-13  bos   make playoffs  bin    TRUE   0.959
538-Elo  2022-10-13  bos   make playoffs  bin    FALSE  0.041
538-Elo  2022-10-14  bos   make playoffs  bin    TRUE   0.96
538-Elo  2022-10-14  bos   make playoffs  bin    FALSE  0.04
...

Q: Do you have time to explain this to me?

Note: I'm using a simplified query while trying to understand this. Ultimately we'll need to specify a list of timezeros from the relevant period (the entire 2022-23 NBA season, i.e., October 18, 2022 to April 9, 2023?) as well as types (?)

nickreich commented 1 year ago

Basically, this looks like output that is nearly ready to be plotted. The one thing is that for a prediction element with bin class, we only need to plot one row (generally the cat=TRUE row), since the prob value in the cat=FALSE row is just 1 minus the prob in the TRUE row. For the rows above, I'd imagine a plot that uses timezero as the x-axis and prob as the y-axis, after subsetting to only the cat=TRUE rows.
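A minimal sketch of that subsetting, assuming each query row arrives as a dict keyed by the column names in the sample output above. The function name `bin_rows_to_xy` and the dict-per-row representation are hypothetical, not part of Zoltar or predtimechart.

```python
# Rows copied from the sample query output above. Representing each row as a
# dict is an assumption made for illustration.
SAMPLE_ROWS = [
    {"model": "538-Elo", "timezero": "2022-10-13", "unit": "bos",
     "target": "make playoffs", "class": "bin", "cat": "TRUE", "prob": 0.959},
    {"model": "538-Elo", "timezero": "2022-10-13", "unit": "bos",
     "target": "make playoffs", "class": "bin", "cat": "FALSE", "prob": 0.041},
    {"model": "538-Elo", "timezero": "2022-10-14", "unit": "bos",
     "target": "make playoffs", "class": "bin", "cat": "TRUE", "prob": 0.96},
    {"model": "538-Elo", "timezero": "2022-10-14", "unit": "bos",
     "target": "make playoffs", "class": "bin", "cat": "FALSE", "prob": 0.04},
]


def bin_rows_to_xy(rows, true_cat="TRUE"):
    """Keep only the cat=TRUE bin rows; return (timezeros, probs) sorted by timezero."""
    kept = sorted(
        (row["timezero"], row["prob"])
        for row in rows
        if row["class"] == "bin" and row["cat"] == true_cat
    )
    return [tz for tz, _ in kept], [prob for _, prob in kept]


x, y = bin_rows_to_xy(SAMPLE_ROWS)
print(x)  # ['2022-10-13', '2022-10-14']
print(y)  # [0.959, 0.96]
```

The sketch expects rows for a single model and unit; rows covering several models or units would need to be grouped first.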

matthewcornell commented 1 year ago

Thanks, Nick. So we will need rules for translating between data classes (bin, named, point, sample, and quantile) to y-axis data for the charts.

nickreich commented 1 year ago

Yes. For point it's straightforward (just the value), and for bin I've explained it above. Could we start by implementing just those two?

matthewcornell commented 1 year ago

Yep!

nickreich commented 1 year ago

Do we have a sample-to-point conversion function? This is something we've talked about before, right?

matthewcornell commented 1 year ago

According to our Automatic prediction type conversion docs:

Currently, only these combinations are implemented:

matthewcornell commented 1 year ago

Multiple forecasts over time: API

Following is example API input, intermediate Zoltar query, and API output for the Zoltar NBA project.

workflow

  1. UI: user specifies inputs and submits to server
  2. Zoltar: translates request input to a forecast query, executes it (waiting for the result), and then converts the resulting CSV rows (applying rules [1]) to JSON to return
  3. UI: plots returned data

[1] Rules to convert rows to plottable data:

input: a JSON object from the UI specifying what data is to be plotted

The only difference between this and a Zoltar forecast query is the dates start/end range, rather than a list of timezeros.

{
 "project_id": 316,
 "units":     ["bos", "den"],
 "models":    ["538-Elo", "538-RAPTOR"],
 "dates":     ["2022-10-18", "2023-04-09"],
 "target":    "win finals"
}

corresponding intermediate Zoltar query

Here I'm simulating the expansion of dates to a small set of timezeros. We might want to restrict to bin and point prediction types.

{"units": ["bos", "den"],
 "targets": ["win finals"],
 "timezeros": ["2022-10-19", "2022-10-21", "2022-10-24", "2022-10-28", "2022-10-31", "2022-11-01", "2022-11-02", "2022-11-03", "2022-11-04", "2022-11-05", "2022-11-06", "2022-11-07", "2022-11-13", "2022-11-14", "2022-11-16", "2022-11-17", "2022-11-18", "2022-11-19", "2022-11-20", "2022-11-21"],
 "models": ["538-Elo", "538-RAPTOR"]
}

Behind the scenes this results in num_rows=32.
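The translation from the UI input to the intermediate query could be sketched as follows. This assumes the server already has the project's full list of timezeros as ISO date strings and treats the "dates" range as inclusive at both ends; both are assumptions, and the function names are hypothetical.

```python
from datetime import date


def expand_dates_to_timezeros(project_timezeros, start, end):
    """Return the project's timezeros that fall within [start, end], sorted.

    Inclusive endpoints are an assumption; the spec above does not say.
    """
    lo, hi = date.fromisoformat(start), date.fromisoformat(end)
    return sorted(
        tz for tz in project_timezeros
        if lo <= date.fromisoformat(tz) <= hi
    )


def build_forecast_query(ui_request, project_timezeros):
    """Translate the UI's JSON input into a forecast-query-shaped dict."""
    start, end = ui_request["dates"]
    return {
        "units": ui_request["units"],
        "targets": [ui_request["target"]],
        "timezeros": expand_dates_to_timezeros(project_timezeros, start, end),
        "models": ui_request["models"],
    }


ui_request = {
    "project_id": 316,
    "units": ["bos", "den"],
    "models": ["538-Elo", "538-RAPTOR"],
    "dates": ["2022-10-18", "2023-04-09"],
    "target": "win finals",
}
# A made-up timezero list with one date on each side of the requested range.
known_timezeros = ["2022-10-13", "2022-10-19", "2022-10-21", "2023-04-10"]
query = build_forecast_query(ui_request, known_timezeros)
print(query["timezeros"])  # ['2022-10-19', '2022-10-21']
```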

output: a JSON object containing the data to plot

This format is similar to that of predtimechart's forecast and truth data formats. The object has one key per unit, where each unit becomes its own subplot/facet. Each key's value is an object that has one key per model. Each model has x/y data (date and y) that's used to plot one trace on the subplot.

{
    "bos": {
        "538-Elo": {
            "date": ["2022-10-21", "2022-10-24", "2022-10-28", "2022-10-31", "2022-11-04", "2022-11-07", "2022-11-14", "2022-11-21"],
            "y":    [0.18, 0.2118, 0.16, 0.1332, 0.13, 0.1436, 0.2198, 0.3082]
        },
        "538-RAPTOR": {
            "date": ["2022-10-19", "2022-10-24", "2022-10-28", "2022-10-31", "2022-11-04", "2022-11-07", "2022-11-14", "2022-11-21"],
            "y":    [0.22, 0.2376, 0.24, 0.2029, 0.18, 0.1621, 0.2644, 0.2931]
        }
    },
    "den": {
        "538-Elo": {
            "date": ["2022-10-21", "2022-10-24", "2022-10-28", "2022-10-31", "2022-11-04", "2022-11-07", "2022-11-14", "2022-11-21"],
            "y":    [0.01, 0.0205, 0.01, 0.0151, 0.02, 0.0226, 0.0456, 0.0327]
        },
        "538-RAPTOR": {
            "date": ["2022-10-19", "2022-10-24", "2022-10-28", "2022-10-31", "2022-11-04", "2022-11-07", "2022-11-14", "2022-11-21"],
            "y":    [0.12, 0.1205, 0.07, 0.0583, 0.07, 0.0691, 0.0661, 0.063]
        }
    }
}
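As a rough illustration of how rows could be grouped into this shape: the sketch below assumes each row has already been reduced to a single plottable number under a hypothetical "value" key, and the function name is likewise made up.

```python
from collections import defaultdict


def rows_to_plot_data(rows, y_key="y"):
    """Group rows into {unit: {model: {"date": [...], y_key: [...]}}}.

    Each row is assumed to carry "unit", "model", "timezero", and an
    already-reduced "value" (a hypothetical key used for illustration).
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        grouped[row["unit"]][row["model"]].append((row["timezero"], row["value"]))

    plot_data = {}
    for unit, models in grouped.items():
        plot_data[unit] = {}
        for model, pairs in models.items():
            pairs.sort()  # ISO date strings sort chronologically
            plot_data[unit][model] = {
                "date": [tz for tz, _ in pairs],
                y_key: [value for _, value in pairs],
            }
    return plot_data


# A few values taken from the example output above, deliberately out of order.
rows = [
    {"unit": "bos", "model": "538-Elo", "timezero": "2022-10-24", "value": 0.2118},
    {"unit": "bos", "model": "538-Elo", "timezero": "2022-10-21", "value": 0.18},
    {"unit": "den", "model": "538-Elo", "timezero": "2022-10-21", "value": 0.01},
]
plot_data = rows_to_plot_data(rows)
print(plot_data["bos"]["538-Elo"])
# {'date': ['2022-10-21', '2022-10-24'], 'y': [0.18, 0.2118]}
```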

nickreich commented 1 year ago

This is a great start on this.

I am realizing that I made a conceptual error in my earlier descriptions. I was confusing "bin" prediction types and "binary" target types.

In general, not all "bin" prediction types will have a TRUE row as this one target in the NBA project has. So this example might not be a good one.

My suggestion above needs to be changed a bit as follows:

Suggestion for moving forward:

matthewcornell commented 1 year ago

Thanks, Nick - good plan. Here's the updated spec in case @elray1 wants to chime in.


Multiple forecasts over time: API

Following is example API input, intermediate Zoltar query, and API output for the Zoltar NBA project.

workflow

  1. UI: user specifies inputs and submits to server
  2. Zoltar: translates request input to a forecast query, executes it (waiting for the result), and then converts the resulting CSV rows (applying rules [1]) to JSON to return
  3. UI: plots returned data

[1] Rules to convert rows to plottable data:

input: a JSON object from the UI specifying what data is to be plotted

The only difference between this and a Zoltar forecast query is the dates start/end range, rather than a list of timezeros.

{
 "project_id": 316,
 "units":     ["bos", "den"],
 "models":    ["538-Elo", "538-RAPTOR"],
 "dates":     ["2022-10-18", "2023-04-09"],
 "target":    "season wins"
}

corresponding intermediate Zoltar query

Here I'm simulating the expansion of dates to a small set of timezeros. We might want to restrict to bin and point prediction types.

{"units": ["bos", "den"],
 "targets": ["season wins"],
 "timezeros": ["2022-10-19", "2022-10-21", "2022-10-24", "2022-10-28", "2022-10-31", "2022-11-01", "2022-11-02", "2022-11-03", "2022-11-04", "2022-11-05", "2022-11-06", "2022-11-07", "2022-11-13", "2022-11-14", "2022-11-16", "2022-11-17", "2022-11-18", "2022-11-19", "2022-11-20", "2022-11-21"],
 "models": ["538-Elo", "538-RAPTOR"]
}

Behind the scenes this results in num_rows=32.

output: a JSON object containing the data to plot

This format is similar to that of predtimechart's forecast and truth data formats. The object has one key per unit, where each unit becomes its own subplot/facet. Each key's value is an object that has one key per model. Each model has x/y data (date and y) that's used to plot one trace on the subplot.

{
  "bos": {
    "538-Elo": {
      "date": ["2022-10-21", "2022-10-24", "2022-10-28", "2022-10-31", "2022-11-04", "2022-11-07", "2022-11-14", "2022-11-21"],
      "point": [57, 59, 56, 54, 53, 55, 58, 61]
    },
    "538-RAPTOR": {
      "date": ["2022-10-19", "2022-10-24", "2022-10-28", "2022-10-31", "2022-11-04", "2022-11-07", "2022-11-14", "2022-11-21"],
      "point": [57, 58, 57, 55, 54, 55, 58, 60]
    }
  },
  "den": {
    "538-Elo": {
      "date": ["2022-10-21", "2022-10-24", "2022-10-28", "2022-10-31", "2022-11-04", "2022-11-07", "2022-11-14", "2022-11-21"],
      "point": [42, 45, 42, 42, 44, 45, 49, 47]
    },
    "538-RAPTOR": {
      "date": ["2022-10-19", "2022-10-24", "2022-10-28", "2022-10-31", "2022-11-04", "2022-11-07", "2022-11-14", "2022-11-21"],
      "point": [53, 53, 51, 50, 51, 51, 52, 51]
    }
  }
}
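On the UI side, "one trace per model, one subplot per unit" could look roughly like the sketch below. The trace dicts are shaped like Plotly scatter traces, which is an assumption about the plotting layer, and `to_traces` is a hypothetical helper written in Python purely for illustration.

```python
def to_traces(plot_data, y_key="point"):
    """Return {unit: [trace, ...]} with one trace dict per model.

    Each unit's list would feed one subplot/facet; each trace is one line.
    """
    return {
        unit: [
            {
                "name": model,
                "x": series["date"],
                "y": series[y_key],
                "mode": "lines+markers",
            }
            for model, series in models.items()
        ]
        for unit, models in plot_data.items()
    }


# The first two points per model for "bos", taken from the example output above.
plot_data = {
    "bos": {
        "538-Elo": {"date": ["2022-10-21", "2022-10-24"], "point": [57, 59]},
        "538-RAPTOR": {"date": ["2022-10-19", "2022-10-24"], "point": [57, 58]},
    },
}
traces = to_traces(plot_data)
print([trace["name"] for trace in traces["bos"]])  # ['538-Elo', '538-RAPTOR']
```

Passing `y_key="y"` would handle the earlier version of the output format, where the values were stored under "y" instead of "point".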

matthewcornell commented 1 year ago

FYI slack thread: https://reichlab.slack.com/archives/C04CYGMDFS4/p1689686394483149

matthewcornell commented 1 year ago

FYI Slack thread from @elray1 re: differences between the "classic" (current predtimechart) and "new" (this) versions: https://reichlab.slack.com/archives/C04CYGMDFS4/p1691593308450669?thread_ts=1691592032.148409&cid=C04CYGMDFS4

elray1 commented 1 year ago

Here are some notes based on conversation with Matt just now:

I think the key difference between the two plots has to do with how they deal with reference dates, horizons, and target end dates:

There are some things that are not differences between the two plot types:

Given that, I think most of the UI would not change between the two plot types. However, there is a piece here that I still haven't fully thought through: how UI navigation across "as of dates" or reference times would work. In the classic plot, navigating across "as of dates" updates truth as well as the reference dates for which forecasts are displayed. But now we are saying that forecasts at all reference dates will be displayed no matter what the reference date is...