Closed nickreich closed 7 months ago
Noting that the idea here would be to
mean
, median
and perhaps mode
prediction types to zoltar, leaving point
prediction type as legit for legacy projects.point
target types.Noting that this is related to https://github.com/Infectious-Disease-Modeling-Hubs/hubData/issues/10
Capturing this as a table: This will be the new "Valid prediction types by target type" table:
Around this line in the function _load_truth_data()
you will see that we use the 'point' prediction type for the prediction dictionary data that generated from incoming truth rows. (This dictionary is then loaded into the database via load_predictions_from_json_io_dict()
.) Since we are retiring point types, how do we handle/represent truth with our three point replacements (mean, median and perhaps mode)?
@matthewcornell The table looks correct to me.
Your question about representation of truth data is a good one. Given that truth data are a separate kind of data stored in Zoltar, I'm not sure it really matters what type we store them as. Honestly, we could keep them stored as "point", and have them be the only thing that could use that prediction type? Or we could just pick one of "mean", "median" or "mode" and use that from here on out? (would that leave backwards compatibility issues for truth data stored in a certain way in the system now?) I'm imagining that we might need to pick one prediction type and store all truth data in the system as that type? Maybe for that reason we should choose "mode" as it is valid for all target types, as is "point"?
@nickreich WRT the underlying truth representation ("point" vs. ...) I don't think there will be an external impact (queries come back as CSV with a catch-all 'value' column for "point" and - with this change - the other three), so I'm ok staying with "point" or switching to "mode". I suggest we use the latter so that we are moving away from "point". Yes, that will mean old truth data is "point" and new is "mode", but we can change the former to latter if/when we do your item #3 above (consider migrating some point data to the other new prediction types).
We will need to review constraints specified in https://docs.zoltardata.com/validation/ - which apply to the three new types? E.g., are they the same as point?
We will also need to review conversions from/to the three new types here: https://docs.zoltardata.com/forecastqueryformat/#automatic-prediction-type-conversion . Again, same as point?
The Point prediction elements constraints could be the same for mean, median and mode.
In the Tests for Prediction Elements by Target Type section, I think all of the uses of "point" prediction type could just be replaced by the "mean, median, and mode" types.
We want the same functionality, validation and constraints for the three new types as we had for point.
The conversions also seem more or less straight forward. My suggestion might be to aspirationally support the following conversions:
mean <- named, sample
median <- named, sample
mode <- named
I intentionally left out the conversion of sample -> mode as there could be complications here for continuous outcomes. So I vote for just leaving it off for now.
And I see that we have some existing conversions supported
conversions: point <- sample (uses either statistics.mean() or statistics.median() depending on the convert.point option)
Which I think could just be translated as the functions to convert from sample to mean and median types.
I'm making progress, but I think it will be clean to do this via three implementation phases:
_load_truth_data()
from "point" to "mode". replace it in tests with mode. do not convert existing forecasts (issue #377)Thoughts?
This sounds like a good approach to me overall. I think phase one and two are nicely distinct and clearly we want to do them, in that order.
I am not yet convinced that we want to do phase three, as it would require "relabeling" prior data in places where we cannot be certain what the original intent or data type should be. So maybe we could get away with just keeping it as a vestigial artifact and never actually retiring/removing "points" at all?
@nickreich , in this section of the docs, I think # 1 should say "convert FROM" instead of "convert TO", agreed?:
Auto-conversion is activated via two query parameters:
- Specify the forecast types (# 5 above) that you want the program to convert TO (the default, if unspecified is all types).
- Include query options for the conversions you want. Including an option does two things: a) it tells Zoltar that it should convert TO that type (i.e., that conversion is activated), and b) possible details about how to do the conversion.
@matthewcornell I'm confused about what the "automatic conversion" means. Are some predictions converted automatically? do we specify whether or not they are converted somewhere? Could you point to an example in Zoltar where this is happening (or configured) currently?
{... "options": {"convert.mean": true}}
("convert.mean" is a new option).Regarding your question about FROM vs. TO, I just don't have enough of a sense from the documentation what is actually happening with the conversion to answer your question. Maybe we should just try to re-write that part of the documentation to more clearly reflect the action that is being taken?
@nickreich Zoltpy provides the (somewhat experimental) feature of bulk import/export specified in this issue: [investigate support of bulk zoltar queries #321] - docs here. IIRC Martha tried using it and it wasn't viable (maybe due to trouble accessing Python via R?), and so my question is whether it's worth my updating that part of the code to use the current issue's new prediction types. I suggest we remove the code from both zoltpy and the server unless we think we'll use it again. Thoughts?
I could maybe see us using this again at some point in the future, but I agree that it feels vestigial at this point and could be removed for now.
To try to bring Zoltar closer to our current thinking about how to represent point forecasts, we should add
mean
,median
, and perhapsmode
to the list of valid prediction types. However, this raises a question of what target_types are valid for each of these new prediction_types.This is the current table from Zoltar:
tagging @elray1 and @annakrystalli for additional input.