Elex 3393 override model racecall

lennybronner commented 1 month ago

Description

This adds a feature that lets us control whether a model race call is allowed. If a contest is included in stop_model_call, then we force the prediction interval of that contest to cross zero. Correspondingly, we also adjust the national summary to make sure that at least 10% of the samples from that contest cross zero.

I also split called_contests into two lists rather than one dictionary: lhs_called_contests specifies the lhs calls and rhs_called_contests specifies the rhs calls. Each is now just a list of called states.
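
A minimal sketch of the interval override described above, not the code in this PR. The function names and the `epsilon` offset are hypothetical; only `stop_model_call` comes from the description.

```python
def force_interval_to_cross_zero(lower, upper, epsilon=1e-3):
    """Move whichever bound is on the wrong side so that zero lies inside.

    epsilon is an assumed, illustrative offset; the PR may choose the
    new bound differently.
    """
    if lower >= 0:
        lower = -epsilon
    if upper <= 0:
        upper = epsilon
    return lower, upper


def apply_stop_model_call(intervals, stop_model_call):
    """intervals maps contest -> (lower, upper) bounds of the margin."""
    blocked = set(stop_model_call)
    return {
        contest: force_interval_to_cross_zero(*bounds) if contest in blocked else bounds
        for contest, bounds in intervals.items()
    }


# VA is listed in stop_model_call, so its interval is pushed across zero;
# NJ is not listed and is left untouched.
intervals = {"VA": (0.02, 0.08), "NJ": (0.10, 0.16)}
adjusted = apply_stop_model_call(intervals, stop_model_call=["VA"])
```

In this sketch only the bound that sits on the wrong side of zero moves, which matches the test step below where the lower interval ends up below zero.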

Jira Ticket

https://arcpublishing.atlassian.net/browse/ELEX-3393

Test Steps

Running this forces the lower interval to be below zero:

elexmodel 2017-11-07_VA_G --estimands=margin --office_id=G --geographic_unit_type=county --pi_method bootstrap --features baseline_normalized_margin --called_contests='{"VA": -1}' --percent_reporting 95 --aggregates postal_code --stop_model_call VA

There is also a corresponding PR in the testbed that this can be tested with: https://github.com/WashPost/elex-live-model-testbed/pull/27

dmnapolitano commented 1 month ago

Hi! I have questions 😄

Is the default for every state False? Why not just make --allow_race_call a list of postal_code, and it's True for the states that appear in the list, False for those that don't? 🤔
What does it mean for the model to not be allowed to call races for a given state? If the model predicts that a race can be called, is the idea here to, like, override that or bury it? And if so, why? The model doesn't actually call races to begin with, so is this just to make it less confusing to people looking at model output? 🤔

That's it for now but I'm sure I'll have more questions soon 😅

lennybronner commented 1 month ago

Ha, of course.

Because we don't keep a list of the order of the contests when doing the bootstrap. What that means is that when we call the race/allow race calls we expect every contest to exist in the object that is used to define race calls (or define which contests are allowed to be called by the model). This is why we check to make sure that every contest is accounted for in the dictionary. I'll also add that for the presidential election the default for most states will be True and this will only be set to False for the ~15 states that are likely to be reasonably close.
You are right that the model doesn't call races, but I was unsure what else to call this. What I call a model race call here is when the interval of the margin model no longer overlaps with zero (effectively saying there is a less than alpha% chance that the person behind will take the lead). In cases where allow race call is set to False, the interval is forced to overlap with zero.
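
To make the definition above concrete, here is a rough sketch of checking whether a margin interval excludes zero. The helper names, the `coverage` parameter, and the nearest-rank percentile are assumptions for illustration, not how elexmodel computes its intervals.

```python
def margin_interval(samples, coverage=0.9):
    """Empirical central interval of bootstrap margin samples (nearest rank)."""
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1 - coverage) / 2
    lower_idx = round(tail * (n - 1))
    upper_idx = round((1 - tail) * (n - 1))
    return ordered[lower_idx], ordered[upper_idx]


def is_model_call(samples, coverage=0.9):
    """True when the interval lies entirely on one side of zero."""
    lower, upper = margin_interval(samples, coverage)
    return lower > 0 or upper < 0


# Every sampled margin is positive, so the interval excludes zero.
clear_lead = [0.02 + 0.001 * i for i in range(101)]
# Samples are spread on both sides of zero, so the interval overlaps it.
close_race = [-0.05 + 0.001 * i for i in range(101)]

print(is_model_call(clear_lead), is_model_call(close_race))
```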

dmnapolitano commented 1 month ago

😁

Hrm, I'm not sure I follow, although I am sick so that's not helping lol 🤧 Why does the order of the contests matter?
Ah! Yeah this is tricky. The AP uses the language "call status", e.g. "the model put the presidential race in Iowa in call status" and then it's up to a human whether or not they agree with the call status and actually make the call. We could probably also say like "race call suggestion" or something 🤔
But I'm also not sure I understand, why do we need this? Is this because there are certain contests we don't want to report to the Live team as "the model suggests this should be called"? Is that even something the front-end would display? 🤔

lennybronner commented 1 month ago

I suggest you go through the code, but the order of the contests matters because each contest is a row in the prediction/interval dataframe.
We don't make race calls though, so it's not a suggestion. I think model call is fine tbh, it is distinct enough from race call.
There are two scenarios in which we want this: a) A race is very clearly too close to call but for some reason our model doesn't see it this way. (I imagine this could happen very close to all ballots being counted, when the interval generated by our model is basically non-existent but the difference between the candidates is very small. A good example of this is a recount that the AP does not call, even though our model is pretty clear about who will win.) b) We want to make sure that we are 100% comfortable with a race call, say because it predicts the presidency. This gives us capacity to recheck everything on our end.

dmnapolitano commented 1 month ago

Sure, but do you mean the order of the contests (rows) in the dataframe matters? Like if we call contests in WY before AL is that a problem? Or do you mean ordered by postal_code and time? I see this: https://github.com/washingtonpost/elex-live-model/pull/106/files#diff-860e7023ac0648e754646a4515fe14f36af787e7c64954ded730b53559fdb0dcR1359 but why does that matter beyond trying to zip these two dictionaries? And aren't all the postal_codes for the election being run in reporting_units + nonreporting_units? Why can't that be used? 🤔 🤔
That works! 🎉 I think it's worth updating the name in the code but I don't have a good sense of how involved that is so it might just have to be a Jira ticket and I'll just have to s/race call/model call/g every time I look at the code 😄
I see, so this is just for our debugging purposes? I still don't know who sees (or might see) model race calls besides us 🤔

lennybronner commented 1 month ago

We don't have access to the reporting and non-reporting units in get_national_summary_estimates, which is why called_races was initially a dictionary with all the contests. It's still like this in order to make the move to get_aggregate_prediction_intervals as easy as possible. I think it makes sense that these two dictionaries, which are quite similar in what they do, look the same.
Any reader looking at our pages where a race is being shown

dmnapolitano commented 1 month ago

So for both of these dictionaries, I have to enter all 51 states? So 102 states? Plus the values since those are the dictionary keys, so now we're at 204 possible things to type in on the CLI, in a properly-formatted dictionary 😭 Furthermore, the spot where I found the sorting is in get_aggregate_prediction_intervals(), which has reporting_units and nonreporting_units. Internally, the data can be made to look however we need them to, because if you have five states for which you want to allow race calls and the remainder you don't, and it's a purely binary choice, it really isn't necessary to make someone type in all 51 states plus True or False for each of them when they just need to enter in five * two-letter abbreviations. We can create the dictionary we need internally. called_contests is a little trickier since there are three options, but even then we have a default value of fill_value 🤔 I'd be happy to refactor this if it would help. I'm sorry if I'm still misunderstanding something, but if this really requires the amount of CLI input I'm thinking it does, 😬
Did you type something here? If you did, github didn't render it 🤔
Really?!?! Wait so then, if the model doesn't call races but we're showing model-produced race calls...do we have really careful language around this? 😬
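
For the first point, a sketch of what "create the dictionary we need internally" could look like. The function name and the `all_contests` argument are hypothetical; in the real code the full contest list would presumably come from reporting_units + nonreporting_units.

```python
def expand_stop_model_call(all_contests, stop_model_call):
    """Build contest -> allowed flag for every contest from a short list.

    all_contests is the full, ordered list of contests (assumed to be
    available to the caller); stop_model_call is the short list that a
    user types on the CLI.
    """
    unknown = set(stop_model_call) - set(all_contests)
    if unknown:
        raise ValueError(f"unknown contests in stop_model_call: {sorted(unknown)}")
    blocked = set(stop_model_call)
    # Contests that are not listed default to True (a model call is allowed).
    return {contest: contest not in blocked for contest in all_contests}


all_contests = ["AZ", "GA", "VA", "WY"]
allow_model_call = expand_stop_model_call(all_contests, ["GA", "VA"])
```

The result keeps the order of `all_contests`, so it can still be lined up with dataframe rows, while the CLI only needs the handful of postal codes that are being held back.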

lennybronner commented 1 month ago

ok, I changed race call objects into lists. Let me know what you think

dmnapolitano commented 1 month ago

Let's gooo!! 🎉 🚀
