Implementing write-in counting

JohnLCaron commented 2 years ago

"Write-ins are assumed to be explicitly registered or allowed to be lumped into a single “write-ins” category for the purpose of verifiable tallying. Verifiable tallying of free-form write-ins may be best done with a MixNet design." (p. 15, spec 1.51)

Currently we have no explicit processing of write-ins.

Note that the spec does not currently describe the input PlaintextBallot, so there's no explicit specification of what a ballot marking device or scanner should do with write-ins.

A PlaintextBallot Selection has a selectionId, which is supposed to match a Manifest selectionId. But it won't unless all write-in Candidates are registered and added as a selection on the Manifest Contest. Call this the "explicitly registered" Option 1. In this case, there's nothing extra to do, if we assume that the scanner correctly identifies the write-in and creates the correct PlaintextBallot selection. The only difference between a regular candidate and a write-in candidate is that the write-in candidate doesn't appear on the ballot and must be (correctly) written in. There's no need to add anything to the ContestData (except for overvotes, TBD).
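A minimal sketch of the Option 1 matching check, assuming simplified stand-ins for the Manifest contest and the PlaintextBallot selection. The class names, field names, and the `unmatched_selections` helper are hypothetical and are not the library's API; the sketch only illustrates that a write-in is accepted when its selectionId was pre-registered in the Manifest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestContest:
    # Hypothetical stand-in: a contest and the selection ids registered for it,
    # including any pre-registered write-in candidates.
    contest_id: str
    selection_ids: frozenset


@dataclass(frozen=True)
class PlaintextSelection:
    # Hypothetical stand-in for a selection produced by the scanner or BMD.
    selection_id: str
    vote: int


def unmatched_selections(contest, selections):
    """Return the ids of ballot selections that have no Manifest selection."""
    return [s.selection_id for s in selections
            if s.selection_id not in contest.selection_ids]


contest = ManifestContest("mayor", frozenset({"alice", "bob", "writein-carol"}))
ballot = [PlaintextSelection("writein-carol", 1),
          PlaintextSelection("writein-dave", 1)]
print(unmatched_selections(contest, ballot))
```

Here "writein-carol" passes because it was registered, while "writein-dave" is reported as unmatched and would need one of the other options.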

Another possibility is that the scanner adds any write-ins to the PlaintextBallot. Currently, we have an "extendedData" string on the Selection (note: it should be renamed to "writeIn" and placed on the Contest, not the Selection).

Option 2 is when write-ins are not pre-registered. The scanner puts any write-ins into the PlaintextBallot.Contest, and we add them to the EncryptedBallot Contest's ContestData field. In order to see if it's a write-in, we have to decrypt ContestData.

Option 3 adds a "lumped write-in" selection to every contest. It does not have a matching Manifest selection. It records whether there is a valid write-in that was voted for. These are encoded in the normal way, count against the contest limit, etc. Then one can find out if there are enough write-ins to affect the election. If so, the ContestData is decoded and the actual write-ins are tallied. In this case, we are back to adding selections to the contest, increasing the computational burden. If Contest.votes_allowed > 1, we need multiple lumped selections, unless we can use range proofs and allow Selection.vote > 1.
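The "enough write-ins to affect the election" test in Option 3 could look something like the sketch below. The rule used here is an assumption, not something the spec defines: it takes the worst case in which every lumped write-in vote went to the same person and asks whether that person could tie or beat the weakest winning listed candidate. The function name and the `seats` parameter are hypothetical.

```python
def write_ins_could_matter(candidate_tallies, lumped_write_in_tally, seats=1):
    """Return True if the lumped write-in total could change who wins.

    candidate_tallies: dict of listed candidate -> decrypted tally.
    lumped_write_in_tally: decrypted tally of the lumped write-in selection(s).
    seats: number of winners in the contest (assumed parameter).
    """
    if lumped_write_in_tally == 0:
        return False
    ranked = sorted(candidate_tallies.values(), reverse=True)
    if len(ranked) < seats:
        # Fewer listed candidates than seats: any write-in could win a seat.
        return True
    weakest_winner = ranked[seats - 1]
    # Worst case: all write-in votes are for one person (a tie counts).
    return lumped_write_in_tally >= weakest_winner


tallies = {"alice": 120, "bob": 95}
print(write_ins_could_matter(tallies, 40))            # False
print(write_ins_could_matter(tallies, 130))           # True
print(write_ins_could_matter(tallies, 100, seats=2))  # True
```

Only when this returns True would the ContestData need to be decoded to tally the individual write-in names.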

So it seems to me that there are three cases:

"Explicitly Registered": write-ins have selections in the Manifest and need no special processing by us.
"Free-form Write-Ins": write-ins are encoded in the ContestData, which must be decoded to be counted.
"Lumped Write-Ins": in addition to the ContestData, every contest has a "lumped write-in" selection that records whether there are valid write-in vote(s). It is tallied as usual, without needing to decode the ContestData at the same time.

JohnLCaron commented 2 years ago

thoughts, @danwallach ??

JohnLCaron commented 2 years ago

I've implemented a first pass in PR #170. I will leave this open for further discussion and refinement.

danwallach commented 2 years ago

Case 1: Explicitly registered. This sounds like it's just a special form of a candidate. Maybe we just add a "write-in" boolean to the definition of a candidate and that's completely it. It's now the voting machine's problem to figure out how to deal with mapping from a voter's free-text input to a "registered" candidate. This seems... difficult in practice, but that's "not our problem."

Case 2 or 3. First, the contest should have a flag set on whether write-ins are allowed at all. If the flag is false, then there's no write-in at all, and no need for special support to handle it. Otherwise, we'd have one or more "candidates" with names like "Write-in (1)" and "Write-in (2)" that have a flag set and are presented to the voter as blank lines that expand to text entry boxes in some machine-specific fashion.
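A small sketch of the contest-level flag and the placeholder "Write-in (n)" candidates described above. The `Candidate` class, the `contest_options` helper, and the `write_in_slots` parameter are all hypothetical names chosen for illustration; how many slots a contest gets is an assumption left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    is_write_in: bool = False  # flag marking a blank write-in line


def contest_options(listed_names, write_ins_allowed, write_in_slots=1):
    """Build the options shown to the voter for one contest."""
    options = [Candidate(name) for name in listed_names]
    if write_ins_allowed:
        # Placeholder candidates that the voting machine renders as blank lines.
        options += [Candidate(f"Write-in ({i})", is_write_in=True)
                    for i in range(1, write_in_slots + 1)]
    return options


for option in contest_options(["Alice", "Bob"], True, write_in_slots=2):
    print(option)
```

When the flag is false, the contest has only its listed candidates and needs no special write-in handling.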

My thinking on this is that we're back to the same ContestData discussion we had a while back. If we decided that the ContestData field was meant to be the voter's original intent (before any overvote processing or anything else at all), then we have a general-purpose encrypted data structure that we might subsequently shove through a mixnet prior to tallying. We could fake this by having the trustees decrypt each and every ContestData field, skip the mixing, and just publish an array of the resulting plaintexts. This would only ever happen if there were enough write-in fields used, in total, to have a possibility of winning the contest.
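The "skip the mixing and publish an array" fallback might be sketched as follows. Decryption is stubbed out: the input is assumed to be the write-in strings already recovered from each ballot's ContestData by the trustees. Sorting the flattened list is an added assumption (it keeps the published order from mirroring ballot order), and the threshold parameter and function name are hypothetical.

```python
from typing import List, Optional


def publish_write_ins(per_ballot_write_ins: List[List[str]],
                      weakest_winner_tally: int) -> Optional[List[str]]:
    """Return the write-in plaintexts to publish, or None if too few to matter.

    per_ballot_write_ins: for each ballot, the write-in strings already
    decrypted from its ContestData (the decryption itself is not shown).
    weakest_winner_tally: votes of the lowest-ranked winning candidate.
    """
    total = sum(len(names) for names in per_ballot_write_ins)
    if total < weakest_winner_tally:
        return None  # write-ins cannot win, so nothing is decrypted or published
    flat = [name for names in per_ballot_write_ins for name in names]
    # Assumption: sort so the output does not reveal which ballot held which name.
    return sorted(flat)


decrypted = [["Carol"], [], ["carol", "Dave"]]
print(publish_write_ins(decrypted, 5))
print(publish_write_ins(decrypted, 3))
```

Unlike a real mixnet, this gives no proof that the published array matches the encrypted ballots beyond whatever the decryption proofs provide, which is why it only fakes the mixing step.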

danwallach commented 2 years ago

Turns out, the VVSG standards have a lot to say about this (borrowed from an issue on a VotingWorks thread):

1.1.4-H: The voting system must be capable of enabling and recording the voter's write-in of desired candidate names. A write-in is a contest option on the ballot that permits the voter to identify a candidate of choice that is not already listed as a contest option and is captured when the ballot is cast. State rules determine when a write-in candidate option may be placed as a contest option on the ballot and what qualifies as a valid write-in selection that may be counted.
1.1.5-E: The voting system must record additional contest information in the CVR that includes: 1/ identification of all contests in which a voter has made a contest selection; 2/ identification of all overvoted and undervoted contests; 3/ the number of write-ins recorded for the contest; and 4/ identification of the party for partisan ballots or partisan contests.
1.1.4-I: The voting system must be capable of gathering and recording write-in votes within a voting process that allows for reconciliation of aliases and double votes. Reconciliation of aliases means allowing election officials to declare two different spellings of a candidate's name to be equivalent (or not). Reconciliation of double votes means handling the case where, in an N-of-M contest, a voter has attempted to cast multiple votes for the same candidate using the write-in mechanism.
1.1.5-D: The voting system must record write-in information in the CVR that includes: 1/ identification of write-in selections made by the voter; 2/ the text of the write-in, when using a BMD or other device that marks the ballot for the voter; 3/ an image or other indication of the voter’s write-in markings; 4/ the total number of write-ins in the CVR.
1.1.6-C: Batch-fed scanners, in response to unreadable ballots, write-ins, and other designated conditions, must do one of the following: 1/ out stack the ballot (that is, divert to a stack separate from the ballots that were normally processed); 2/ stop the ballot reader and display a message prompting the election official to remove the ballot; 3/ mark the ballot with an identifying mark to facilitate its later identification; 4/ if the ballot image uniquely identifies its corresponding ballot, use electronic adjudication to segregate the ballot. Item 4 allows the ballot image to be segregated if, for example, an identifier is printed on the ballot as it is scanned, so that the image of the ballot also contains this identifier. Without a unique identifier or other marking, the ballot image itself does not facilitate finding the corresponding paper ballot.
1.1.6-E: Voter-facing scanners, when scanning a ballot containing a write-in vote, must either: 1/ segregate the ballot in a manner that facilitates its later identification; or 2/ if the ballot image uniquely identifies its corresponding ballot, use electronic adjudication to segregate the ballot. The requirement to separate ballots containing write-in votes is not applicable to systems in which a BMD encodes write-in votes in a machine-readable form and a scanner generates individual tallies for all written-in candidates automatically. Separation of ballots containing write-in votes is only necessary in systems that require the allocation of write-in votes to specific candidates to be performed manually.
1.1.8-C: The voting system must be capable of: 1/ tabulating votes for write-in candidates with separate totals for each contest choice, 2/ tabulating valid individual write-in candidate totals in each contest. Tabulation of candidate names that are manually written in on a hand voted paper ballot can only be tabulated as an aggregate total in each contest. Each name must be adjudicated from graphical images of the contest write-in area or from the ballot itself to determine the name of the candidate. When names are typed on an electronic voting unit such as a BMD, although the entered names must be recorded, only aggregate contest write-in totals are tabulated. Each individual write-in name must be adjudicated for validity before they can be aggregated. In most states, a write-in candidate must be registered to be valid. State rules also determine acceptable variations in the written name for the candidate to be credited with the vote. State rules also determine treatment of a written-in name of a candidate already listed on the ballot.
1.1.9-C: The voting system must have the capability to report the following categories of votes: 1/ in-person voting 2/ absentee voting 3/ write-ins 4/ accepted reviewed ballots 5/ rejected reviewed ballots
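The alias and double-vote reconciliation in 1.1.4-I can be sketched on plaintext write-ins for a single ballot and contest. Everything here is an illustrative assumption: the `aliases` table stands for equivalences declared by election officials, an unknown name is treated as invalid, and a repeated vote for the same candidate (whether listed or written in) is simply dropped. State rules may require different treatment.

```python
def normalize(name):
    """Collapse whitespace and ignore case before looking up an alias."""
    return " ".join(name.split()).casefold()


def reconcile(write_ins, listed_votes, aliases):
    """Reconcile one ballot's write-ins for one contest.

    write_ins: raw write-in strings from the ballot.
    listed_votes: canonical names the voter also selected from the printed list.
    aliases: officials' table of normalized spelling -> canonical name.
    Returns (credited canonical names, rejected raw strings).
    """
    counted = set(listed_votes)
    credited, rejected = [], []
    for raw in write_ins:
        canonical = aliases.get(normalize(raw))
        if canonical is None:
            rejected.append(raw)      # not a recognized candidate
        elif canonical in counted:
            rejected.append(raw)      # double vote for the same candidate
        else:
            counted.add(canonical)
            credited.append(canonical)
    return credited, rejected


aliases = {"carol smith": "Carol Smith",
           "c. smith": "Carol Smith",
           "alice jones": "Alice Jones"}
credited, rejected = reconcile(
    ["Carol  Smith", "C. Smith", "alice jones", "Mickey Mouse"],
    {"Alice Jones"},
    aliases,
)
print(credited)   # ['Carol Smith']
print(rejected)   # ['C. Smith', 'alice jones', 'Mickey Mouse']
```

The sketch works only on plaintext names, so it would have to run either in the election system before encryption or after the ContestData has been decrypted.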

Doing these things in the context of homomorphic tallying is potentially complicated, especially the bit about a voter who tries to vote for a candidate under write-ins and normally.

The more I think about this, the more my brain hurts.

JohnLCaron commented 2 years ago

OK, I haven't absorbed all that; I presume much of it is for the election system (ES), and we just have to add whatever hooks are needed in our library to implement the various options.

My first pass implementation assumes that the ES gives us a list of write-in strings per contest in the input PlaintextBallot. Those go into the ContestData record which is encrypted and added to the EncryptedBallot. I have not yet added a lumped write-in selection, waiting for more spec. When decrypting ballots, I always decrypt the ContestData along with the selections using the guardian shares. This is adding ~10% to the cost of decrypting. That has let me test and verify encrypt/decrypt ContestData with minimal disruption to the workflow. I assume it will all change and complexify going forward.

One more detail is that write-ins count against the limit when detecting overvotes. An overvote triggers setting all selections in the contest to 0. The overvotes are recorded in the ContestData, so the original ballot can be fully recovered.
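The overvote handling described above might be sketched like this. The `ContestData` class below is a simplified stand-in with hypothetical field names, not the library's actual record, and encryption is omitted entirely; the point is only that write-ins count toward the limit, an overvote zeroes every selection, and the original choices are kept so the ballot can be recovered.

```python
from dataclasses import dataclass, field


@dataclass
class ContestData:
    # Simplified stand-in; field names are assumptions for illustration.
    status: str = "normal"
    overvotes: list = field(default_factory=list)
    write_ins: list = field(default_factory=list)


def process_contest(votes, write_ins, votes_allowed):
    """Apply the overvote rule to one contest.

    votes: dict of selection id -> 0 or 1 as marked by the voter.
    write_ins: list of write-in strings for this contest.
    Returns (votes to encrypt, ContestData to encrypt alongside them).
    """
    chosen = [sid for sid, v in votes.items() if v > 0]
    data = ContestData(write_ins=list(write_ins))
    # Write-ins count against the contest limit.
    if len(chosen) + len(write_ins) > votes_allowed:
        data.status = "over_vote"
        data.overvotes = chosen                    # remember the original marks
        return {sid: 0 for sid in votes}, data     # zero every selection
    return dict(votes), data


cleared, data = process_contest({"alice": 1, "bob": 1, "dan": 0}, ["Carol"], 2)
print(cleared)
print(data)
```

In this example two listed marks plus one write-in exceed a limit of 2, so all selections are set to 0 and the marks for "alice" and "bob" survive only in the ContestData.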

If the ES can handle the "Reconciliation of double votes" and "Reconciliation of aliases" before sending us the PlaintextBallot, then that logic can stay out of our library.

danwallach commented 2 years ago

I'm thinking we should organize a "write-in summit" (i.e., a one hour Zoom) where the goal is to nail down all these particular details rather than just throwing an implementation together.

JohnLCaron commented 1 year ago

I need to do an implementation before I understand these things very deeply. The only danger in that is letting prototype implementations become the spec. And it may be that this is complicated enough that you want to do an implementation before you finalize it.

So, I'm OK with a summit, with the caveat that sometimes writing alternatives down to think about beforehand can help.

votingworks / electionguard-kotlin-multiplatform

Implementing write-in counting #167