usnistgov / ElectionResultsReporting

Common data format specification for election results reporting data
https://pages.nist.gov/ElectionResultsReporting

Error? Counts should have a Status field #42

Open sfsinger19103 opened 4 years ago

sfsinger19103 commented 4 years ago

Currently the Count element has an optional Type field (datatype CountItemType), but no Status. This is problematic in an application trying to track the progress of the counts over time in the canvass, a subject of considerable interest to political scientists and the public.

Some solution options:

JDziurlaj commented 4 years ago

What are the possible values of this new Status attribute?

sfsinger19103 commented 4 years ago

Values of Status would be chosen from the CountItemStatus enumeration. So it's not really new.

It might be cleaner to link the Count element to the CountStatus element that already exists.
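
To make the proposal in this issue concrete, a count carrying its own status might look like the fragment below. This is only a sketch. The `Status` child is hypothetical: it is the field being proposed here, not part of the current schema. The `GpUnitId`, `Type`, and `Count` children are assumed to follow the existing VoteCounts layout, and `ca-state` and the number are invented for illustration.

```xml
<VoteCounts>
  <!-- Illustrative identifier, not taken from any real report -->
  <GpUnitId>ca-state</GpUnitId>
  <Type>absentee-mail</Type>
  <!-- Hypothetical: the proposed field, with a value drawn from CountItemStatus -->
  <Status>in-process</Status>
  <!-- Illustrative number -->
  <Count>1200</Count>
</VoteCounts>
```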

sfsinger19103 commented 4 years ago

If one of the solutions proposed above is adopted, it would be possible to remove the CountStatus field from the ReportingUnit element. I don't know all the uses of that field in that element, so I can't say definitively, but for my applications the CountStatus field in the ReportingUnit element introduced unnecessary complications.

There is also a logical problem with the CDF as it stands that would be solved by removing the CountStatus field from the ReportingUnit element: each Office has at most one ElectionDistrict, but if I'm using CountStatus, a single Office could end up with several ElectionDistricts.

Concrete example: Here are two different ReportingUnits:

JDziurlaj commented 4 years ago

So is the goal to be able to store multiple versions of election results for a given type/reporting unit over a time period? I want to make sure I understand the use-case.

sfsinger19103 commented 4 years ago

Yes, one goal is to store multiple versions of election results for a given type/reporting unit over a time period.

Another use case is to handle a report where the counts of one type (say, election-day) have one status (say, ‘completed’) while counts of another type (say, ‘absentee-mail’) have another status (say, ‘in-process’).

—Stephanie

JDziurlaj commented 4 years ago

My feeling is this addition does not fit within one of the existing use-cases of the Election Results Reporting CDF, which tend to be centered around activities of an election jurisdiction. That's not to say the use-case could not be added, but this change is breaking and would need to be considered for ERR vNext (3.0).

raylutz commented 4 years ago

I'm not certain, but perhaps taking snapshots at different points in time could be handled simply by producing multiple reports as time passes.

sfsinger19103 commented 4 years ago

“This change is breaking” sounds bad. What does it mean? What breaks?

What’s the recommendation for handling the constraint that each office has a single election district, when one might want to combine completed election-day counts with in-progress absentee counts? That could happen in a single interim report meant to give as complete a count as possible.

johnpwack commented 4 years ago

Breaking here means that modifying the UML would produce a new schema, and documents that validate against it could break existing implementations that need to validate against the previous schema. It might therefore be better to wait for some time before releasing a new version; other changes may arise during that period. Given that one could produce multiple reports as time passes, and if I understand everything correctly, my opinion is that it would be better to wait.

I see the issue of having multiple types of counts associated with the office as straightforward, but I'll defer to John here.

sfsinger19103 commented 4 years ago

Having multiple types of counts associated with an office is not a problem; having multiple election districts associated with an office is a problem. In the current CDF, the state of California with counts in progress is a different reporting unit from the state of California with counts completed. Which is the Election District for the Governor of California?
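
One way to read the problem described above, sketched with the element names used elsewhere in this thread. The identifiers `ca-in-process`, `ca-completed`, and `cc-gov` are invented for illustration, and the `ElectionReport` wrapper is there only to declare the `xsi` prefix. The fragment is not meant to be schema-valid, and using a Contest (rather than an Office) to hold the district reference is an assumption borrowed from the thread's own example.

```xml
<ElectionReport xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- California, as reported while counting is still under way -->
  <GpUnit xsi:type="ReportingUnit" objectId="ca-in-process">
    <Name>California</Name>
    <CountStatus>
      <Status>in-process</Status>
      <Type>absentee-mail</Type>
    </CountStatus>
  </GpUnit>
  <!-- The same state, as reported once counting has finished -->
  <GpUnit xsi:type="ReportingUnit" objectId="ca-completed">
    <Name>California</Name>
    <CountStatus>
      <Status>completed</Status>
      <Type>absentee-mail</Type>
    </CountStatus>
  </GpUnit>
  <!-- Only one district can be referenced here: which of the two should it be? -->
  <Contest xsi:type="CandidateContest" objectId="cc-gov">
    <BallotTitle>Governor of California</BallotTitle>
    <ElectionDistrictId>ca-in-process</ElectionDistrictId>
  </Contest>
</ElectionReport>
```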

sfsinger19103 commented 4 years ago

Thanks to a helpful conversation with @JDziurlaj, I think I understand an excellent reason for attaching the CountItemStatus to the ReportingUnit rather than the Count element: because the CountItemStatus is the same for all Selections, it is wasteful of memory to keep the CountItemStatus with each Count rather than with an element that is independent of the Selection element. Makes sense. Follow-up question: doesn't the reasoning about CountItemStatus also apply to CountItemType? In other words, wouldn't it be less wasteful of memory to have a CountType field in the ReportingUnit element, rather than in the Count element?

JDziurlaj commented 4 years ago

The Counts::Type attribute (of type CountItemType) associates a particular count with the reporting bucket it belongs to. Since this data is about the particular count, it is not duplicative.

sfsinger19103 commented 4 years ago

I still don't understand the logic that dictates treating CountItemStatus differently from CountItemType.

It’s a reasonable choice to construe CountItemType as data about a particular Count; why isn’t it just as reasonable to construe CountItemStatus as data about a particular Count?

Maybe I’m misunderstanding “duplicative”. All the many Counts from a single set of ballots of a particular type (e.g., absentee-mail) will all have the same CountItemType. So there are many VoteCounts of the same VoteType — at least one for each selection on the ballot. This is a duplication of data that could be avoided by adding a CountItemType [0..*] attribute to the ReportingUnit class. What advantage makes this duplication of data worthwhile?

jdmgoogle commented 4 years ago

Quick question: is this for V2 of the spec?

sfsinger19103 commented 4 years ago

Yes, this is for V2 of the spec.

jdmgoogle commented 4 years ago

(This is probably overly long since I'm thinking "out loud" and maybe not directly addressing some of the questions since it's a long thread, so (a) apologies, and (b) please bear with me if I'm off-base...)

I thought there is a CountItemStatus and CountItemType on the ReportingUnit? There's a CountStatus which has both a status and a type. Contests also have a CountStatus.

The short answer (I think) is that the spec allows the representation of the data in multiple different ways. This is good for providers because it allows them to represent complex realities in ways that feel natural; it's challenging for consumers because of the same reasons. For example, let's say there's the following situation:

How should that be represented? E.g., should the state say that all mail-in absentee ballots state-wide are "in-progress" when only one contest is affected, and then each individual contest overrides the mail-in status? Or should the state say "completed" for state-wide mail-in absentee and only the Presidential contest overrides it?

As for the individual VoteCounts, they do have a CountType, and there are multiple VoteCounts associated with each BallotSelection. This makes multiple levels of reporting possible for a single selection (e.g., the number of early votes Trump/Pence got state-wide, the number of early votes they got in county X, the number of election day votes they got in county Y, etc.). Each one of those doesn't also need a status, since presumably the status for that type can be pulled from the GpUnit referenced by its ID.

<!-- etc etc etc -->
<GpUnit xsi:type="ReportingUnit" objectId="pa-state">
  <Name>Pennsylvania</Name>
  <!-- The state considers everything to be "in progress"? Or should this be "completed"? -->
  <CountStatus>
    <Status>in-progress</Status>
    <Type>early</Type>
  </CountStatus>
  <CountStatus>
    <Status>in-progress</Status>
    <Type>election-day</Type>
  </CountStatus>
</GpUnit>
<GpUnit xsi:type="ReportingUnit" objectId="pa-snyder">
  <Name>Snyder County</Name>
  <!-- This county is done with early voting and election day is in-progress -->
  <CountStatus>
    <Status>completed</Status>
    <Type>early</Type>
  </CountStatus>
  <CountStatus>
    <Status>in-progress</Status>
    <Type>election-day</Type>
  </CountStatus>
</GpUnit>
<GpUnit xsi:type="ReportingUnit" objectId="pa-burke">
  <!-- This county is in-progress with everything -->
  <Name>Burke County</Name>
  <CountStatus>
    <Status>in-progress</Status>
    <Type>early</Type>
  </CountStatus>
  <CountStatus>
    <Status>in-progress</Status>
    <Type>election-day</Type>
  </CountStatus>
</GpUnit>
<!-- etc etc etc -->
<Contest xsi:type="CandidateContest" objectId="cc-pres">
  <BallotTitle>President of the United States</BallotTitle>
  <ElectionDistrictId>pa-state</ElectionDistrictId>
  <!--
  The count status could be optional. However, maybe there are disputed absentee opscan marks
  for this contest but not others. This version of the spec allows the state-level ReportingUnit
  object to say "in-progress" for absentee in general but the other contests (e.g., Governor) could
  tag their early vote counts as "completed".
  -->
  <CountStatus>
    <Status>in-progress</Status>
    <Type>early</Type>
  </CountStatus>
  <CountStatus>
    <Status>in-progress</Status>
    <Type>election-day</Type>
  </CountStatus>
  <!--
  Per-selection VoteCounts left as an exercise to the reader because I've been typing this up
  for 40 minutes already. :)
  -->
</Contest>
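
A hedged sketch of the per-selection counts left out above, assuming each VoteCounts entry carries a `GpUnitId`, a `Type`, and a `Count`, so that its status can be looked up from the matching CountStatus on the referenced GpUnit. The `BallotSelection` wrapper, the `cs-example` identifier, and all numbers are illustrative assumptions rather than values from the thread.

```xml
<BallotSelection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xsi:type="CandidateSelection" objectId="cs-example">
  <!-- Early votes in Snyder County: pa-snyder reports "completed" for early -->
  <VoteCounts>
    <GpUnitId>pa-snyder</GpUnitId>
    <Type>early</Type>
    <Count>100</Count>
  </VoteCounts>
  <!-- Election-day votes in Burke County: pa-burke reports "in-progress" for election-day -->
  <VoteCounts>
    <GpUnitId>pa-burke</GpUnitId>
    <Type>election-day</Type>
    <Count>250</Count>
  </VoteCounts>
</BallotSelection>
```

Neither entry carries its own status. A consumer would resolve it by pairing `GpUnitId` and `Type` with the CountStatus entries on the corresponding GpUnit.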