usnistgov / ElectionResultsReporting

Common data format specification for election results reporting data
https://pages.nist.gov/ElectionResultsReporting

Proposed modification to xsd to clarify use of Count objects #8

Open markcwal-google opened 6 years ago

markcwal-google commented 6 years ago

This PR proposes some changes to simplify the use of the Counts objects. NOTE: this PR should not be submitted as-is. It changes the XSD directly, and is meant only to make the proposal easy to understand.

We propose the following:

  1. Splitting out "SummaryCounts" into "ErrorCounts" and "BallotCounts".
  2. "Election" will add a new reference to "BallotCounts"; "Contest" will replace its reference to "SummaryCounts" with "ErrorCounts"; and "GpUnit" will remove its reference to "SummaryCounts".
  3. "GpUnitId" will be made required for the "Counts" object.
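
As an illustration of item 3, here is a minimal sketch of how a consumer might check that every counts object names its GpUnit. It assumes the element names from this proposal ("BallotCounts", "ErrorCounts", "GpUnitId") and a sample fragment without an XML namespace. The sample data and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element names follow the proposal, no XML namespace.
SAMPLE = """
<Election>
  <BallotCounts>
    <GpUnitId>precinct1</GpUnitId>
    <BallotsCast>120</BallotsCast>
  </BallotCounts>
  <Contest objectId="cc1">
    <ErrorCounts>
      <Overvotes>4</Overvotes>
    </ErrorCounts>
  </Contest>
</Election>
"""

# Assumed names of the two count types introduced by the proposal.
COUNT_TAGS = ("BallotCounts", "ErrorCounts")


def counts_missing_gpunit(root):
    """Return the tags of counts elements that lack a non-empty GpUnitId child."""
    missing = []
    for elem in root.iter():
        if elem.tag in COUNT_TAGS:
            gp_unit_id = elem.findtext("GpUnitId")
            if gp_unit_id is None or not gp_unit_id.strip():
                missing.append(elem.tag)
    return missing


missing = counts_missing_gpunit(ET.fromstring(SAMPLE))
print(missing)
```

With "GpUnitId" declared as required in the XSD, schema validation would catch this on its own. The check above only shows what the rule means for the data.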

There are two items in particular which still need some discussion:

  1. "ErrorCounts" should be renamed to something more accurate, as "Undervotes" in most elections won't be considered an error.
  2. Should "WriteIn" be removed from "ErrorCounts"? It is already a valid "CountItemType" for the parent "Counts" object.
markcwal-google commented 6 years ago

@jdmgoogle

johnpwack commented 6 years ago

Hi Mark – I'm going to work today on updating the model according to my understanding of your comments, since I do better these days looking at the model than at the schema. I can't always guarantee my modeling work, so Sam will have to have the last word. I agree with you that "error counts" is not the right term; I'll ask around and see if someone can suggest something else that does the job, since I can't think of anything to use right off the bat. Thanks again for your comments, John

johnpwack commented 6 years ago

I've uploaded a revised UML model and picture which I think implements your changes, but I left writeins in the ballotcounts class for the time being. The major difference I see is that it's no longer possible to associate the error counts with a gpunit such as a precinct or a device, and that it does remove the possibility of a circular reference to gpunits. Is this what you intended?

markcwal-google commented 6 years ago

Yes, the removal of any "Counts" object as a child of GpUnit is intended. Instead, we made GpUnitId a required field within the Counts object, which hopefully will help clear up some of the ambiguity that Justin had pointed out.

johnpwack commented 6 years ago

Hi Mark, Justin, and Sam,

I talked to a few people yesterday who very much want the capability to associate a contest's ballot summary counts (overvotes, etc.) with precincts, etc., so we have to keep that capability. I have a couple of suggested changes to the V1 model that may help:

For reporting a contest's summary counts by its electoral district, use the existing associations: Contest->ElectoralDistrict (which is a role name for GpUnit).

For reporting a contest's summary counts by precinct, use the new association: Contest->GpUnit->SummaryCounts.

I would also include some of your previous suggestions:

I attached a picture - what do you think?

jdmgoogle commented 6 years ago

> I talked to a few people yesterday who very much want the capability to associate a contest's ballot summary counts (overvotes, etc.) with precincts, etc., so we have to keep that capability.

Just FYI the proposal Mark posted does retain that capability. Overvotes, undervotes, and write-ins are in the probably-should-be-renamed-to-something-better ErrorCounts type, which (a) must reference a GpUnit, and (b) can be referenced multiple times from a Contest. E.g.,

<Contest xsi:type="CandidateContest" objectId="cc1">
  <BallotSelection>...</BallotSelection>
  <BallotSelection>...</BallotSelection>
  <ElectoralDistrictId>state1</ElectoralDistrictId>
  <ErrorCounts>
    <GpUnitId>state1</GpUnitId>
    <Type>errors</Type>  <!-- This is a new type. Or can be optional -->
    <Overvotes>4</Overvotes>
    <Undervotes>5</Undervotes>
    <WriteIns>6</WriteIns>
  </ErrorCounts>
  <ErrorCounts>
    <GpUnitId>precinct1</GpUnitId>
    <Type>errors</Type>
    <Overvotes>1</Overvotes>
    <Undervotes>2</Undervotes>
    <WriteIns>3</WriteIns>
  </ErrorCounts>
  <ErrorCounts>
    <GpUnitId>precinct2</GpUnitId>
    <Type>errors</Type>
    <Overvotes>3</Overvotes>
    <Undervotes>3</Undervotes>
    <WriteIns>3</WriteIns>
  </ErrorCounts>
</Contest>
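
To make the "referenced multiple times from a Contest" point concrete, here is a rough sketch of how a consumer could read those per-GpUnit entries. It assumes the placeholder "ErrorCounts" name and the field names used in the example, with no default XML namespace. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of part of the example; "ErrorCounts" is the proposal's
# placeholder name and may change.
CONTEST_XML = """
<Contest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:type="CandidateContest" objectId="cc1">
  <ElectoralDistrictId>state1</ElectoralDistrictId>
  <ErrorCounts>
    <GpUnitId>state1</GpUnitId>
    <Overvotes>4</Overvotes>
    <Undervotes>5</Undervotes>
    <WriteIns>6</WriteIns>
  </ErrorCounts>
  <ErrorCounts>
    <GpUnitId>precinct1</GpUnitId>
    <Overvotes>1</Overvotes>
    <Undervotes>2</Undervotes>
    <WriteIns>3</WriteIns>
  </ErrorCounts>
</Contest>
"""

FIELDS = ("Overvotes", "Undervotes", "WriteIns")


def error_counts_by_gpunit(contest):
    """Map each GpUnitId to its {field: count} dict for one Contest element."""
    result = {}
    for error_counts in contest.findall("ErrorCounts"):
        gp_unit_id = error_counts.findtext("GpUnitId")
        # A missing field is treated as zero in this sketch.
        result[gp_unit_id] = {
            field: int(error_counts.findtext(field, default="0"))
            for field in FIELDS
        }
    return result


by_unit = error_counts_by_gpunit(ET.fromstring(CONTEST_XML))
print(by_unit["precinct1"])
```

Each entry carries its own GpUnitId, so per-precinct and per-district figures for the same contest sit side by side.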
johnpwack commented 6 years ago

Justin, apologies, I missed that ErrorCounts included GpUnit. Before I continue, I see an ambiguity in the model we should correct - Device ought to be renamed to something like DeviceType or DeviceClass, because it serves as a filter by a type of device on a count item. With the current name, I confuse it sometimes with ReportingDevice.

So, let me make sure I understand - in the current model, GpUnit-->SummaryCounts allows one to associate summary ballot counts directly with the geography represented by that GpUnit. GpUnit-->Counts(Type=SummaryCounts) allows one to filter the summary ballot counts for that geography by device type and count item type. And there lies the problem - Counts can reference GpUnit in a circular way and one needs to know NOT to do that. A schematron ruleset would help -- if used.
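
As a stand-in for the Schematron rule mentioned above, here is a small sketch that flags the circular pattern: counts nested under a GpUnit that themselves carry a GpUnitId. It assumes the current model's "GpUnit", "SummaryCounts", and "GpUnitId" element names and a sample without an XML namespace. The data and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical report fragment in the current (pre-proposal) shape.
REPORT_XML = """
<ElectionReport>
  <GpUnit objectId="precinct1">
    <SummaryCounts>
      <GpUnitId>precinct2</GpUnitId>
      <BallotsCast>100</BallotsCast>
    </SummaryCounts>
  </GpUnit>
  <GpUnit objectId="precinct2">
    <SummaryCounts>
      <BallotsCast>80</BallotsCast>
    </SummaryCounts>
  </GpUnit>
</ElectionReport>
"""


def circular_gpunit_refs(root):
    """Return (owning GpUnit objectId, referenced GpUnitId) pairs.

    A pair is reported whenever counts nested under a GpUnit also point
    at a GpUnit through GpUnitId.
    """
    problems = []
    for gp_unit in root.iter("GpUnit"):
        for counts in gp_unit.findall("SummaryCounts"):
            ref = counts.findtext("GpUnitId")
            if ref is not None:
                problems.append((gp_unit.get("objectId"), ref))
    return problems


problems = circular_gpunit_refs(ET.fromstring(REPORT_XML))
print(problems)
```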

Also in the current model, Contest-->SummaryCounts allows one to report on summary counts for a contest. Contest-->Counts(Type=SummaryCounts) allows one to filter the report by device type and count item type. Contest-->Counts(Type=SummaryCounts)-->GpUnit allows one to filter by device type and count item type and also by geography. So, one can report on ballot summaries per contest, or per contest by geography.

By removing the association from GpUnit to SummaryCounts, it is no longer possible to associate summaries of ballot counts with a geography - unless one does this by reporting on summary counts for all contests for that geography, which would lead to ambiguities.

Do I have this right? If so, I know for sure we'd get complaints if we remove this capability.

To retain the current capability and also remove the possibility of the circular reference, one possibility is to make VoteCounts and SummaryCounts standalone classes, and make Counts a subclass that each class can include, renaming it to something like CountFilter, since that is mainly what it does. SummaryCounts does not need to include GpUnit, whereas VoteCounts does.

I could be entirely wrong - I've thought about this so much my brain hurts!

markcwal-google commented 6 years ago

> I see an ambiguity in the model we should correct - Device ought to be renamed to something like DeviceType or DeviceClass, because it serves as a filter by a type of device on a count item. With the current name, I confuse it sometimes with ReportingDevice.

I agree with renaming this to something more descriptive. I like "DeviceClass"

> So, let me make sure I understand - in the current model, GpUnit-->SummaryCounts allows one to associate summary ballot counts directly with the geography represented by that GpUnit. GpUnit-->Counts(Type=SummaryCounts) allows one to filter the summary ballot counts for that geography by device type and count item type. And there lies the problem - Counts can reference GpUnit in a circular way and one needs to know NOT to do that

Additionally, it allows you to report SummaryCounts (BallotsCast, BallotsRejected, etc.) at both the Contest and the GpUnit level. Even if the producer doesn't end up creating a circular dependency, it's very possible that they provide conflicting values at the Contest level vs. the GpUnit level. Or it's possible that certain producers only create Contest-level SummaryCounts, and others only produce GpUnit-level SummaryCounts. It ends up making it difficult to consume this data.
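
To illustrate the conflicting-values case, here is a hedged sketch of the extra work a consumer faces today. It gathers every BallotsCast value reported for a GpUnit, whether found under the GpUnit itself or under a Contest that references it, and reports the units where the values differ. Element names follow the current model as described in this thread. The sample data and function names are hypothetical, and whether a difference is a real error depends on how the producer defines each count.

```python
import xml.etree.ElementTree as ET

# Hypothetical report with the same GpUnit reported in two places.
REPORT_XML = """
<ElectionReport>
  <GpUnit objectId="precinct1">
    <SummaryCounts><BallotsCast>100</BallotsCast></SummaryCounts>
  </GpUnit>
  <Contest objectId="cc1">
    <SummaryCounts>
      <GpUnitId>precinct1</GpUnitId>
      <BallotsCast>98</BallotsCast>
    </SummaryCounts>
  </Contest>
</ElectionReport>
"""


def ballots_cast_sources(root):
    """Collect every BallotsCast value per GpUnit, labelled by where it was found."""
    seen = {}
    # Values nested directly under a GpUnit.
    for gp_unit in root.iter("GpUnit"):
        for counts in gp_unit.findall("SummaryCounts"):
            value = counts.findtext("BallotsCast")
            if value is not None:
                seen.setdefault(gp_unit.get("objectId"), []).append(("GpUnit", int(value)))
    # Values under a Contest that point back at a GpUnit.
    for contest in root.iter("Contest"):
        for counts in contest.findall("SummaryCounts"):
            unit = counts.findtext("GpUnitId")
            value = counts.findtext("BallotsCast")
            if unit is not None and value is not None:
                label = "Contest " + contest.get("objectId")
                seen.setdefault(unit, []).append((label, int(value)))
    return seen


def conflicts(seen):
    """Keep only the GpUnits for which more than one distinct value was reported."""
    return {unit: vals for unit, vals in seen.items() if len({n for _, n in vals}) > 1}


found = conflicts(ballots_cast_sources(ET.fromstring(REPORT_XML)))
print(found)
```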

> By removing the association from GpUnit to SummaryCounts, it is no longer possible to associate summaries of ballot counts with a geography - unless one does this by reporting on summary counts for all contests for that geography, which would lead to ambiguities.

I don't think I understand why this would lead to ambiguities. In the original proposal, "BallotsCast", "BallotsRejected", and "BallotsOutstanding" would be reported at the Election level as part of an unbounded list of "BallotCounts" objects. These can each be associated with a GpUnit via the "GpUnitId" field, so if ReportingUnit-level or ReportingDevice-level metrics are required, you could have one "BallotCounts" object for each such GpUnit.

Justin and I were also thinking that, by definition, Undervotes and Overvotes occur at the contest level. As Justin pointed out with his example, these can also be associated with a GpUnit via the "GpUnitId" field in "ErrorCounts" (inherited from "Counts"). If you wanted the total number of overvotes for a specific GpUnitId, you could aggregate across the various contests, looking for ErrorCounts with that Id.
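
The aggregation described above could look roughly like the following sketch. It assumes Contest elements sit directly under the root element and use the placeholder "ErrorCounts" name, with no XML namespace. The data and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical election with two contests reporting per-precinct overvotes.
ELECTION_XML = """
<Election>
  <Contest objectId="cc1">
    <ErrorCounts><GpUnitId>precinct1</GpUnitId><Overvotes>1</Overvotes></ErrorCounts>
    <ErrorCounts><GpUnitId>precinct2</GpUnitId><Overvotes>3</Overvotes></ErrorCounts>
  </Contest>
  <Contest objectId="cc2">
    <ErrorCounts><GpUnitId>precinct1</GpUnitId><Overvotes>2</Overvotes></ErrorCounts>
  </Contest>
</Election>
"""


def total_for_gpunit(root, gp_unit_id, field="Overvotes"):
    """Sum one ErrorCounts field across all contests for a single GpUnitId."""
    total = 0
    for error_counts in root.findall("./Contest/ErrorCounts"):
        if error_counts.findtext("GpUnitId") == gp_unit_id:
            # A missing field is treated as zero in this sketch.
            total += int(error_counts.findtext(field, default="0"))
    return total


root = ET.fromstring(ELECTION_XML)
print(total_for_gpunit(root, "precinct1"))
```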

As you pointed out, it is currently possible to provide these counts at both the contest and the GpUnit level, which creates an ambiguity, from the consumer's standpoint, about where to look when trying to report on device-level aggregations. What if the contest-level and the GpUnit-level stats conflict?

We're attempting to come up with a change to the schema that makes it clear, from a producer/consumer standpoint, where to put these per-GpUnit and per-contest metrics. If our proposal doesn't address that point (and the point you raised about being able to associate ballot counts with a geography), then we should continue to iterate until we get it right. Perhaps we need to carve out some more time over a phone call to discuss further?