votinginfoproject / vip-specification

The Voting Information Project XML specification.
http://vip-specification.readthedocs.io/en/release/

State in the docs that the CSV supports all of the XML #109

Closed cjerdonek closed 4 years ago

cjerdonek commented 9 years ago

I have a question regarding what XML attributes need to be supported in flat files. The answer to this question should probably be reflected somewhere in the internal or public-facing docs (if it isn't already). This question also feeds into a larger question of how we should or do support maxOccurs="unbounded" in flat files, in general.

Here is the question:

We don't seem to have a standard way of supporting elements with maxOccurs="unbounded" in the flat files. For example, in version 3 of the spec, the Precinct object has three "unbounded" elements: early_vote_site_id, electoral_district_id, and polling_location_id. However, the HTML documentation of the Precinct object in the version 3 docs (i.e. what is currently displayed on the web site) doesn't seem to support these elements in the CSV. Here is the CSV header it documents for the Precinct object:

name,number,locality_id,ward,mail_only,ballot_style_image_url,id

I know that in this case, these elements are documented as "optional," but in our newest version of the spec, I believe we have potentially many types with multiple maxOccurs="unbounded" elements. The "Contact" type is one example, with the following unbounded elements: AddressLine, Email, Fax, Phone, Uri. Have we already worked out a way to support elements like these in the flat files in version 5?

pstenbjorn commented 9 years ago

@cjerdonek in other instances where there is a one-to-many relationship in the current VIP schema, an additional relational CSV file is collected. For instance, precinct_split can be related to >=0 electoral_district elements. Those relationships are captured in the precinct_split_electoral_district relational file, which contains precinct_split_id and electoral_district_id. IMO, this would seem to be the most efficient way of capturing other one-to-many or many-to-many relationships in flat file formats.
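As a rough illustration of the relational-file approach described above, the following Python sketch groups the rows of a precinct_split_electoral_district file by precinct_split_id. Only the two column names come from the comment; the sample ids and the helper name `group_related_ids` are hypothetical, and this is not the project's actual tooling.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data. Only the two column names are taken from the thread.
PRECINCT_SPLIT_ELECTORAL_DISTRICT_CSV = """\
precinct_split_id,electoral_district_id
ps1,ed10
ps1,ed11
ps2,ed10
"""


def group_related_ids(csv_text, parent_column, child_column):
    """Map each parent id to the list of related child ids, in file order."""
    related = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        related[row[parent_column]].append(row[child_column])
    return dict(related)


districts_by_split = group_related_ids(
    PRECINCT_SPLIT_ELECTORAL_DISTRICT_CSV,
    "precinct_split_id",
    "electoral_district_id",
)
print(districts_by_split)
```

A precinct_split with zero related districts simply has no rows in the relational file, so it never appears as a key; that is how the ">=0" cardinality falls out of this layout.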

cjerdonek commented 9 years ago

So just to clarify, is it the case that if something appears in the XML spec, it needs to be supported via flat files? Or are some features supportable only via the XML?

pstenbjorn commented 9 years ago

@cjerdonek ideally all elements would be captured in csv and translated into the XML schema. The use case is for election officials to be able to easily produce data sets that can be transformed into the standard from disparate systems that may not have an RDBMS at their core.

pkoms commented 9 years ago

Paul is on the mark -- and that's why, for example, the new way of handling apartments is complicated from the CSV perspective.

I would take a stronger stance than his "ideally": every element in the XML should be attainable from conversion of specified CSVs. The spec is an evolving document, but we've found that it's expensive in both time and money for officials to keep up with XML specification changes, which subsequently diminishes investment in the project. It's easier for them to modify CSV exports, and for the project to own conversion from CSV -> XML (and thereby account for XML specification changes).

We've been leaving CSV issues aside for the moment because it's impractical to try to design an improved CSV spec until the XML spec is settled, and we don't need to be producing 5.0 XML immediately. But we'll have to bring the CSVs back to a place where a CSV dataset describes a full XML dataset pretty soon.
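To make the CSV -> XML conversion idea concrete, here is a minimal Python sketch that turns one precinct row plus a relational file into an XML element with a repeated child, which is what maxOccurs="unbounded" permits. It rests on several assumptions: the precinct columns are a subset of the header quoted earlier in the thread, the precinct_polling_location file layout and all sample values are hypothetical, and the `vip_object` and `precinct` element names are used for illustration rather than taken from this discussion.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample rows; the columns are a subset of the documented header.
PRECINCT_CSV = """\
name,number,locality_id,id
Downtown,12,loc1,p1
"""

# Assumed relational file: one row per (precinct, polling location) pair.
PRECINCT_POLLING_LOCATION_CSV = """\
precinct_id,polling_location_id
p1,pl7
p1,pl8
"""


def precincts_to_xml(precinct_csv, relation_csv):
    """Build an XML tree with one repeated child element per relational row."""
    root = ET.Element("vip_object")  # assumed root element name
    relations = list(csv.DictReader(io.StringIO(relation_csv)))
    for row in csv.DictReader(io.StringIO(precinct_csv)):
        precinct = ET.SubElement(root, "precinct", id=row["id"])
        for column in ("name", "number", "locality_id"):
            ET.SubElement(precinct, column).text = row[column]
        # Each matching relational row becomes one repeated child element.
        for rel in relations:
            if rel["precinct_id"] == row["id"]:
                child = ET.SubElement(precinct, "polling_location_id")
                child.text = rel["polling_location_id"]
    return root


root = precincts_to_xml(PRECINCT_CSV, PRECINCT_POLLING_LOCATION_CSV)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Because the converter owns the element names and nesting, a change to the XML spec would be absorbed here while the CSV exports stay the same, which is the division of labor the comment argues for.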

cjerdonek commented 9 years ago

> I would take a stronger stance than his "ideally": every element in the XML should be attainable from conversion of specified CSVs.

Okay, we should probably say this somewhere like in the contributing doc or style guide doc.

jungshadow commented 9 years ago

@cjerdonek @pkoms @pstenbjorn Can we close this as a duplicate of #60 or should it be modified to just deal with the styleguide/contributing doc changes?

jungshadow commented 9 years ago

Hmm...retracting my last comment, since this deals with the CSV format itself. I think we may want to modify the issue to cover what the expected output is though (assuming the question's been answered).

cjerdonek commented 9 years ago

My intention in filing this issue was to document whether CSV needs to be supported for each XML element. @pkoms answered this here.

So I would say this issue can be resolved by doing the following:

  1. Stating explicitly in the XML and CSV documentation that the CSV format supports (or should support) all XML fields.
  2. Adding to the contributing docs that changes to the XML spec should also be supported via CSV (at least prior to release).

It's possible that (2) could also go in the style guide, but (2) feels more substantive (i.e. what should go in the specs rather than how). In other words, it has the feeling of a higher-level project requirement rather than a style, which is more superficial.

afsmythe commented 4 years ago

I think we can close this issue. We've made good progress on making the CSV documentation easier to find: the YAML files now provide a single source of truth for both the CSV and XML elements. See this PR: https://github.com/votinginfoproject/vip-specification/pull/376

There remain certain gaps between the CSV and XML specs, namely that the CSV spec does not support InternationalizedText. We've mentioned this gap in the documentation, and so far we haven't had any issues with states that provide CSV feeds looking to implement InternationalizedText. That said, it should be supported by the CSV spec eventually, and I'll make another Issue for that.

Unless there's objection, I'll close this issue once we have a new Issue for support of InternationalizedText in the CSV spec.

afsmythe commented 4 years ago

Tracking CSV implementation of InternationalizedText here: https://github.com/votinginfoproject/vip-specification/issues/399