cjerdonek closed this issue 4 years ago.
@cjerdonek In other instances where there is a one-to-many relationship in the current VIP schema, an additional relational CSV file is collected. For instance, a precinct_split can be related to zero or more electoral_district elements. Those relationships are captured in CSV through the precinct_split_electoral_district relational file, which contains precinct_split_id and electoral_district_id columns. IMO, this seems to be the most efficient way of capturing other one-to-many or many-to-many relationships in flat-file formats.
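A minimal sketch of how a converter might read such a relational file, assuming a plain two-column CSV with a header row. The sample rows and the load_links helper are hypothetical. It groups the rows into a parent-to-children mapping, so a precinct split with no rows simply has zero districts, and the same district can appear under many splits (many-to-many).

```python
import csv
import io
from collections import defaultdict

# Hypothetical contents of a precinct_split_electoral_district relational file:
# one row per (precinct_split_id, electoral_district_id) pair.
SAMPLE = """precinct_split_id,electoral_district_id
ps001,ed01
ps001,ed02
ps002,ed01
"""


def load_links(text, parent_col, child_col):
    """Group a two-column relational CSV into {parent_id: [child_id, ...]}.

    A parent with no rows does not appear at all, which models ">= 0" children.
    Child order follows row order in the file.
    """
    links = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        links[row[parent_col]].append(row[child_col])
    return dict(links)


links = load_links(SAMPLE, "precinct_split_id", "electoral_district_id")
print(links)  # {'ps001': ['ed01', 'ed02'], 'ps002': ['ed01']}
```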
So just to clarify, is it the case that if something appears in the XML spec, it needs to be supported via flat files? Or are some features supportable only via the XML?
@cjerdonek ideally all elements would be captured in csv and translated into the XML schema. The use case is for election officials to be able to easily produce data sets that can be transformed into the standard from disparate systems that may not have an RDBMS at their core.
Paul is on the mark -- and that's why, for example, the new way of handling apartments is complicated from the CSV perspective.
I would take a stronger stance than his "ideally": every element in the XML should be attainable from conversion of specified CSVs. The spec is an evolving document, but we've found that it's expensive in both time and money for officials to keep up with XML specification changes, which subsequently diminishes investment in the project. It's easier for them to modify CSV exports, and for the project to own conversion from CSV -> XML (and thereby account for XML specification changes).
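To illustrate the project owning CSV -> XML conversion, here is a rough sketch that joins a base CSV with a relational CSV and emits one repeated child element per link. The file contents, the csv_to_xml helper, and the element names (loosely styled after version 3) are illustrative assumptions, not the real converter or schema.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical base file: one row per precinct split.
PRECINCT_SPLITS = """id,name
ps001,Split A
ps002,Split B
"""

# Hypothetical relational file: one row per split/district pair.
SPLIT_DISTRICTS = """precinct_split_id,electoral_district_id
ps001,ed01
ps001,ed02
ps002,ed01
"""


def csv_to_xml(base_text, link_text):
    # Collect the one-to-many links first so each split can look up its districts.
    links = defaultdict(list)
    for row in csv.DictReader(io.StringIO(link_text)):
        links[row["precinct_split_id"]].append(row["electoral_district_id"])

    root = ET.Element("vip_object")
    for row in csv.DictReader(io.StringIO(base_text)):
        split = ET.SubElement(root, "precinct_split", id=row["id"])
        ET.SubElement(split, "name").text = row["name"]
        # Each linked district becomes a repeated child element.
        for district_id in links.get(row["id"], []):
            ET.SubElement(split, "electoral_district_id").text = district_id
    return root


root = csv_to_xml(PRECINCT_SPLITS, SPLIT_DISTRICTS)
print(ET.tostring(root, encoding="unicode"))
```

If the XML spec later changed how these references are expressed, only csv_to_xml would need updating; the officials' CSV exports would stay the same, which is the cost argument above.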
We've been leaving CSV issues aside for the moment because it's impractical to try to design an improved CSV spec until the XML spec is settled, and we don't need to be producing 5.0 XML immediately. But we'll have to bring the CSVs back to a place where a CSV dataset describes a full XML dataset pretty soon.
> I would take a stronger stance than his "ideally": every element in the XML should be attainable from conversion of specified CSVs.
Okay, we should probably say this somewhere like in the contributing doc or style guide doc.
@cjerdonek @pkoms @pstenbjorn Can we close this as a duplicate of #60 or should it be modified to just deal with the styleguide/contributing doc changes?
Hmm...retracting my last comment, since this deals with the CSV format itself. I think we may want to modify the issue to cover what the expected output is, though (assuming the question's been answered).
My intention in filing this issue was to document whether CSV needs to be supported for each XML element. @pkoms answered this here.
So I would say this issue can be resolved by doing the following:
It's possible that (2) could also go in the style guide, but (2) feels more substantive (i.e. what should go in the specs rather than how). In other words, it has the feeling of a higher-level project requirement rather than a style, which is more superficial.
I think we can close this issue. We've made good progress on making the CSV documentation more identifiable in the YAML files, which provide a single source of truth for both the CSV and XML elements. See this PR: https://github.com/votinginfoproject/vip-specification/pull/376
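With a single source describing both formats, gaps can be found mechanically. A sketch, assuming hypothetical per-element entries with a csv_column key; the real YAML files in the repository may be structured differently, and YAML loading is left out so the example runs on its own. The entries and the csv_gaps helper are illustrative only.

```python
# Illustrative entries only: a hypothetical shape for per-element spec records.
# The real YAML files may use different keys; loading them (e.g. with PyYAML)
# is omitted so this sketch is self-contained.
SPEC = [
    {"type": "Contact", "element": "Email", "csv_column": "email"},
    {"type": "Contact", "element": "Phone", "csv_column": "phone"},
    {"type": "ExampleType", "element": "SomeInternationalizedText", "csv_column": None},
]


def csv_gaps(spec):
    """Return (type, element) pairs that have no CSV column mapped."""
    return [
        (entry["type"], entry["element"])
        for entry in spec
        if not entry.get("csv_column")
    ]


print(csv_gaps(SPEC))  # [('ExampleType', 'SomeInternationalizedText')]
```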
There remain certain gaps between the CSV and XML specs, namely that the CSV spec does not support InternationalizedText, but we've mentioned this gap in the documentation, and so far we haven't had any issues with states that provide CSV feeds looking to implement InternationalizedText. That said, it should be supported by the CSV spec eventually, and I'll open another issue for that.
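One possible shape for InternationalizedText in CSV, sketched only to illustrate the gap: a hypothetical relational file with one row per translation, converted into Text children that carry a language attribute. The column names, the build_internationalized_text helper, the BallotTitle field name, and the sample values are assumptions; the Text/language structure reflects my understanding of the VIP 5 XSD and should be checked against it. A real design would be settled in the follow-up issue.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical relational file: one row per (object, field, language) translation.
TRANSLATIONS = """object_id,field,language,text
con01,BallotTitle,en,Library Bond
con01,BallotTitle,es,Bono para bibliotecas
con02,BallotTitle,en,School Levy
"""


def build_internationalized_text(rows, object_id, field):
    """Collect every translation of one field of one object into a single element."""
    elem = ET.Element(field)
    for row in rows:
        if row["object_id"] == object_id and row["field"] == field:
            # One Text child per language, keyed by the language attribute.
            ET.SubElement(elem, "Text", language=row["language"]).text = row["text"]
    return elem


rows = list(csv.DictReader(io.StringIO(TRANSLATIONS)))
el = build_internationalized_text(rows, "con01", "BallotTitle")
print(ET.tostring(el, encoding="unicode"))
```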
Unless there's objection, I'll close this issue once we have a new issue for support of InternationalizedText in the CSV spec.
Tracking CSV implementation of InternationalizedText here: https://github.com/votinginfoproject/vip-specification/issues/399
I have a question regarding which XML attributes need to be supported in flat files. The answer should probably be reflected somewhere in the internal or public-facing docs (if it isn't already). This question also feeds into a larger question of how we should (or do) support maxOccurs="unbounded" in flat files in general. Here is the question:
I noticed that we don't seem to have a standard way of supporting, in the flat files, elements with maxOccurs="unbounded". For example, in version 3 of the spec, the Precinct object has three "unbounded" elements: early_vote_site_id, electoral_district_id, and polling_location_id. However, the HTML documentation of the Precinct object in the version 3 docs (i.e. what is currently displayed on the website) doesn't seem to support these elements in the CSV. Here is the CSV header it documents for the Precinct object:

I know that in this case these elements are documented as "optional," but in our newest version of the spec, I believe we potentially have many types with multiple maxOccurs="unbounded" elements. The "Contact" type is one example, with the following unbounded elements: AddressLine, Email, Fax, Phone, Uri. Have we already worked out a way to support elements like these in the flat files in version 5?
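As a first step toward answering this for version 5, the unbounded elements can be listed straight from the schema. A minimal sketch using Python's xml.etree.ElementTree on an illustrative XSD fragment modeled on the Contact example above; the fragment is not copied from the real schema, and the unbounded_elements helper is hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Illustrative XSD fragment modeled on the Contact example; not the real VIP 5 schema.
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Contact">
    <xs:sequence>
      <xs:element name="AddressLine" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="Email" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="Fax" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="Name" type="xs:string" minOccurs="0"/>
      <xs:element name="Phone" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="Uri" type="xs:anyURI" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""


def unbounded_elements(xsd_text):
    """Map each named complexType to its child elements declared maxOccurs="unbounded"."""
    root = ET.fromstring(xsd_text)
    result = {}
    for ctype in root.iter(XS + "complexType"):
        result[ctype.get("name")] = [
            el.get("name")
            for el in ctype.iter(XS + "element")
            if el.get("maxOccurs") == "unbounded"
        ]
    return result


print(unbounded_elements(XSD))  # {'Contact': ['AddressLine', 'Email', 'Fax', 'Phone', 'Uri']}
```

Each name it returns is a candidate for a relational file like precinct_split_electoral_district, or for whatever flattening rule the CSV spec settles on.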