labordynamicsinstitute / qwi_schemas

Unofficial LEHD Schema files
https://lehd.ces.census.gov/data/schema/
Creative Commons Zero v1.0 Universal

Section 6.18.2 questions: label_geography_division and label_fipsnum/label_stusps #139

Status: Open. srt1 opened this issue 5 years ago.

srt1 commented 5 years ago

A few issues here:

First, the label_fipsnum.csv file is slightly inconsistent: the numeric FIPS codes are quoted on every record except Wyoming's. I am also unclear why we quote the numeric codes in this file and in label_stusps.csv but not in the label_geography*.csv files. Likewise, we quote the state postal code in label_stusps.csv but not the geo_level in label_fipsnum.csv, so I am not sure what the rule is here. Generically, a CSV field requires quotes only if its value contains a comma (or, per RFC 4180, a double quote or a line break), so these fields do not need quotes at all.
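One way to spot this kind of inconsistency mechanically is to parse each line and re-serialize it with minimal quoting; any line whose text changes was quoted more than it needed to be. The sketch below uses only Python's standard `csv` module. It assumes no field spans multiple lines, and the sample rows and their column layout (`fipsnum,label,geo_level`) are hypothetical, not copied from the real file.

```python
import csv
import io


def minimal_line(fields):
    """Serialize one row, quoting only fields that contain special characters."""
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n").writerow(fields)
    return buf.getvalue().rstrip("\n")


def over_quoted_lines(text):
    """Return 1-based line numbers whose quoting differs from minimal quoting.

    Assumes each record sits on a single physical line.
    """
    flagged = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if not raw:
            continue  # skip blank lines
        fields = next(csv.reader([raw]))
        if minimal_line(fields) != raw:
            flagged.append(lineno)
    return flagged


# Hypothetical rows mimicking the reported pattern: quoted codes except Wyoming.
sample = '"01",Alabama,S\n"02",Alaska,S\n56,Wyoming,S\n'
print(over_quoted_lines(sample))
```

With this sample, lines 1 and 2 are flagged and the Wyoming line is not, which mirrors the inconsistency described above.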

Second, I'm not sure why the label_geography_division.csv file is included in this section as a separate file. I don't think the divisions are a FIPS standard, so they may not really belong here. Metro and substate geographies don't get a mention in this section, so I don't know why divisions do.

Third, I am a bit unclear on the intended distinction between label_stusps and label_fipsnum; it seems to me that they could be collapsed into a single file.
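As a rough illustration of what collapsing the two files could look like, the sketch below joins a FIPS-code table and a postal-code table into one lookup file. The column names (`fipsnum`, `stusps`, `label`, `geo_level`), the use of `label` as the join key, and the sample rows are all hypothetical assumptions; the real files may share a different key.

```python
import csv
import io

# Hypothetical layouts; the real column names and join key may differ.
FIPSNUM = "fipsnum,label,geo_level\n01,Alabama,S\n02,Alaska,S\n"
STUSPS = "stusps,label\nAL,Alabama\nAK,Alaska\n"


def collapse(fipsnum_text, stusps_text):
    """Merge the two lookup tables on the (assumed) shared label column."""
    # Map each state label to its postal code.
    by_label = {
        row["label"]: row["stusps"]
        for row in csv.DictReader(io.StringIO(stusps_text))
    }
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["fipsnum", "stusps", "label", "geo_level"],
        lineterminator="\n",  # DictWriter defaults to QUOTE_MINIMAL
    )
    writer.writeheader()
    for row in csv.DictReader(io.StringIO(fipsnum_text)):
        writer.writerow({
            "fipsnum": row["fipsnum"],
            # A KeyError here means a state is missing from the postal-code table.
            "stusps": by_label[row["label"]],
            "label": row["label"],
            "geo_level": row["geo_level"],
        })
    return out.getvalue()


print(collapse(FIPSNUM, STUSPS), end="")
```

Raising on a missing key, rather than silently writing a blank, makes any mismatch between the two source files visible at merge time.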

heathhayward commented 5 years ago

Some of these files are created during production and some are hand-created. We should make the quoting method used by the production files the standard and bring the hand-created files into line with it.

srt1 commented 5 years ago

The production standard (using PROC EXPORT, I think) is to quote only when required, on an item-by-item basis; that is, quote a field only if it contains a comma, and otherwise do not quote it.
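For the hand-created files, a minimal sketch of normalizing them to quote-only-when-required might look like the following. It relies on Python's `csv.QUOTE_MINIMAL`, which quotes a field if it contains the delimiter, the quote character, or a line-terminator character, a slight superset of "comma only". Whether this exactly matches PROC EXPORT output is an assumption that should be checked against a production file. The sample rows are hypothetical.

```python
import csv
import io


def requote_minimal(text):
    """Re-serialize CSV text so that only fields that need quotes get them."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    # Parsing strips the existing quotes; the writer re-adds them only where needed.
    writer.writerows(csv.reader(io.StringIO(text)))
    return out.getvalue()


# Hypothetical hand-created input with unnecessary quotes on some fields.
hand_made = '"01","Alabama"\n56,Wyoming\n"99","Label, with comma"\n'
print(requote_minimal(hand_made), end="")
```

Only the last field keeps its quotes, because it is the only one containing a comma.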