labordynamicsinstitute / qwi_schemas

Unofficial LEHD Schema files
https://lehd.ces.census.gov/data/schema/
Creative Commons Zero v1.0 Universal

Add MSA FIPS code to geohi (csv naming schema), remove label_geography_cbsa.csv #66

Closed: srt1 closed this issue 6 years ago

srt1 commented 7 years ago

In section 6.2, the "geohi" can be the 5-digit FIPS code for the MSA.

heathhayward commented 7 years ago

The 5-digit FIPS should be added to https://lehd.ces.census.gov/data/schema/V4.2b-draft/label_fipsnum.csv. Is the idea that the metro-area file names will contain the 5-digit codes? https://lehd.ces.census.gov/data/schema/V4.2b-draft/naming_geohi.csv is described as containing alphabetic FIPS codes, not numeric ones, so the format or description of this file would need to change.

srt1 commented 7 years ago

There is one file per metro area, per data product (J2J, J2JR, J2JOD). These are collected in the /metro directory. Where it otherwise has "us" or state postal code, it will contain the 5-digit FIPS code. It will also have "sarhe", which is described elsewhere in the schema.

I defer to others on how to describe this, and at what point to implement it (4.2b or 4.2c). It's in the files I created for the release.
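To make the convention concrete, here is a minimal sketch of how a consumer of the files might classify the geohi token. It assumes, per the comment above, that the token is "us" for national files, a lowercase two-letter state postal code for state files, and a 5-digit numeric code for the files in /metro. The function name `classify_geohi` is hypothetical and not part of the schema; it inspects only the geohi token, not a full file name, because the complete naming pattern is defined in the schema rather than in this thread.

```python
import re

# Hypothetical helper, not part of the LEHD schema. Assumptions:
#   "us"                      -> national file
#   two lowercase letters     -> state postal code (e.g. "ca")
#   five digits               -> metro code used in the /metro directory
_STATE_RE = re.compile(r"[a-z]{2}")
_METRO_RE = re.compile(r"[0-9]{5}")


def classify_geohi(token: str) -> str:
    """Return 'national', 'state', or 'metro' for a geohi token."""
    # Check "us" first, because it would also match the two-letter pattern.
    if token == "us":
        return "national"
    if _METRO_RE.fullmatch(token):
        return "metro"
    if _STATE_RE.fullmatch(token):
        return "state"
    raise ValueError(f"unrecognized geohi token: {token!r}")
```

For example, `classify_geohi("01999")` returns `"metro"`, while an uppercase token such as `"CA"` is rejected under the lowercase assumption.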

larsvilhuber commented 7 years ago

CBSA codes do not belong in fipsnum.csv; they are already listed in https://lehd.ces.census.gov/data/schema/V4.2b-draft/label_geography_cbsa.csv, but HAH! it's not described anywhere on the page...

Sigh.

P.S. CBSA codes are not FIPS codes. We only refer to them as the 5-digit CBSA codes for metropolitan areas provided by the Census Bureau’s Geography Division.

So it looks like we need:

-- Lars Vilhuber, Economist Cornell University, Executive Director, Labor Dynamics Institute and ILR School - Department of Economics

e: lars.vilhuber@cornell.edu p: +1.607-330-5743 v: https://cornell.zoom.us/my/larsvilhuber w: http://lars.vilhuber.com/

A: ldi@cornell.edu | +1.607-255-2744

GnuPG Fingerprint: 0D7D 527F 9268 F693 74BB A666 FD01 37F0 3362 7346

larsvilhuber commented 7 years ago

What is the difference between label_geography_metro.csv and label_geography_cbsa.csv?

Please let me know which file is useful/being used. It looks like label_geography_cbsa.csv is a straight dump from geography, whereas label_geography_metro.csv is what is de facto used by J2J and the file naming convention.

larsvilhuber commented 7 years ago

P.S. label_geography_cbsa.csv appeared for the first time in V4.1d-draft

larsvilhuber commented 7 years ago

OK, difference between _metro and _cbsa:

Reference to metro is in public_use_schema, reference to cbsa is in shapefiles.

Turning to geohi now...

heathhayward commented 7 years ago

I think we added the "_cbsa" file to the schema for J2J before we knew that J2J was going to be metro-only. I vote to remove that file, since the two variants used in our data products are covered by the "metro" file ("B" for J2J) and the "[ST]" files ("M" for QWI). The "_cbsa" file is confusing and doesn't reflect anything in our data (that I know of). Lars, are you OK with dropping it?

larsvilhuber commented 7 years ago

do you use it in the shapefiles?

heathhayward commented 7 years ago

If we edit 5.2.2 and 5.2.3 to reference label_geography_metro.csv instead of label_geography_cbsa.csv then we can get rid of the _cbsa file from the schema. shazzaaaaamm

heathhayward commented 7 years ago

we don't use it in the shapefiles

larsvilhuber commented 7 years ago

Shall we update the _metro file with the "Metropolitan Statistical Area" addition to the label?

heathhayward commented 7 years ago

Can we not? This would mean that we would have labels like: "Not in metropolitan area, AL, Metropolitan Statistical Area".

larsvilhuber commented 7 years ago

Where it makes sense... the remainders are well defined by (slight mod):

"Not in any metropolitan area, AL"

Your call, though. The cbsa file had those labels, and it is preferable to view the metro file as a strict subset of the CBSA codes and to stick to those labels as closely as possible. But if it messes up the web interface, then let's not.

Lars

srt1 commented 7 years ago

I didn't even realize that the naming_geohi.csv file was a thing; I just thought it was a short description, forgetting that there was an underlying file listing all of the possibilities. So we seem to be beyond my familiarity with this part of the schema. The ticket's goals were accomplished as far as I had originally requested, and beyond. So I'm happy when you guys are happy.

heathhayward commented 7 years ago

I vote for not adding them. Let's remove the label_geography_cbsa.csv file and call it a day

heathhayward commented 7 years ago

The geo_level B defines the MSA label, similar to how the geo_level in the label_geography.csv file defines the other geographies (i.e., we don't include "County" or "State" in the labels for other geographies). So I think we are being consistent.

larsvilhuber commented 6 years ago

Ready to implement in 4.2

larsvilhuber commented 6 years ago

@heathhayward @srt1 : This had two components: naming of files (geohi) and presence of geography_cbsa (to be removed). We never implemented the first component.

Questions:

heathhayward commented 6 years ago

The universe of MSA codes in 'geohi' does and should include the '01999' state remainders. This matches with the filenames Stephen creates in https://lehd.ces.census.gov/data/j2j/R2017Q3/j2j/metro/, for example. So the naming_geohi.csv file that we've got on 4.2b-draft looks correct to me (https://lehd.ces.census.gov/data/schema/V4.2b-draft/naming_geohi.csv).
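A minimal sketch of the remainder convention follows. It assumes, inferred from the '01999' example and the "Not in metropolitan area, AL" label rather than from any rule stated in the schema, that a remainder code is the 2-digit state FIPS code followed by "999". Both helper names are hypothetical.

```python
import re

# Hypothetical helpers. Assumption (inferred from the '01999' example in this
# thread): a state remainder code is the 2-digit state FIPS code + "999".
# This also assumes no genuine CBSA code ends in "999".
_REMAINDER_RE = re.compile(r"[0-9]{2}999")


def state_remainder_code(state_fips: int) -> str:
    """Build the 5-digit remainder code, e.g. 1 (Alabama) -> '01999'."""
    if not 1 <= state_fips <= 99:
        raise ValueError(f"state FIPS code out of range: {state_fips}")
    return f"{state_fips:02d}999"


def is_state_remainder(code: str) -> bool:
    """True if a 5-digit metro geohi code denotes a state remainder."""
    return _REMAINDER_RE.fullmatch(code) is not None
```

Under these assumptions, `state_remainder_code(1)` gives `"01999"` and `is_state_remainder("01999")` is `True`.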

Bigger picture, the geography schema page is correct and complete from my perspective. The only thing we might want to consider changing is to include all of the 'label_geography_metro' rows in the 'label_geography.csv' file. In https://lehd.ces.census.gov/data/schema/V4.2b-draft/lehd_public_use_schema.html#_a_id_geography_a_geography, we do say that this is "a composite file containing all geocodes", so shouldn't we include the "B" geo_level geographies here?
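If the metro rows were folded into the composite file, the merge could look like the sketch below. It assumes, hypothetically, that label_geography_metro.csv and label_geography.csv share an identical header row; `merge_label_files` is illustrative only and does not reflect any existing LEHD tooling. Exact duplicate rows are skipped, so re-running the merge is harmless.

```python
import csv
import io


def merge_label_files(composite_csv: str, metro_csv: str) -> str:
    """Append metro rows to a composite label file, skipping exact duplicates.

    Hypothetical sketch: assumes both inputs are CSV text with the same header.
    """
    composite = csv.DictReader(io.StringIO(composite_csv))
    metro = csv.DictReader(io.StringIO(metro_csv))
    composite_rows = list(composite)
    metro_rows = list(metro)

    # Refuse to merge files whose headers differ (or are missing).
    if not composite.fieldnames or composite.fieldnames != metro.fieldnames:
        raise ValueError("files have different or missing headers; cannot merge")

    fieldnames = composite.fieldnames
    seen = {tuple(row[f] for f in fieldnames) for row in composite_rows}
    merged = list(composite_rows)
    for row in metro_rows:
        key = tuple(row[f] for f in fieldnames)
        if key not in seen:
            seen.add(key)
            merged.append(row)

    # Write the combined rows back out with the shared header.
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(merged)
    return out.getvalue()
```

Keeping the metro rows in their own file and generating the composite from both avoids maintaining the same geocodes in two places by hand.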