labordynamicsinstitute / qwi_schemas

Unofficial LEHD Schema files
https://lehd.ces.census.gov/data/schema/
Creative Commons Zero v1.0 Universal

Strange CIP Labels #134

Closed jodyhoonstarr closed 5 years ago

jodyhoonstarr commented 5 years ago

I'm seeing some extra text appended to the CIP labels, like "(Consolidated 51.00-51.99)". Did we add this, or was it already in the file? Can we strip it?

grep "(Consolidated" pseo_co.csv | tail -n 5
E,48,I,042087,07,51.XX,2011,5,2016,N,00,A,00,43519,56867,82440,43,1,"","","","",-1,"","","","",-1,Masters,"Health Professions and Related Programs (Consolidated 51.00-51.99)","COLORADO STATE UNIVERSITY - GLOBAL CAMPU",2011-2016
E,42,I,042087,07,51.XX,0000,0,2016,N,00,A,00,42905,56867,82440,43,1,"","","","",-1,"","","","",-1,Masters,"Health Professions and Related Programs (Consolidated 51.00-51.99)","COLORADO STATE UNIVERSITY - GLOBAL CAMPU","All Cohorts"
E,48,I,042087,07,52.XX,2006,5,2016,N,00,A,00,52464,69311,96183,48,1,58334,76433,103078,43,1,"","","","",-1,Masters,"Business, Management, Marketing, and Related Support Services (Consolidated 52.01-52.99)","COLORADO STATE UNIVERSITY - GLOBAL CAMPU",2006-2011
E,48,I,042087,07,52.XX,2011,5,2016,N,00,A,00,49296,65982,95281,758,1,"","","","",-1,"","","","",-1,Masters,"Business, Management, Marketing, and Related Support Services (Consolidated 52.01-52.99)","COLORADO STATE UNIVERSITY - GLOBAL CAMPU",2011-2016
E,42,I,042087,07,52.XX,0000,0,2016,N,00,A,00,49554,66276,95203,806,1,58334,76433,103078,43,1,"","","","",-1,Masters,"Business, Management, Marketing, and Related Support Services (Consolidated 52.01-52.99)","COLORADO STATE UNIVERSITY - GLOBAL CAMPU","All Cohorts"

srt1 commented 5 years ago

We did it, and it was by design. It is odd, though: the tabulation is a mix of 2- and 4-digit levels, depending on degree level. When we do not do the 4-digit tabulation, we consolidate all of the 4-digit codes into a 2-digit tabulation with .XX appended. The label is trying to explain that.

Alternatively, we could simply use the standard 2-digit code; however, I would really prefer not to change the industry_level variable from 4 to 2. The agg_level says this is a 4-digit tabulation, but it really is a mix of 2- and 4-digit ones. I really don't know the best answer here...
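To make the consolidation concrete, here is a rough sketch of how a 4-digit CIP code could be collapsed into the ".XX" form seen in the rows above. The function name `consolidate_cip` and the example code "51.38" are illustrative assumptions; this is not the actual tabulation code.

```python
def consolidate_cip(cipcode: str) -> str:
    """Collapse a 4-digit CIP code (e.g. '51.38') to its 2-digit family ('51.XX').

    Hypothetical helper: it only mirrors the '.XX' convention described in
    this thread and assumes codes are written as 'NN.NN'.
    """
    # Keep everything before the first dot; the 4-digit detail is discarded.
    family, _, _ = cipcode.partition(".")
    return f"{family}.XX"


if __name__ == "__main__":
    print(consolidate_cip("51.38"))
```

Under this sketch every code in the 51.00-51.99 range maps to the same identifier, which is what the "(Consolidated 51.00-51.99)" note in the label is describing.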

andrewfoote commented 5 years ago

@jodyhoonstarr We can strip it for the app, but not for the Excel files, where it will be read in full.

jodyhoonstarr commented 5 years ago

After discussion, Stephen and Andrew noted that it's fine to strip the '(consolidated ...)' text from the CIP labels when using the cipcode as the identifier.
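A minimal sketch of that stripping step, assuming the note always appears as a trailing parenthetical beginning with "Consolidated", as in the rows quoted above. The function name `strip_consolidated` and the regular expression are illustrative and not part of the schema tooling.

```python
import re

# Matches a trailing " (Consolidated 51.00-51.99)"-style note, case-insensitively.
# Assumption: the note only ever appears at the very end of the label.
_CONSOLIDATED_SUFFIX = re.compile(r"\s*\(consolidated\b[^)]*\)\s*$", re.IGNORECASE)


def strip_consolidated(label: str) -> str:
    """Return the CIP label without a trailing '(Consolidated ...)' note."""
    return _CONSOLIDATED_SUFFIX.sub("", label)


if __name__ == "__main__":
    print(strip_consolidated(
        "Health Professions and Related Programs (Consolidated 51.00-51.99)"
    ))
```

Labels without the note pass through unchanged, so the function can be applied to every row regardless of whether the tabulation is at the 2- or 4-digit level.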