labordynamicsinstitute / qwi_schemas

Unofficial LEHD Schema files
https://lehd.ces.census.gov/data/schema/
Creative Commons Zero v1.0 Universal

Reconcile ownercode/agg_level, add agg_level to QWI csv files #70

Open srt1 opened 7 years ago

srt1 commented 7 years ago

We would like to add agg_level to the QWI files, but we need to resolve some complications related to the ownercode variable.

One way we could approach it without adding any additional variables might be the following (we've talked around this concept in the past, so I'm just revisiting it a bit here):

This would give us a consistent framework for how to report the universe and the crossings. I think we used A01 in the OPM beta tabulations we released, which is problematic, but that was just a beta release. It doesn't hurt us to adopt this paradigm. If we want to go with this, it would be pretty trivial to add the agg_level variable to the QWI csv files, and it is already in the schema. We would also have to remove the "A01 Federal" from the schema, and change the current ownercode label to be "All state, local, private", or some such.

Alternatively, we could use this approach:

The distinction of the agg_level matters if we ever were to cross J2J by ownership. We would also need to have an explicit residual between the total (A00) and any parts (A05); let's call it (A0R) for now. And then if we added Federal workers into the J2J universe, we would have to use an alternate code for the total (B00), and then the components (A01, A05, B0R). The appropriate agg_level values would be used, depending on whether it is a total across all ownership categories, or if it is by detailed ownership categories, however those detailed categories are defined.
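To make the residual idea concrete, here is a minimal sketch of how a residual row could be derived from a total and its published parts. The codes A00, A05 and A0R are the ones named above; the counts and the helper name `residual` are made up for illustration and are not taken from any LEHD file.

```python
def residual(total, parts):
    """Return the portion of `total` not covered by the listed `parts`."""
    return total - sum(parts)


# Hypothetical counts keyed by ownercode; not real QWI or J2J data.
counts = {"A00": 1000, "A05": 850}

# A0R is the placeholder name suggested above for "A00 minus the published parts".
counts["A0R"] = residual(counts["A00"], [counts["A05"]])
print(counts["A0R"])
```

The same helper would apply to the B00 total with parts A01 and A05, giving B0R, if Federal workers were ever added to the universe.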

Whatever we do, I think that somewhere in the schema we need to describe that the A00 code does NOT contain federal workers. We don't do that now, and I'm not sure how users are supposed to figure it out.

larsvilhuber commented 7 years ago

For one, we could

I suggest not defining residuals, but correctly describing the universe. It would be great if agg_level could be incorporated into it (different owners with the same agg_level seems confusing).

srt1 commented 7 years ago

The difficulty with keeping the agg_level and the ownership_code in synch is that in an origin-destination framework, mixed ownership codes are by definition in the same aggregation level. For example,

The following set of transitions all have the same agg_level code:

origin      destination
All own.    state
All own.    federal
All own.    private

The following set of transitions will have a different agg_level code, but again one that is shared by all of them:

origin      destination
private     state
state       private
local       federal
etc...

In the QWI system, we could have a different agg_level for each ownership code (though this means I'd have to replicate the full list of possible margins several more times, possibly excluding the OD margins), but you can't treat the OD margins the same way.
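A rough sketch of the point about origin-destination tables: the aggregation level depends only on which side of the transition is broken out by ownership, not on which ownership category appears there. The `TOTAL` label, the function name and the returned descriptions are hypothetical stand-ins; the real agg_level values are whatever the schema assigns.

```python
# Hypothetical label for the "all ownership" total on either side of a transition.
TOTAL = "all"


def od_ownership_margin(origin, destination):
    """Describe which side(s) of a transition are broken out by ownership."""
    origin_detail = origin != TOTAL
    destination_detail = destination != TOTAL
    if origin_detail and destination_detail:
        return "origin and destination by ownership"
    if origin_detail:
        return "origin by ownership"
    if destination_detail:
        return "destination by ownership"
    return "no ownership detail"


transitions = [
    ("all", "state"),
    ("all", "federal"),
    ("private", "state"),
    ("local", "federal"),
]
margins = {pair: od_ownership_margin(*pair) for pair in transitions}
for pair, margin in margins.items():
    print(pair, "->", margin)
```

Every pair with a total origin lands in one group and every detailed-to-detailed pair lands in another, which is why a single agg_level ends up covering mixed ownership codes.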



larsvilhuber commented 6 years ago

@srt1 : any progress on thinking this out?

Just a note: "All own" -> state isn't actually a transition that makes sense to me.

srt1 commented 6 years ago

It's not by itself, but it's a potential subtotal from a more detailed tabulation that someone might have. All the possible higher levels need to be accounted for.



srt1 commented 6 years ago

Let me restate my thoughts on how I see the agg_level and ownership fields interacting, and assorted tasks for implementation:

I think this framework is a consistent way of presenting both concepts. I agree that mixing A00 and B01 within a tabulation would be really funky, and we shouldn't go that route. But otherwise, I think this is a reasonable path forward. For public-facing tasks, we should redefine what A00 means (QWI and J2J) for 4.2, and perhaps in 4.3 we will add agg_level to the QWIPU csv files (for 2018Q3 production?).

larsvilhuber commented 6 years ago

I disagree somewhat - the proposition has too many classifications.

The original intent of the ownership codes was to align with BLS. That failed for the A class.

We should do the following:

Then the B series adheres to BLS criteria (B=BLS). However, we won't use it, because we can't claim to have Federal ownership when we do not have USPS.

The C series is then the complete-for-LEHD ownership class:

ownercode,label
C01,"Federal government" - not used until everything is complete
C02, state-government
C03, local gov't
C04,"reserved"
C05,"Private"
C06,"OPM Federal Employment"
C07,"USPS Federal Employment"
C10,"State + Local"
C11,"State + Local + Private"
C12,"State + Local + Private + OPM"

The taxonomy is not hierarchical, but I'm less worried about that. We do not use the B series until we can be comparable to the BLS. A series is only kept for historical comparability. The C series is the correct and complete one, and at some point, we switch over.
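As an illustration of how the aggregate C codes relate to the detailed ones, the sketch below reads the component sets straight off the labels in the list above and sums detailed counts into each aggregate. The counts and the helper name `aggregate` are invented for the example.

```python
# Component sets read off the labels of the proposed C series.
AGGREGATES = {
    "C10": ("C02", "C03"),                # State + Local
    "C11": ("C02", "C03", "C05"),         # State + Local + Private
    "C12": ("C02", "C03", "C05", "C06"),  # State + Local + Private + OPM
}


def aggregate(code, detailed_counts):
    """Sum the detailed ownership counts that make up an aggregate code."""
    return sum(detailed_counts[component] for component in AGGREGATES[code])


# Hypothetical counts by detailed ownercode; not real data.
detailed = {"C02": 40, "C03": 110, "C05": 850, "C06": 20}
totals = {code: aggregate(code, detailed) for code in AGGREGATES}
print(totals)
```

In this sketch the aggregates are just named unions of detailed codes, so a new combination would be a new dictionary entry rather than a new level in a tree.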

lars

-- Lars Vilhuber, Economist Cornell University, Executive Director, Labor Dynamics Institute and ILR School - Department of Economics

e: lars.vilhuber@cornell.edu p: +1.607-330-5743 v: https://cornell.zoom.us/my/larsvilhuber w: http://lars.vilhuber.com/ http://lars.vilhuber.com/

Assistant: ldi@cornell.edu | +1.607-255-2744

GnuPG Fingerprint: 0D7D 527F 9268 F693 74BB A666 FD01 37F0 3362 7346



srt1 commented 6 years ago

If you want to use that naming scheme for the ownership, how do you deal with agg_level?



larsvilhuber commented 6 years ago

I don't know. Your description of agg_level has "manual" links to the ownership schema. You state (not embedded in anything specific) that "universe is A00". That sounds like something external to the agg_level creation. Changing that statement to "universe is C10" seems trivial.

There's a longer discussion here about the official definition of a universe, and what's in our frame. Our universe is all jobs in the US. Our frame does not include certain subsections.

I suggest that agg_level -> QWI files (and this ticket) be deferred to beyond 4.2.

For 4.2, correcting the label of A00 is the simplest. That would be a different ticket.

Lars

P.S. A different way of doing this is to disentangle the detailed owner codes from the aggregated collections thereof. I.e., our Detailed ownercodes only define the different LEHD-specific sources (frames):

ownercode,label
C02, state-government
C03, local gov't
C04,"reserved"
C05,"Private"
C06,"OPM Federal Employment"
C07,"USPS Federal Employment"

and having the summarization of the different categories be part of agg_level. That is the equivalent of the age or sex categories, just more complex, because there are more combinations. But I think we would still want a succinct ownercode for the aggregations that then correspond to a particular agg_level (e.g., the C10 from my earlier list), just because that is easier...
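One way to picture that split, as a sketch only: the detailed codes identify sources, and a lookup assigns a succinct code to each combination that gets published. The mapping uses the C10, C11 and C12 definitions from the earlier list; the function name and the choice to return `None` for unnamed combinations are assumptions of this example.

```python
# Detailed source codes from the list above (C04 is reserved, so it is left out).
DETAILED = {"C02", "C03", "C05", "C06", "C07"}

# Succinct codes for published combinations, taken from the earlier C-series list.
SUCCINCT = {
    frozenset({"C02", "C03"}): "C10",
    frozenset({"C02", "C03", "C05"}): "C11",
    frozenset({"C02", "C03", "C05", "C06"}): "C12",
}


def succinct_code(components):
    """Return the short ownercode for a combination of detailed codes."""
    components = frozenset(components)
    unknown = components - DETAILED
    if unknown:
        raise ValueError(f"not detailed ownercodes: {sorted(unknown)}")
    if len(components) == 1:
        # A single source keeps its own detailed code.
        return next(iter(components))
    # None signals that no short code has been assigned to this combination.
    return SUCCINCT.get(components)


print(succinct_code(["C03", "C02"]))
print(succinct_code(["C05", "C06"]))
```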


srt1 commented 6 years ago

Yes, this is a question for perhaps 4.3, and the minor edit to the label on A00 is all that we need to do for 4.2. I'm trying to keep the broader discussion moving towards a resolution, since I would like to add agg_level to the QWI output tables in (potentially) 4.3, and we need to be clear on what the concept is. Treating ownership as a special class that doesn't have a hierarchical structure makes that virtually impossible; otherwise you need to repeat the set of aggregation levels for each kind of ownership, and I don't know what you do for O-D tables.



larsvilhuber commented 6 years ago

Split out #77