labordynamicsinstitute / qwi_schemas

Unofficial LEHD Schema files
https://lehd.ces.census.gov/data/schema/
Creative Commons Zero v1.0 Universal

J2J - Update version* files to provide separate date ranges for FA/FS #149

Open srt1 opened 5 years ago

srt1 commented 5 years ago

Version files for J2J will have multiple lines to reflect separate date ranges for the FA/FS tabulations, which may be truncated. We will modify section 8.1 of the schema accordingly.

jodyhoonstarr commented 5 years ago

@srt1

srt1 commented 5 years ago

From: Stephen R Tibbets (CENSUS/CES FED)
Sent: Wednesday, August 28, 2019 4:41 PM
To: lars.vilhuber@cornell.edu; Patrick Hayward (CENSUS/CES FED) <Patrick.Hayward@census.gov>; Matthew Graham (CENSUS/CES FED) <Matthew.Graham@census.gov>; Joyce Key Hahn (CENSUS/CES FED) <joyce.key.hahn@census.gov>; Kevin Liu (CENSUS/ESMD FED) <kevin.liu@census.gov>; Jody Alexander Hoon-Starr (CENSUS/CES FED) <jody.alexander.hoon-starr@census.gov>; Henry R Hyatt (CENSUS/CES FED) <Henry.R.Hyatt@census.gov>
Cc: Erika McEntarfer (CENSUS/CES FED) <Erika.McEntarfer@census.gov>; Camille S Norwood (CENSUS/EAD FED) <Camille.S.Norwood@census.gov>; qwi_schemas@noreply.github.com
Subject: Truncating time series to FAS range - how to present in metadata/schema? (Redmine 2384, qwi_schemas #149))

All,

It was noted a little while back that the latest J2J release contained tabulations by firm age/size (FAS) that extended beyond the FAS tabulations we report in QWI. This occurred because the J2J tabulations did not censor the time series to the FAS range. Because J2J tabulations naturally lag QWI by one quarter, the FAS series should normally be releasable over the full range of J2J data under the expected BDS delivery schedule. Unfortunately, the BDS data will soon be a full year late due to the LBD reengineering, and once the dust settles the timing of future deliveries may not hew to the previously expected schedule (especially when the Economic Census hits). Hence, J2J releases will need to truncate FAS tabulations to the FAS range, as QWI now does automatically. This is relatively straightforward, but we should indicate it in the J2J metadata. My proposal is to emulate the QWI approach, as follows:

version_qwi.txt
::::::::::::::
QWI_F MI 26 2000:3-2018:4 V4.4.0 R2019Q3 qwipu_mi_20190828_1011
QWI_FA MI 26 2000:3-2017:4 V4.4.0 R2019Q3 qwipu_mi_20190828_1011
QWI_FS MI 26 2000:3-2017:4 V4.4.0 R2019Q3 qwipu_mi_20190828_1011

version_j2j.txt (current)
::::::::::::::
J2J MI 26 2000:4-2018:1 V4.4.0 R2019Q1 j2jpu_mi_20190507_2248

version_j2j.txt (proposed)
::::::::::::::
J2J_F MI 26 2000:4-2018:1 V4.4.0 R2019Q1 j2jpu_mi_20190507_2248
J2J_FA MI 26 2000:4-2017:4 V4.4.0 R2019Q1 j2jpu_mi_20190507_2248
J2J_FS MI 26 2000:4-2017:4 V4.4.0 R2019Q1 j2jpu_mi_20190507_2248

In other words, add records in the version file to indicate that FA/FS tables are limited to the end of 2017, while non-FAS tables may go through 2018.
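For application developers, a minimal Python sketch of reading the proposed multi-line version file might look like the following. The field names (`table_type`, `state`, `fips`, and so on) and the helper `parse_version_lines` are illustrative assumptions, not names taken from the schema; only the sample records come from the proposal above.

```python
from typing import NamedTuple


class VersionRecord(NamedTuple):
    # Field names are assumptions for illustration, not schema-defined names.
    table_type: str  # e.g. J2J_F, J2J_FA, J2J_FS
    state: str       # postal abbreviation
    fips: str        # state FIPS code
    start: str       # first quarter, "YYYY:Q"
    end: str         # last quarter, "YYYY:Q"
    schema: str      # schema version
    release: str     # release identifier
    vintage: str     # vintage string


def parse_version_lines(text: str) -> dict[str, VersionRecord]:
    """Parse whitespace-delimited version records, keyed by table type."""
    records = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 7:
            continue  # skip blank lines or separators such as "::::::::::::::"
        table_type, state, fips, qrange, schema, release, vintage = parts
        start, end = qrange.split("-")
        records[table_type] = VersionRecord(
            table_type, state, fips, start, end, schema, release, vintage
        )
    return records


PROPOSED = """\
J2J_F MI 26 2000:4-2018:1 V4.4.0 R2019Q1 j2jpu_mi_20190507_2248
J2J_FA MI 26 2000:4-2017:4 V4.4.0 R2019Q1 j2jpu_mi_20190507_2248
J2J_FS MI 26 2000:4-2017:4 V4.4.0 R2019Q1 j2jpu_mi_20190507_2248
"""

records = parse_version_lines(PROPOSED)
for name, rec in records.items():
    print(name, rec.start, rec.end)
```

With the proposed file, the FA and FS records end at 2017:4 while the J2J_F record runs through 2018:1, which is the distinction an application would need to pick up.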

A further point: there is an additional metadata file describing the available ranges for all regional pairs for the J2JOD tables. I propose ignoring this for purposes of the FAS release ranges, and simply informing users (or applications) that any table or series by FA/FS will be further truncated to the quarters indicated in version_j2j[od].txt. Alternatively, you could add separate variables for FA start/end and FS start/end to the existing table or, worse, add rows on some additional dimensions, which to me makes a complicated table even more complicated. You can see the relevant schema sections 8.1 and 8.2 at these links:

https://lehd.ces.census.gov/data/schema/j2j_latest/lehd_public_use_schema.html#_version_metadata_for_qwi_and_j2j_files_version_txt

https://lehd.ces.census.gov/data/schema/j2j_latest/lehd_public_use_schema.html#_additional_metadata_for_j2jod_files_avail_csv
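As a rough illustration of the "further truncated" rule for J2JOD, the sketch below intersects an availability range for a regional pair with the FA/FS range from the version file. The function names and the example availability range are hypothetical; only the "YYYY:Q" quarter notation and the 2000:4-2017:4 FA/FS range come from the proposal above.

```python
from typing import Optional


def parse_quarter(text: str) -> tuple[int, int]:
    """Convert "YYYY:Q" into a (year, quarter) tuple that sorts chronologically."""
    year, quarter = text.split(":")
    return int(year), int(quarter)


def format_quarter(q: tuple[int, int]) -> str:
    return f"{q[0]}:{q[1]}"


def truncate_range(
    avail_start: str, avail_end: str, fas_start: str, fas_end: str
) -> Optional[tuple[str, str]]:
    """Return the overlap of the two ranges, or None if they do not overlap."""
    start = max(parse_quarter(avail_start), parse_quarter(fas_start))
    end = min(parse_quarter(avail_end), parse_quarter(fas_end))
    if start > end:
        return None
    return format_quarter(start), format_quarter(end)


# Hypothetical regional-pair availability of 2010:1-2018:1, truncated to the
# FA/FS range 2000:4-2017:4 from the proposed version file.
effective = truncate_range("2010:1", "2018:1", "2000:4", "2017:4")
print(effective)
```

Under this approach the availability file itself stays unchanged, and the application applies the version-file range as a second filter whenever a table is requested by FA/FS.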

Please let me know if anyone has any better ideas on how to handle this; and particularly, any app-related concerns that may arise (Heath/Jody). Otherwise, I will just go with my proposed approach, and edit the schema accordingly.

Thanks much,

Stephen


From: Jody Hoon-Starr notifications@github.com Sent: Wednesday, August 28, 2019 4:36:05 PM To: labordynamicsinstitute/qwi_schemas qwi_schemas@noreply.github.com Cc: Stephen R Tibbets (CENSUS/CES FED) Stephen.R.Tibbets@census.gov; Mention mention@noreply.github.com Subject: Re: [labordynamicsinstitute/qwi_schemas] J2J - Update version* files to provide separate date ranges for FA/FS (#149)

@srt1


srt1 commented 5 years ago

From: Stephen R Tibbets (CENSUS/CES FED) Stephen.R.Tibbets@census.gov Sent: Thursday, August 29, 2019 10:09 AM Subject: Re: Truncating time series to FAS range - how to present in metadata/schema? (Redmine 2384, qwi_schemas #149))

Yes, I thought about the -1 flag, too (no available data...). That would be correct from the perspective that the BDS data is not available, but using it in this way does somewhat complicate the usual usage of the flag. The -1 flag typically ties into the edge of the time series, such that the flag will vary within a cell across measures that require lagged or leading quarters of data. In this case, all measures can potentially be available in the very last quarter of FAS data, depending on the BDS lag, and then the following quarter will be all flagged -1. With the flag=5 approach, you could argue that we do have data we could use to generate FA/FS, it's just not good enough (does not meet Census Bureau publication standards); in the case of the earnings measures at the margin, we really don't have any data at all. There's nothing inherently wrong with either approach, though.

This does also highlight one possible issue in how "true zero" cells interact with lagged measures in the QWI application. I am calling a true zero a cell that does not exist in the PUF because nothing tabulated into it. When the user requests a table, I think the application flags the zeroes or missings by measure, depending on the offset from the end of the time series. For FAS tabulations, this may not be appropriate, because the lags should be calculated w.r.t. the end of the UI time series, not the end of the BDS time series. So, it might make sense to allow FA/FS tables to be reported through the end of the full time series there, as well, suppressing as appropriate.

I think for right now I will simply use the 5 flag for J2J, suppressing data that does not meet publication standards. That will be the most easily implementable solution for the next J2J release, whenever that is. We can always modify the approach in the future.

Thanks very much for the comment - more thoughts are welcome.


From: Lars Vilhuber lars.vilhuber@cornell.edu Sent: Wednesday, August 28, 2019 8:59:28 PM Subject: Re: Truncating time series to FAS range - how to present in metadata/schema? (Redmine 2384, qwi_schemas #149))

That sounds like a good and justifiable idea. The "publication standards" rationale would be that the BDS microdata is not available. You might also consider the "no available input data to compute" flag.

-- Lars Vilhuber, Economist Cornell University, Executive Director, Labor Dynamics Institute and ILR School - Department of Economics American Economic Association - Data Editor Journal of Privacy and Confidentiality - Managing Editor



From: Stephen R Tibbets (CENSUS/CES FED) Stephen.R.Tibbets@census.gov Sent: Wednesday, August 28, 2019 19:28 Subject: Re: Truncating time series to FAS range - how to present in metadata/schema? (Redmine 2384, qwi_schemas #149))

One other thought - not how we do it in QWI, but there is an alternate approach:

We can report records for all years and quarters, but suppress all data values (flag=5 - does not meet Census Bureau Publication standards) for cells that are tabulated by firm age and size. This is actually a bit easier for me to implement, for a number of reasons, and it could have an advantage that the application will be able to grab the same quarters of data whichever characteristics are being tabulated. We might not need to modify the schema, though we can indicate that FAS tables may end prior to the full J2J time series.
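A minimal sketch of this alternative, under assumed names: each record is a dict with hypothetical keys `year`, `quarter`, `firmage`, `firmsize`, `value`, and `status`, and the code "0" is assumed to mean "all firm ages" or "all firm sizes". Records tabulated by firm age or size that fall after the last FAS quarter keep their row but lose their value and receive status flag 5.

```python
SUPPRESSED = 5  # flag 5: does not meet Census Bureau publication standards


def suppress_beyond_fas(rows, fas_end, all_code="0"):
    """Blank out values for FA/FS cells dated after fas_end.

    rows     -- list of dicts with hypothetical keys (see above)
    fas_end  -- (year, quarter) of the last releasable FAS quarter
    all_code -- assumed code for the "all ages"/"all sizes" margin
    """
    out = []
    for row in rows:
        by_fas = row["firmage"] != all_code or row["firmsize"] != all_code
        past_end = (row["year"], row["quarter"]) > fas_end
        if by_fas and past_end:
            # Keep the record so every table spans the same quarters,
            # but suppress the data value.
            row = {**row, "value": None, "status": SUPPRESSED}
        out.append(row)
    return out


# Illustrative values only; not real tabulations.
sample = [
    {"year": 2017, "quarter": 4, "firmage": "1", "firmsize": "0", "value": 120, "status": 1},
    {"year": 2018, "quarter": 1, "firmage": "1", "firmsize": "0", "value": 130, "status": 1},
    {"year": 2018, "quarter": 1, "firmage": "0", "firmsize": "0", "value": 900, "status": 1},
]
result = suppress_beyond_fas(sample, fas_end=(2017, 4))
for row in result:
    print(row["year"], row["quarter"], row["firmage"], row["value"], row["status"])
```

Only the 2018:1 record tabulated by firm age is suppressed here; the 2017:4 record and the all-ages 2018:1 record pass through unchanged.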

Just an option - please feel free to voice opinions.

Thanks,

Stephen