timpyrkov / pynhanes

Python parser and scraper for NHANES
MIT License

2015 LMF NHANES #1

Open raminaghods opened 11 months ago

raminaghods commented 11 months ago

Hi, thank you for your code. I am working with the NHANES dataset, and your code has really eased my data-cleaning process. I noticed that your code reads the 2015 LMF file, which, judging from your code, made year of death publicly available as a variable. The CDC website now only offers the 2019 public LMF file, which no longer includes year of death among its variables. The website states that the 2015 versions will no longer be hosted there. I was hoping you might still have the 2015 NHANES LMF datasets on your local machine and could share them here alongside your code. Thank you!

timpyrkov commented 11 months ago

Hi, thank you for pointing out that the LMF has been updated!

I've updated the pynhanes code. pynhanes-0.0.19 can now read LMF data from any year.

Yes, I have the 2015 LMF files on Dropbox: https://www.dropbox.com/scl/fo/7ny2z6seo4j2ho0ko2zo3/h?rlkey=bp255m3ma8dntgrjy2v5scpog. However, they do not include year of death either. This is why I use the PERMTH_INT field to get the number of follow-up months for survival analysis (pynhanes divides it by 12 to convert it to years).
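To make the months-to-years step concrete, here is a minimal Python sketch (not pynhanes code; the function names are hypothetical). It assumes each participant is a dict mapping LMF column names to string values, with blank fields as None, and it assumes the LMF coding ELIGSTAT "1" = eligible for follow-up and MORTSTAT "1" = assumed deceased, "0" = assumed alive. Please check those codes against the CDC LMF documentation. It also shows person-time as used in epidemiology: each participant contributes their own follow-up time, so one person followed for 120 months contributes 120 person-months, and deaths divided by the summed person-years gives a crude death rate.

```python
def follow_up(record):
    """Return (years, died) for one participant, or None if not usable.

    Assumed coding (verify against the CDC LMF documentation):
    ELIGSTAT == "1" -> eligible; MORTSTAT == "1" -> assumed deceased.
    """
    if record.get("ELIGSTAT") != "1":
        return None
    months, status = record.get("PERMTH_INT"), record.get("MORTSTAT")
    if months is None or status is None:
        return None
    # PERMTH_INT: months from interview to death or end of follow-up;
    # dividing by 12 converts it to years.
    return int(months) / 12, status == "1"


def crude_death_rate(records, per=1000):
    """Deaths per `per` person-years, summing each person's own follow-up."""
    pairs = [fu for fu in map(follow_up, records) if fu is not None]
    person_years = sum(years for years, _ in pairs)
    deaths = sum(died for _, died in pairs)  # True counts as 1
    if person_years == 0:
        raise ValueError("no eligible follow-up time")
    return deaths / person_years * per, person_years, deaths
```

For example, one survivor followed for 120 months and one decedent followed for 60 months contribute 180 person-months (15 person-years) with one death.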

raminaghods commented 11 months ago

Thank you very much for your quick reply, and for uploading the data! I have a couple of other questions, if you don't mind helping me out.

  1. I am new to survival analysis terminology, so I apologize if this question is too basic. From what you said, I gather that PERMTH_INT is the number of months from the interview date until the date of death if the participant is deceased (or until Dec 31, 2019 if alive). Why is it described in person-months when the number of "persons" is only 1? I am used to person-months as a measure of the research effort that goes into a topic, so I was (most likely incorrectly) interpreting PERMTH_INT as the amount of research effort CDC had spent on this subject since the interview date. Could you point me to some research, a book, an example, or a note that explains what person-months of follow-up refers to here?

  2. How did you learn that the LMF should be read using the following column widths and names? col_widths = [14, 1, 1, 3, 1, 1, 1, 4, 8, 8, 3, 3] col_names = ["SEQN", "ELIGSTAT", "MORTSTAT", "UCOD_LEADING", "DIABETES", "HYPERTEN", "DODQTR", "DODYEAR", "WGT_NEW", "SA_WGT_NEW", "PERMTH_INT", "PERMTH_EXM"]
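For reference, here is a minimal standard-library sketch of how such a fixed-width layout is applied: each width becomes a [start, end) character slice, so these widths cover 48 characters per line. The layout is copied from the question, not verified against CDC's file documentation, and other LMF releases may differ. The function names and the file name in the usage line are hypothetical.

```python
from itertools import accumulate

# Layout as quoted in the question (widths sum to 48 characters).
COL_WIDTHS = [14, 1, 1, 3, 1, 1, 1, 4, 8, 8, 3, 3]
COL_NAMES = ["SEQN", "ELIGSTAT", "MORTSTAT", "UCOD_LEADING", "DIABETES",
             "HYPERTEN", "DODQTR", "DODYEAR", "WGT_NEW", "SA_WGT_NEW",
             "PERMTH_INT", "PERMTH_EXM"]


def parse_lmf_line(line, widths=COL_WIDTHS, names=COL_NAMES):
    """Split one fixed-width record into a dict; blank fields become None."""
    ends = list(accumulate(widths))                # cumulative end offsets
    starts = [end - w for end, w in zip(ends, widths)]
    record = {}
    for name, start, end in zip(names, starts, ends):
        value = line[start:end].strip()            # short lines give '' -> None
        record[name] = value or None
    return record


def read_lmf(path):
    """Read a whole LMF .dat file into a list of dicts, one per participant."""
    with open(path, encoding="latin-1") as f:
        return [parse_lmf_line(line.rstrip("\n")) for line in f if line.strip()]
```

Usage would look like `records = read_lmf("lmf_file.dat")`. Values are kept as strings so that codes such as UCOD_LEADING keep their leading zeros. With pandas installed, `pandas.read_fwf(path, widths=COL_WIDTHS, names=COL_NAMES)` reads the same columns into a DataFrame, with numeric type inference applied.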

Thank you,