ummel / fusionData

Data backend for fusionACS platform
https://ummel.github.io/fusionData/
GNU General Public License v3.0

Write code to process 2017 NHTS person data #36

Open ummel opened 2 years ago

ummel commented 2 years ago

Create a new .R script at /survey-processed/NHTS/2017/NHTS_2017_P_processed.R. Use the analogous ACS processing script as a template.

I think the only raw data feeding this script should come from the person-level NHTS data file. We want to keep the person-level data separate from the household data and retain maximum detail. The fusionData package is set up to smoothly handle both household- and person-level microdata from a given survey. In other words, there is no reason to prematurely or arbitrarily summarize person-level records at the household level.
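As a rough illustration of what "person-level" means here, the sketch below keeps one row per person and retains a household identifier so the records can still be linked to the household file later. The toy values and the column names `HOUSEID`, `PERSONID`, and `R_AGE` are assumptions for illustration only; they are not taken from the actual processing script.

```r
# Toy person-level records (hypothetical values and column names)
persons <- data.frame(
  HOUSEID  = c("30000007", "30000007", "30000008"),
  PERSONID = c(1L, 2L, 1L),
  R_AGE    = c(67L, 66L, 60L),
  stringsAsFactors = FALSE
)

# Each household-person pair should appear exactly once
is_person_level <- !any(duplicated(persons[c("HOUSEID", "PERSONID")]))

# Household-level quantities can always be derived later from the
# person rows, so nothing is lost by not collapsing them now
hh_size <- table(persons$HOUSEID)
```

A check like `is_person_level` could be run just before the data are written to disk.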

Ensure that your new NHTS_2017_P_processed.R script contains the following code at the very end to commit the processed data dictionary and microdata to disk for year 2017.

```r
# Create dictionary and save to disk
dictionary <- createDictionary(data = d, survey = "NHTS", vintage = 2017, respondent = "P")
saveRDS(object = dictionary, file = "survey-processed/NHTS/2017/NHTS_2017_P_dictionary.rds")
gc()

#----------------

# Save data to disk (.fst)
fst::write_fst(x = d, path = "survey-processed/NHTS/2017/NHTS_2017_P_processed.fst", compress = 100)
```

kar01123 commented 2 years ago

@ummel: Almost done with this part except for handling the trip/vehicle data. Do you think we should merge the trip-level data with the person files or create a separate file for trips? In the latter case, we would need to update the create/compile dictionary functions to handle trip-level data.

ummel commented 2 years ago

The processed microdata need to be strictly household- and/or person-level observations. I would summarize the trip data at the household level (since this seems most likely to be used in practice) and merge it with the household variables, i.e. the data that will be saved as the H file.
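A minimal base-R sketch of that summarize-then-merge step is shown below. Everything in it is an assumption for illustration: the toy data, the column names `HOUSEID`, `PERSONID`, `TRPMILES`, and `HHSIZE`, the summary variables `total_trip_miles` and `n_trips`, and the choice to set households with no recorded trips to zero.

```r
# Toy trip-level records: several rows per household (hypothetical values)
trips <- data.frame(
  HOUSEID  = c("A", "A", "A", "B"),
  PERSONID = c(1L, 1L, 2L, 1L),
  TRPMILES = c(2.5, 4.0, 10.0, 1.5)
)

# Toy household-level records: exactly one row per household
households <- data.frame(
  HOUSEID = c("A", "B", "C"),
  HHSIZE  = c(2L, 1L, 3L)
)

# Collapse trips to one row per household
trip_miles <- aggregate(TRPMILES ~ HOUSEID, data = trips, FUN = sum)
names(trip_miles)[2] <- "total_trip_miles"
trip_counts <- aggregate(TRPMILES ~ HOUSEID, data = trips, FUN = length)
names(trip_counts)[2] <- "n_trips"
trip_summary <- merge(trip_miles, trip_counts, by = "HOUSEID")

# Left join onto the household data so every household is retained
h <- merge(households, trip_summary, by = "HOUSEID", all.x = TRUE)

# Households with no trips get zeros rather than NA (a design choice)
h$n_trips[is.na(h$n_trips)] <- 0L
h$total_trip_miles[is.na(h$total_trip_miles)] <- 0
```

The result `h` still has one row per household, so it satisfies the requirement that processed microdata be strictly household-level. The same pattern, keyed on both the household and person identifiers, would produce a person-level summary instead.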

kar01123 commented 2 years ago

I have pushed "NHTS_2017_P_processed.R" and the corresponding dictionary to survey-processed/NHTS/2017/. I also merged the trip- and vehicle-level data into the person-level data, since I thought it would make analysis easier at a later stage. We could switch the merge to the household file if that makes more sense.