API Docs: https://dev.socrata.com/foundry/healthdata.gov/6hii-ae4f
Hi-
My goal is to read into R all the daily .xlsx files named "Community_Profile_Report_YYYYMMDD_Public.xlsx" (where YYYYMMDD is the date of the daily file). I'm trying to write R code that iterates through the filenames, reads in each file, selects certain columns from the "County" tab, and then performs some statistical analyses.
All the .xlsx files are listed on this webpage: https://healthdata.gov/Health/COVID-19-Community-Profile-Report/gqxm-d9w9.
Here's the R code I have for reading in the .json file with the metadata file updates:
The dataframe "df" contains a long jumble of fields including "assetIDs" like this
and "filenames" like this:
Here's what a few lines from the dataframe look like, an essentially unintelligible jumble to me (but not to you!):
The goal would be to extract all the "assetIDs" and "filenames" and put them into a dataframe, so that I could concatenate each assetID with its filename to create a list of URLs that R could cycle through to read in the .xlsx files. I would then manipulate the columns to get a single flat file (in "stacked" or "long" form) with the county in one column, the date (nested within counties) in a second column, and the values for different variables (e.g., COVID-19 cases for a given county-date) in the remaining columns.
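To make the URL-building step concrete, here is a rough sketch of what I have in mind, in base R. The `files` dataframe, its column names, and the asset ID values are placeholders I made up; the download URL pattern is my guess at how the attachment links are formed and would need to be checked against a real link on the page.

```r
# Sketch only: `files` stands in for the assetIDs and filenames extracted from
# the metadata. Column names and asset ID values are hypothetical placeholders.
files <- data.frame(
  assetId  = c("placeholder-id-1", "placeholder-id-2"),
  filename = c("Community_Profile_Report_20210101_Public.xlsx",
               "Community_Profile_Report_20210102_Public.xlsx"),
  stringsAsFactors = FALSE
)

# ASSUMPTION: attachment URLs follow this pattern for dataset gqxm-d9w9.
# Verify against an actual download link before relying on it.
build_url <- function(asset_id, filename) {
  encoded <- vapply(filename, utils::URLencode, character(1),
                    reserved = TRUE, USE.NAMES = FALSE)
  paste0("https://healthdata.gov/api/views/gqxm-d9w9/files/", asset_id,
         "?download=true&filename=", encoded)
}

# Pull the YYYYMMDD stamp out of the filename and convert it to a Date.
# Filenames that do not match the expected pattern yield NA.
extract_date <- function(filename) {
  stamp <- sub("^Community_Profile_Report_([0-9]{8})_Public\\.xlsx$", "\\1",
               filename)
  as.Date(stamp, format = "%Y%m%d")
}

files$url  <- build_url(files$assetId, files$filename)
files$date <- extract_date(files$filename)
```

Each `files$url` could then presumably be fetched with `download.file(..., mode = "wb")`, read with something like `readxl::read_excel(path, sheet = "County")`, tagged with the matching `files$date`, and stacked into one long dataframe.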
Here's what code would look like to read in an .xlsx file for a single date:
I'd greatly appreciate any suggestions.
Thanks, David