vpnagraj closed this issue 11 months ago.
It doesn't look like we have to update the get_hdgov_hosp() function. The API has previous_day_admission_influenza_confirmed data ending on 07/15/2023, and we are grabbing that as our flu.admits column. We are getting the same number of rows as the API.
The prep_hdgov_hosp() function also looks to be performing as expected. It removed the incomplete week containing 07/14/2023, because we programmed it to start weekly aggregations on Sunday and end them on Saturday. Since the API doesn't have data published for that Saturday, the function dropped that week.
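The "drop the incomplete week" behavior described above can be sketched in base R. This is a hypothetical helper (weekly_complete() is not a fiphde function, and this is not how prep_hdgov_hosp() is actually implemented); it assumes a daily data frame with date and flu.admits columns, bins days into Sunday-to-Saturday weeks, and keeps only weeks with all 7 days present.

```r
# Hypothetical sketch, NOT fiphde's prep_hdgov_hosp() implementation:
# sum daily counts into Sunday-to-Saturday weeks and keep only weeks
# that have all 7 days present.
weekly_complete <- function(daily) {
  dates <- as.Date(daily$date)
  # as.POSIXlt()$wday: 0 = Sunday, ..., 6 = Saturday
  wday <- as.POSIXlt(dates)$wday
  # Saturday that closes each day's Sunday-to-Saturday week (as "YYYY-MM-DD")
  week_end <- format(dates + (6 - wday))
  totals <- tapply(daily$flu.admits, week_end, sum)
  n_days <- tapply(daily$flu.admits, week_end, length)
  weekly <- data.frame(
    week_end = as.Date(names(totals)),
    flu.admits = as.numeric(totals),
    n_days = as.integer(n_days)
  )
  # Drop incomplete weeks, e.g. a final week whose Saturday is not yet published
  weekly[weekly$n_days == 7, c("week_end", "flu.admits"), drop = FALSE]
}

# Toy daily data ending on Friday 2023-07-14 (one admission per day)
daily <- data.frame(
  date = seq(as.Date("2023-07-02"), as.Date("2023-07-14"), by = "day"),
  flu.admits = 1
)
res <- weekly_complete(daily)
```

With this toy input, only the week ending Saturday 2023-07-08 survives; the week ending 2023-07-15 has just 6 days and is dropped, mirroring the behavior described above.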
So we are getting the latest data from https://healthdata.gov/api/views/g62h-syeh/rows.csv
The site also notes that after Monday, June 12, 2023, the dataset will be updated only once a week, on Fridays.
@dwill023 thanks for digging into this!
i'm also seeing that our get_hdgov_hosp() %>% prep_hdgov_hosp() pipeline still returns data. however, as you said, that data now appears to be lagged by a week.
take a look at the reprex below. previously, we would have expected prepared data all the way through the most recent complete week (i.e., the week ending saturday 2023-07-29, when today is a monday in the week ending 2023-08-05). we're seeing a 1-week gap, which lines up with CDC's messaging about overall changes to HHS reporting.
i'm not sure of the best way to fix this at the moment. we will likely need to either 1) nowcast the most recent week and train models with the nowcasted data, or 2) shift modeling back 1 week and forecast 5 weeks ahead to reach the 4-week horizon (relative to the forecast date).
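the horizon arithmetic behind option 2 can be sketched in base R. this is a hypothetical illustration, not fiphde code: the dates come from the reprex in this thread, and it assumes horizon 1 is the week ending the first saturday on or after the forecast date.

```r
# Hypothetical illustration of option 2 (not fiphde code).
# Assumption: horizon 1 = week ending the first Saturday on/after the forecast date.
forecast_date <- as.Date("2023-07-31")                # a Monday
wday <- as.POSIXlt(forecast_date)$wday                 # 0 = Sunday, ..., 6 = Saturday
first_week_end <- forecast_date + (6 - wday)           # horizon 1: 2023-08-05
target_week_end <- first_week_end + 7 * 3              # horizon 4: 2023-08-26

# Last week actually available with the 1-week lag (from the reprex)
last_observed <- as.Date("2023-07-22")

# Weeks the model must forecast past the lagged data to reach the same target
horizons_needed <- as.numeric(target_week_end - last_observed) / 7
```

without the lag, data would run through 2023-07-29 and 4 steps ahead would reach 2023-08-26; with the lag, the same target takes 5 steps.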
leaving this issue open to help us prepare for the 2023-24 season.
library(fiphde)
Sys.Date()
#> [1] "2023-07-31"
h <- get_hdgov_hosp(limitcols = TRUE)
#> 66593 rows retrieved from:
#> https://healthdata.gov/api/views/g62h-syeh/rows.csv
max(h$date)
#> [1] "2023-07-21"
h_weekly <- prep_hdgov_hosp(h, remove_incomplete = FALSE, min_per_week = 0)
#> Summarizing to epiyear/epiweek
#> Trimming to 2020-10-18
#> Filtering to US+DC+States only
#> Removing states with < 0 flu.admits per week on average over the last month
#> Removed 0 states:
max(h_weekly$week_end)
#> [1] "2023-07-22"
Created on 2023-07-31 with reprex v2.0.2
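the 1-week gap in the reprex above can be quantified with a small base R helper. last_complete_week_end() is hypothetical (not part of fiphde); it returns the saturday ending the most recent complete sunday-to-saturday week before the week containing a given date, and it treats the current week as still in progress even on its saturday.

```r
# Hypothetical helper (not part of fiphde): Saturday ending the most recent
# complete Sunday-to-Saturday week before the week containing `today`.
last_complete_week_end <- function(today) {
  today <- as.Date(today)
  wday <- as.POSIXlt(today)$wday   # 0 = Sunday, ..., 6 = Saturday
  today + (6 - wday) - 7           # end of the current week, minus one week
}

expected <- last_complete_week_end("2023-07-31")  # Monday -> 2023-07-29
observed <- as.Date("2023-07-22")                 # max(h_weekly$week_end) in the reprex
lag_weeks <- as.numeric(expected - observed) / 7
```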
this API (and our data retrieval function) is working as documented, albeit with the delays noted above.
closing this issue
our flu hospitalization data retrieval was originally written to pull daily counts from the HHS Protect data reported via the healthdata.gov API and convert them to weekly incidence:
https://signaturescience.github.io/fiphde/reference/get_hdgov_hosp.html
https://signaturescience.github.io/fiphde/reference/prep_hdgov_hosp.html
the reporting requirements and cadence have changed.
we need to validate that our data retrieval is working as expected.
questions to answer:
@dwill023 i am assigning you to take a look at this. use the thread in this issue to report what you find and to raise any other questions you have along the way.