timriffe / covid_age

COVerAGE-DB: COVID-19 cases, deaths, and tests by age and sex

Non-integer values for cases & deaths #145

Closed: bernadette-eu closed this issue 2 years ago

bernadette-eu commented 2 years ago

Hello,

I am interested in extracting mortality and incidence data for various European countries for further analysis. I have noticed that the daily numbers for these metrics can be non-integer values.

What is the explanation for that? Below is my code for Germany, using 10-year age groups.

Thanks, Lampros

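library(dplyr)         # filter(), mutate(), %>%
library(lubridate)     # dmy()
library(covidAgeData)  # assumed: the package providing download_covid()

dest <- tempdir()      # assumed: a local folder for the download; adjust as needed

Output_10 <- download_covid("Output_10",
                            dest     = dest,
                            progress = TRUE)

country_selection <- "Germany"

test <- Output_10 %>%
        filter(Country == country_selection,
               Region  == "All",
               Sex     == "b",          # "b" = both sexes combined
               !is.na(Cases)) %>%
        mutate(Date = dmy(Date))        # dates are stored as day.month.year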

[Screenshot: 2022-05-09_16h40_05]

timriffe commented 2 years ago

Hi @bernadette-eu, when you see decimals, the values are estimates rather than reported values, which are usually integers. Estimates are always constrained to the reported totals. They arise whenever we need to harmonize age groups, for example. This is the case for Germany, which reports data in age groups wider than 10 years, and it is likely the only explanation in your example. Some other situations include:
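To illustrate why harmonized values carry decimals, here is a minimal base R sketch that splits hypothetical counts reported in wide age groups into 10-year groups by spreading each count uniformly over single years of age. The group boundaries and counts are made up, the open-ended top group is closed at 100 purely for illustration, and `split_uniform()` is a hypothetical helper. Uniform splitting is a deliberate simplification and not necessarily the method COVerAGE-DB uses.

```r
# Hypothetical counts reported in wide age groups (made-up numbers).
# The open-ended top group is closed at age 100 purely for illustration.
reported <- data.frame(
  age_lower = c(0, 5, 15, 35, 60, 80),
  width     = c(5, 10, 20, 25, 20, 20),
  deaths    = c(1, 2, 37, 405, 3200, 9100)
)

# Hypothetical helper: spread each group's count evenly over its single years of age.
split_uniform <- function(age_lower, width, value) {
  ages <- unlist(mapply(function(a, w) a + seq_len(w) - 1,
                        age_lower, width, SIMPLIFY = FALSE))
  data.frame(age = ages, value = rep(value / width, times = width))
}

single <- split_uniform(reported$age_lower, reported$width, reported$deaths)

# Re-aggregate single years into 10-year groups (0, 10, ..., 90).
single$age_group <- (single$age %/% 10) * 10
harmonized <- aggregate(value ~ age_group, data = single, FUN = sum)

harmonized

# The harmonized groups still sum to the reported total.
all.equal(sum(harmonized$value), sum(reported$deaths))
```

For instance, the 20-29 group receives half of the 15-34 count (37 / 20 * 10 = 18.5), so it is fractional even though every input was an integer, while the overall total is unchanged.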

There are probably other cases, but all have in common that counts are constrained within the finest-possible categories.
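If you want to see which rows in a subset like `test` are affected, one rough check is to flag values that are not whole numbers. The helper `is_estimate()` and the toy data below are hypothetical. The check only works in one direction: a non-integer value is an estimate, but an estimate can also happen to land on an integer, so unflagged rows are not guaranteed to be reported values.

```r
# Hypothetical helper: TRUE where a value is not (numerically) a whole number.
# NA values are treated as "not flagged".
is_estimate <- function(x, tol = 1e-8) {
  !is.na(x) & abs(x - round(x)) > tol
}

# Toy data loosely shaped like a subset of Output_10 (values are made up).
toy <- data.frame(
  Age    = c(0, 10, 20, 30),
  Cases  = c(120, 340.75, 512.25, 498),
  Deaths = c(0, 1, 2.5, 3.5)
)

toy$Cases_estimated  <- is_estimate(toy$Cases)
toy$Deaths_estimated <- is_estimate(toy$Deaths)

# Keep only rows where either count is visibly an estimate.
toy[toy$Cases_estimated | toy$Deaths_estimated, ]
```

With dplyr loaded, the same helper can be used inside `filter()`, e.g. `test %>% filter(is_estimate(Cases))`.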

In the future, we may begin calendar harmonization so that date intervals are maximally comparable; in that case, interpolated estimates would also have decimals.
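As a sketch of how calendar harmonization could introduce decimals, the example below linearly interpolates hypothetical cumulative deaths, reported on irregular dates, onto a daily grid with base R's `approx()`. The data are made up, and linear interpolation of cumulative counts is an assumption chosen for illustration, not a description of any method COVerAGE-DB has adopted.

```r
# Hypothetical cumulative deaths reported on irregular dates (made-up numbers).
reports <- data.frame(
  Date   = as.Date(c("2021-03-01", "2021-03-05", "2021-03-08")),
  Deaths = c(100, 113, 120)
)

# Target grid: one value per day between the first and last report.
daily_dates <- seq(min(reports$Date), max(reports$Date), by = "day")

# Linear interpolation of the cumulative series onto the daily grid.
interp <- approx(
  x    = as.numeric(reports$Date),
  y    = reports$Deaths,
  xout = as.numeric(daily_dates)
)

daily <- data.frame(Date = daily_dates, Deaths = interp$y)
daily
```

The reported dates keep their integer values, while the days in between become fractional (for example, 2021-03-02 is 103.25).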

I hope this satisfies your query.