timriffe / covid_age

COVerAGE-DB: COVID-19 cases, deaths, and tests by age and sex

ungroup::pclm estimates #3

Open mpascariu opened 4 years ago

mpascariu commented 4 years ago

The {ungroup} package is great at ungrouping; however, I think we should pay attention to the resulting CFR curves.

In my tests using the New York data, where I harmonized the age bands (10-year groups), I noticed that the CFR at young ages is overestimated compared with the values for other regions where no ungrouping of the data was applied (ITA, ESP, KOR, NLD, ...). This might be because within each age band the death counts are skewed to the right (towards older ages), while the number of confirmed cases is not necessarily skewed. I assume infections are more uniformly distributed.
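To see why the within-band shapes matter, here is a minimal sketch with made-up numbers (not the NYC data). It takes a single 10-year band and assumes that deaths rise exponentially with age inside the band while cases are spread uniformly. Splitting each total with its own shape makes the single-age CFR vary within the band even though the band-level CFR is unchanged; splitting both totals with the same shape would give a flat CFR equal to the band value. `split_band` is a hypothetical helper for illustration, not part of {ungroup}.

```python
import numpy as np


def split_band(total, weights):
    """Distribute a band total across single ages in proportion to `weights`."""
    w = np.asarray(weights, dtype=float)
    return total * w / w.sum()


# Hypothetical band, ages 20-29; the totals are illustrative only.
ages = np.arange(20, 30)
band_deaths, band_cases = 50.0, 10_000.0

# Assumption: deaths rise ~10% per year of age inside the band,
# while confirmed cases are spread uniformly across the band.
death_w = np.exp(0.1 * (ages - ages[0]))
case_w = np.ones(ages.size)

deaths = split_band(band_deaths, death_w)
cases = split_band(band_cases, case_w)

band_cfr = band_deaths / band_cases   # CFR of the grouped data
cfr = deaths / cases                  # single-age CFR after splitting

for age, value in zip(ages, cfr):
    print(f"age {age}: CFR = {value:.5f} (band CFR = {band_cfr:.5f})")
```

The band totals are preserved either way; only the within-band profile of the ratio changes.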

It is difficult for me to believe that we can see something that different in NY, but it is not impossible. For now, I am attributing this difference to a limitation of pclm(). Maybe Silvia has a solution.

At young ages we are talking about small values; however, considering the age structure of the population, these can become significant.
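A back-of-the-envelope sketch of that point, with purely hypothetical rates and population counts: a small absolute error in the young-age rates, once multiplied by the large young populations, adds up to a non-negligible number of implied deaths. The size of the bias and the age cutoff are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical 10-year age groups 0-9, ..., 70-79, 80+ (illustrative values only).
population = np.array([1.1e6, 1.0e6, 1.4e6, 1.3e6, 1.2e6, 1.1e6, 0.9e6, 0.6e6, 0.4e6])
rate = np.array([0.00002, 0.00005, 0.0002, 0.0005, 0.0015, 0.004, 0.012, 0.03, 0.08])

# Assumption: rates in the four youngest groups (under 40) are overestimated
# by a small absolute amount, 0.0002 (0.02 percentage points).
bias = np.where(np.arange(population.size) < 4, 0.0002, 0.0)

implied = rate * population
implied_biased = (rate + bias) * population
extra = implied_biased.sum() - implied.sum()
print(f"extra implied deaths: {extra:.0f} "
      f"({100 * extra / implied.sum():.1f}% of the unbiased total)")
```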

I will try to post here some figures to support my claim.

timriffe commented 4 years ago

Silvia is consulting with us, and we'll also examine this issue and pass on lessons. Thanks for pointing it out.

(Reply sent by email to Marius D. Pascariu's post of Thu, Apr 16, 2020, 2:45 PM.)


mpascariu commented 4 years ago

[Figure: image not preserved]

mpascariu commented 4 years ago

[Figure: image not preserved]

timriffe commented 4 years ago

It's tough to know regarding NYC. We could check whether they have published a one-off epidemiological report with a more detailed table, to see how that age range stacks up. The intervals are so wide that we don't have much to go on. For NYC, I'm using a single-age population projection for Jan 1, 2020 from Cornell as the offset. What are you using?

mpascariu commented 4 years ago

Exactly.

I am using the US Census data.

The thing is that I am trying to determine an IFR (true CFR) using a quick-and-dirty method. For all the other regions the results look just beautiful; only NY looks worse, pushing the upper bound of my confidence.

timriffe commented 4 years ago

An update: some time ago I switched to setting lambda to 1e5 for all subsets. This benefits older ages, but a lower lambda seems better for young ages. The AIC-optimized lambda falls somewhere in between. We have yet to run AIC tests on a large sample of subsets to get a distribution of optimal lambdas.
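For readers less familiar with `pclm()`: `lambda` weights a roughness penalty on the log of the ungrouped values, so larger values give smoother curves. Below is a minimal NumPy sketch of a penalized composite link model in the spirit of Rizzi, Gampe & Eilers (2015), using an identity basis, a second-order difference penalty and Poisson IRLS (Fisher scoring), plus an AIC comparison over a grid of lambdas. It is not the {ungroup} implementation, it omits the population offset that `pclm()` can also take, and the grouped counts are made up. `composition_matrix` and `pclm_sketch` are hypothetical names.

```python
import numpy as np


def composition_matrix(breaks, n_ages):
    """Rows = age groups [breaks[i], breaks[i+1]); columns = single ages."""
    C = np.zeros((len(breaks) - 1, n_ages))
    for g, (lo, hi) in enumerate(zip(breaks[:-1], breaks[1:])):
        C[g, lo:hi] = 1.0
    return C


def pclm_sketch(y, C, lam, deg=2, max_iter=100, tol=1e-8):
    """Penalized composite link model, identity basis, Poisson IRLS.

    Returns the ungrouped expected counts and the AIC of the fit.
    """
    y = np.asarray(y, dtype=float)
    n = C.shape[1]
    D = np.diff(np.eye(n), n=deg, axis=0)   # difference penalty matrix
    P = lam * D.T @ D
    # Start from each band's count spread evenly over its ages (+1 avoids log(0)).
    b = np.log((C.T @ y + 1.0) / (C.T @ C.sum(axis=1)))
    for _ in range(max_iter):
        gam = np.exp(b)                      # ungrouped expected counts
        mu = C @ gam                         # grouped expected counts
        Q = C * gam                          # d mu / d b (columns scaled by gam)
        QtW = Q.T / mu                       # Q' W with W = diag(1 / mu)
        b_new = np.linalg.solve(QtW @ Q + P, QtW @ (y - mu + Q @ b))
        converged = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if converged:
            break
    gam = np.exp(b)
    mu = C @ gam
    Q = C * gam
    QtW = Q.T / mu
    # Effective dimension = trace of (Q'WQ + P)^-1 Q'WQ.
    ed = np.trace(np.linalg.solve(QtW @ Q + P, QtW @ Q))
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(y > 0, y * np.log(y / mu), 0.0)
    dev = 2.0 * np.sum(terms - (y - mu))    # Poisson deviance
    return gam, dev + 2.0 * ed


breaks = list(range(0, 101, 10))            # 10-year bands 0-9, ..., 90-99
C = composition_matrix(breaks, 100)
y = np.array([2, 3, 10, 30, 80, 200, 450, 800, 900, 400])  # made-up counts

lambdas = 10.0 ** np.arange(0, 8)
fits = {lam: pclm_sketch(y, C, lam) for lam in lambdas}
best = min(fits, key=lambda lam: fits[lam][1])
for lam, (_, aic) in fits.items():
    print(f"lambda = {lam:.0e}: AIC = {aic:.2f}")
print("AIC-optimal lambda on this grid:", best)
```

Because the second-order penalty does not constrain the overall level, each fit reproduces the total count; lambda only trades smoothness against fidelity to the individual bands, which is why a single global value can suit old ages better than young ones.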

Re offsets: we've gathered a great many 2020 offsets (official estimates or projections), but have yet to carry out our own projections to bring all dates in sync. We also need to go on a gathering spree to cover newly added populations. Just an FYI.
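One simple way to put offsets with different reference dates onto a common date is exponential interpolation between two dated estimates. The sketch below assumes a constant growth rate per age group between the two dates and does not age cohorts forward, so it stands in for a proper cohort-component projection rather than replacing one. The function name, dates and population figures are all hypothetical.

```python
from datetime import date

import numpy as np


def shift_population(pop_a, date_a, pop_b, date_b, target):
    """Exponentially interpolate (or extrapolate) age-specific populations
    from two dated estimates to `target`, assuming a constant growth rate
    per age group between the two dates. Cohorts are NOT aged forward."""
    pop_a = np.asarray(pop_a, dtype=float)
    pop_b = np.asarray(pop_b, dtype=float)
    t = (target - date_a).days / (date_b - date_a).days
    return pop_a * (pop_b / pop_a) ** t


# Hypothetical offsets for three age groups at two reference dates.
pop_2019 = np.array([500_000, 800_000, 300_000])
pop_2020 = np.array([495_000, 810_000, 309_000])

synced = shift_population(pop_2019, date(2019, 7, 1),
                          pop_2020, date(2020, 7, 1),
                          date(2020, 1, 1))
print(np.round(synced))
```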