This is great

bkmontgom commented 4 years ago

I'm an actuary and I find this very useful. Is this based entirely on South Korean data? How many deaths are included? I didn't see the data anywhere. I'm looking to add error bars.

yuryatin commented 4 years ago

Hi Brian,

Thank you for your feedback.

Data source: I buried the URL of the dataset inside the Python script, on line 96 (https://www.kaggle.com/kimjihoo/coronavirusdataset#PatientInfo.csv). I avoid keeping a copy of the data because its legal status is not clear to me. It is very difficult to find a case-by-case dataset from a country with both widespread testing and a comparatively widespread outbreak. South Korea seems to be the best case overall; Germany, and soon the USA, may become the second-best alternatives. As of yesterday evening, the dataset had 2771 South Korean cases, 53 of which unfortunately ended in death. In preprocessing (which can be followed in the Python script), I could keep only 1723 cases, 43 of them deaths. The other cases either lacked critical data (the year of birth, or both the date of confirmation and the date of first symptoms) or were too recent to have had enough time to be exposed to the risk of death (this is also described in the Python script).
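To make the two filters above concrete, here is a minimal sketch in plain Python. It is not the code from the repository script: the column names (`birth_year`, `symptom_onset_date`, `confirmed_date`), the helper names, and the 14-day follow-up threshold are all assumptions chosen for illustration, and the actual rule is the one in the script.

```python
from datetime import date, timedelta


def parse_date(text):
    """Return a date for 'YYYY-MM-DD' text, or None if empty, missing, or malformed."""
    try:
        return date.fromisoformat(text.strip())
    except (AttributeError, ValueError):
        return None


def keep_usable_cases(rows, snapshot_date, min_followup_days=14):
    """Filter case records (dicts) along the lines described above.

    A case is dropped if it has no year of birth, if it has neither a
    symptom-onset date nor a confirmation date, or if it started fewer than
    min_followup_days before snapshot_date (hypothetical threshold).
    """
    kept = []
    for row in rows:
        if not (row.get("birth_year") or "").strip():
            continue  # the age cannot be computed
        # Prefer the date of first symptoms; fall back to the confirmation date.
        start = parse_date(row.get("symptom_onset_date")) or parse_date(row.get("confirmed_date"))
        if start is None:
            continue  # no usable start date at all
        if snapshot_date - start < timedelta(days=min_followup_days):
            continue  # too recent to have been exposed to the risk of death
        kept.append(row)
    return kept
```

The rows could come from `csv.DictReader` applied to a local copy of PatientInfo.csv, with `snapshot_date` set to the day the file was downloaded.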

Error bars: Depending on your preference and goals, you may want to choose between the frequentist confidence-interval paradigm and a Bayesian one, which in this case is essentially a probability distribution over a probability. In the first case, as far as I understand, one needs the Fisher information for each parameter, obtained from the second partial derivatives of the (not analytically expressible) multi-variable log-likelihood function, e.g., as described on page 4, equation 2.11, of http://www.stat.umn.edu/geyer/s06/5102/notes/fish.pdf. In the second case, one needs a prior p.d.f. for the probability (the risk of death), with age as a parameter. A likelihood function for that prior distribution should be monotonically increasing, and the prior then needs to be transformed into a posterior age-parameterized p.d.f. I have not yet worked out how to do that.
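For the frequentist route, a rough sketch of how the second derivatives could be obtained numerically when no closed form is available is given below. It approximates the observed information (the negative Hessian of the log-likelihood) by central finite differences and turns it into standard errors for a two-parameter model. The two-parameter logistic log-likelihood is only a stand-in assumption, not the function fitted in the repository, and all function names are hypothetical. The information should be evaluated at the fitted (maximum-likelihood) parameter values.

```python
import math


def observed_information(loglik, theta, step=1e-4):
    """Negative Hessian of loglik at theta, via central finite differences."""
    n = len(theta)
    info = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                # Evaluate loglik with coordinates i and j moved by +/- step.
                point = list(theta)
                point[i] += di * step
                point[j] += dj * step
                return loglik(point)
            second = (shifted(1, 1) - shifted(1, -1)
                      - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * step * step)
            info[i][j] = -second
    return info


def standard_errors_2x2(info):
    """Square roots of the diagonal of the inverse of a 2x2 information matrix."""
    (a, b), (c, d) = info
    det = a * d - b * c
    if det <= 0:
        raise ValueError("information matrix is not positive definite")
    return math.sqrt(d / det), math.sqrt(a / det)


def logistic_loglik(theta, ages, died):
    """Bernoulli log-likelihood with risk p(age) = 1 / (1 + exp(-(a + b * age)))."""
    a, b = theta
    total = 0.0
    for age, outcome in zip(ages, died):
        z = a + b * age
        # log p = -log(1 + exp(-z)); log(1 - p) = -log(1 + exp(z))
        total += -math.log1p(math.exp(-z)) if outcome else -math.log1p(math.exp(z))
    return total
```

With fitted values `theta_hat`, calling `standard_errors_2x2(observed_information(lambda t: logistic_loglik(t, ages, died), theta_hat))` gives approximate standard errors, and plus or minus 1.96 of them gives approximate 95% intervals for the parameters. Turning those into error bars on the curve itself would need one more step (for example the delta method), which is not shown here.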

Rigor: If any insurance company decides to make financial bets, calculating its safety margins with a similar curve, it seems reasonable to first test other functions. Unlike most of the functions tested here, these should be able to keep an always-positive first derivative over the domain from 0 to 120+ years while having a negative second derivative at elderly ages, i.e., be potentially concave there.
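As one illustration of the shape described above (not a statement about which functions the repository did or did not test), a logistic curve in age has a positive first derivative everywhere and a negative second derivative above its midpoint. The sketch below uses the closed-form derivatives; the midpoint of 80 years and the scale of 10 years are arbitrary assumptions, not fitted values.

```python
import math


def logistic_risk(age, midpoint=80.0, scale=10.0):
    """Illustrative risk curve p(age) = 1 / (1 + exp(-(age - midpoint) / scale))."""
    return 1.0 / (1.0 + math.exp(-(age - midpoint) / scale))


def logistic_risk_derivatives(age, midpoint=80.0, scale=10.0):
    """First and second derivatives of logistic_risk with respect to age."""
    p = logistic_risk(age, midpoint, scale)
    first = p * (1.0 - p) / scale                          # always positive
    second = p * (1.0 - p) * (1.0 - 2.0 * p) / scale ** 2  # negative once p > 0.5
    return first, second
```

Checking the sign of the second derivative over a grid of ages is a quick way to see whether a candidate function can be concave at elderly ages at all.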

yuryatin commented 4 years ago

The issue is closed.

yuryatin commented 4 years ago

It seems I need to do something first before closing this issue :-)

yuryatin / covid19_age_adjusted_mortality

This is great #1