ihmeuw-msca / CurveFit

Generic curve fitting package with nonlinear mixed effects model
https://ihmeuw-msca.github.io/CurveFit/
GNU General Public License v3.0

Example #12

Closed. philippemiron closed this issue 4 years ago.

philippemiron commented 4 years ago

I believe it would be a great addition to include an example with real data, so people could use your model to forecast for other countries using the datasets available at CSSEGISandData/COVID-19.

jason-curtis commented 4 years ago

+1, AFAICT this is the only source code provided for the projections at https://covid19.healthdata.org/projections , which are being increasingly used throughout the country. Example usage would also provide increased transparency which is crucial to understanding and trusting the projections for the USA.

philippemiron commented 4 years ago

It's been two days now; I don't understand how this is not priority #1. Once we are able to reproduce the results, I'm sure many people will help improve the readability of the code and test each of the different components.

saravkin commented 4 years ago

Thanks for the request -- we recognize the need for examples and will work on helping people understand use cases when we can.

mpf commented 4 years ago

+1. An example would be helpful. It doesn't have to be a full-fledged example. A small contrived example to illustrate the workflow would be sufficient.

dnola commented 4 years ago

Would really appreciate an example - I am trying to take this and apply it to some county level data, but can’t figure out how to use the code base.

Thank you!

ibm-cuyler commented 4 years ago

An example would be great. Country-level data seems to be the next step. An example data set that we could apply the code to would be helpful, and would give us a good sense of what data we'd need when applying the models to other countries.

andrewcolemfd commented 4 years ago

Thanks to the IHME team. We appreciate what you all are doing. I agree with previous posters, further documentation would be invaluable in furthering our understanding of our own communities' needs.

ibm-cuyler commented 4 years ago

For sure - thank you very much to the IHME team. Great work, everybody.

emadubuko commented 4 years ago

Kudos to the IHME team. I have been comparing the daily statistics of actual data reported against your projections as published on https://covid19.healthdata.org/projections, and they have been quite close. I would like to fit this model to my country's dataset and generate similar projections. Could you possibly describe the input data variables for this model? Thanks

pfaris commented 4 years ago

Same for us – we’d like the data if possible.

Thanks

Peter


jason-curtis commented 4 years ago

In case folks haven't seen it yet, this is the pre-print paper with some level of detail on the methodology: https://www.medrxiv.org/content/10.1101/2020.03.27.20043752v1.full.pdf

philippemiron commented 4 years ago

In case folks haven't seen it yet, this is the pre-print paper with some level of detail on the methodology: https://www.medrxiv.org/content/10.1101/2020.03.27.20043752v1.full.pdf

Also, this is the model appendix... Does anyone understand how to calculate the covariates from the death count time series?

kheedanonymous commented 4 years ago

@philippemiron I would like to speak to you please... my email is KHEEDANONYMOUS456@GMAIL.COM

dhruvparamhans commented 4 years ago

In case folks haven't seen it yet, this is the pre-print paper with some level of detail on the methodology: https://www.medrxiv.org/content/10.1101/2020.03.27.20043752v1.full.pdf

Also, this is the model appendix... Does anyone understand how to calculate the covariates from the death count time series?

I have the same question. I've been trying to understand it for some time, to no avail.

exander77 commented 4 years ago

I would like to apply the model to the data from my country, would it be possible to supply some examples?

kheedanonymous commented 4 years ago

Hey Alex, would you be able to contact me at KHEEDANONYMOUS456@GMAIL.COM?


7ayushgupta commented 4 years ago

@philippemiron the covariate that they have used in the paper is the "duration between when the threshold of the death rate (1e-15 in their paper) was crossed, and the day social distancing was implemented by the government (let's say a lockdown)". They have given only one covariate, but we can add more to the model.

We have tried working on it, and are trying to make some predictions. If all goes fine, I'll send in a PR with documentation updates here. Cheers!
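
For anyone trying this, here is a minimal sketch of that covariate as described above (all numbers and variable names are made up for illustration; the exact threshold value is discussed further down the thread):

import numpy as np

# Hypothetical inputs: cumulative deaths per day, population, and the day a lockdown started.
cumulative_deaths = np.array([0, 0, 1, 2, 5, 9, 14, 22, 35, 51])
population = 10_000_000
lockdown_day = 7                                   # day index when social distancing was mandated

death_rate = cumulative_deaths / population
threshold = 0.31e-6                                # "0.31 per million" from the paper
crossed = int(np.argmax(death_rate > threshold))   # first day the rate exceeds the threshold

# The covariate described above: days between crossing the threshold and the intervention.
covariate = lockdown_day - crossed
print(covariate)                                   # 3 with these made-up numbers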

exander77 commented 4 years ago

@7ayushgupta This is great! Keep up the good work.

kheedanonymous commented 4 years ago

Hey Alexander, please write to KHEEDANONYMOUS456@GMAIL.COM


exander77 commented 4 years ago

@gits-png I sent you an email.

kheedanonymous commented 4 years ago

@alexander Did you use KHEEDANONYMOUS456@GMAIL.COM? Because I can't see it.


exander77 commented 4 years ago

@gits-png Yes, and I responded just now.


dhruvparamhans commented 4 years ago

@philippemiron the covariate that they have used in the paper is the "duration between when the threshold of the death rate (1e-15 in their paper) was crossed, and the day social distancing was implemented by the government (let's say a lockdown)". They have given only one covariate, but we can add more to the model.

Do you understand why the death rate threshold is 1e-15? The number in the paper is 0.31 per million, which is about 3.1 x 10^{-7}.

We have tried working on it, and are trying to make some predictions. If all goes fine, I'll send in a PR with documentation updates here. Cheers!

Thank you for this. There is an additional example.py file that was added a few hours back. It gives a good starting point, but the notation isn't clear. In particular, I am unable to understand what data_group is supposed to denote there.

philippemiron commented 4 years ago

@philippemiron the covariate that they have used in the paper is the "duration between when the threshold of the death rate (1e-15 in their paper) was crossed, and the day social distancing was implemented by the government (let's say a lockdown)". They have given only one covariate, but we can add more to the model.

Do you understand why the death rate threshold is 1e-15? The number in the paper is 0.31 per million, which is about 3.1 x 10^{-7}.

We have tried working on it, and are trying to make some predictions. If all goes fine, I'll send in a PR with documentation updates here. Cheers!

Thank you for this. There is an additional example.py file that was added a few hours back. It gives a good starting point, but the notation isn't clear. In particular, I am unable to understand what data_group is supposed to denote there.

Hi dhruvparamhans, could it be np.exp(-15) ≈ 3.06e-07 rather than 1e-15?
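
A quick sanity check of that reading (plain arithmetic, not taken from the repository):

import numpy as np

print(np.exp(-15))   # ~3.06e-07
print(0.31 / 1e6)    # 3.1e-07, the "0.31 per million" threshold quoted from the paper
# The two agree to within about 1%, which would make -15 a threshold on the log of the
# death rate, as suggested above.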

philippemiron commented 4 years ago


Also, if you look at the data frame in their example, 'data_group' is set to 'world' for every row... so I guess it is used to retrieve the rows belonging to a specific country/state.

[Screenshot: the example data frame, with data_group set to 'world' for every row]
exander77 commented 4 years ago

Any ideas on how to input national data?

kheedanonymous commented 4 years ago

Not yet, but I will update you if I do.


exander77 commented 4 years ago
    independent_var  measurement_value  measurement_std  constant_one data_group
0              0.00                  0              0.1           1.0    czechia
1              0.15                  3              0.1           1.0    czechia
2              0.30                  5              0.1           1.0    czechia
3              0.45                  8              0.1           1.0    czechia
4              0.60                 19              0.1           1.0    czechia
5              0.75                 26              0.1           1.0    czechia
6              0.90                 32              0.1           1.0    czechia
7              1.05                 38              0.1           1.0    czechia
8              1.20                 63              0.1           1.0    czechia
9              1.35                 94              0.1           1.0    czechia
10             1.50                116              0.1           1.0    czechia
11             1.65                141              0.1           1.0    czechia
12             1.80                189              0.1           1.0    czechia
13             1.95                298              0.1           1.0    czechia
14             2.10                383              0.1           1.0    czechia
15             2.25                450              0.1           1.0    czechia
16             2.40                560              0.1           1.0    czechia
17             2.55                765              0.1           1.0    czechia
18             2.70                889              0.1           1.0    czechia
19             2.85               1047              0.1           1.0    czechia
20             3.00               1161              0.1           1.0    czechia
array([0.66666667, 1.        , 1.33333333])
array([[   2.25840555],
       [   2.69716202],
       [1765.66571494]])
exander77 commented 4 years ago

I tried changing the measurement data to the Czech Republic's; it gave me some predictions, but it fails if I feed it more data points than the original 21.

kheedanonymous commented 4 years ago

Actually, dude, I am not a professional at coding, but I'm going to try harder... that's learning, right?


kheedanonymous commented 4 years ago

Which text editor are you using?


exander77 commented 4 years ago

I am using vim, but you can edit Python in whichever editor you like. (Actually, I do not recommend vim. :D)

kheedanonymous commented 4 years ago

Have you tried Sublime?


philippemiron commented 4 years ago

Hi,

I made this function that retrieves data from the Johns Hopkins GitHub data set for a selected country.

import pandas as pd

base = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/'
confirmed = 'time_series_covid19_confirmed_'
death = 'time_series_covid19_deaths_'
recovered = 'time_series_covid19_recovered_'

def data_country(selected_country, dataset='confirmed'):
    """ return dataset timeseries for a selected country """

    #select the right database
    if dataset == 'confirmed':
        url = base+confirmed
    elif dataset == 'death':
        url = base+death
    elif dataset == 'recovered':
        url = base+recovered

    if selected_country != 'US':
        df = pd.read_csv(url+'global.csv').groupby(['Country/Region']).sum()
        df.drop(['Lat', 'Long'], axis=1, inplace=True)
        df = df.loc[selected_country]
    else:
        df = pd.read_csv(url+'US.csv').groupby('Country_Region').sum()
        df.drop(['UID', 'code3', 'FIPS', 'Lat', 'Long_'], axis=1, inplace=True)
        if dataset == 'death':
            df.drop(['Population'], axis=1, inplace=True)
        df = df.sum()
    return df.index, df.values

You can call it for different countries: date, count = data_country('US', 'confirmed') # or 'Canada', etc., for confirmed cases

You can also get the death or recovered counts by changing the second argument to 'death' or 'recovered'.

I believe their example is not fully complete, but I will share a notebook this afternoon with real data.

Cheers.
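
In case it helps, here is a rough sketch of gluing that function to the data frame layout shown earlier in the thread (the column names are copied from the example output above; the 0-to-3 time scaling and the 0.1 measurement_std are simply mirrored from it, not taken from any official preprocessing):

import numpy as np
import pandas as pd

# Pull the confirmed-case series for one country (the date labels aren't needed for the fit itself).
_, count = data_country('Czechia', 'confirmed')

df = pd.DataFrame({
    'independent_var':   np.linspace(0.0, 3.0, len(count)),  # same arbitrary 0-to-3 scaling as above
    'measurement_value': count,
    'measurement_std':   0.1,
    'constant_one':      1.0,
    'data_group':        'czechia',
})
print(df.head())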

exander77 commented 4 years ago

I fed it CZ data and got nothing particularly sane:

    independent_var  measurement_value  measurement_std  constant_one data_group
0           0.00000                  0              0.1           1.0    czechia
1           0.09375                  3              0.1           1.0    czechia
2           0.18750                  5              0.1           1.0    czechia
3           0.28125                  8              0.1           1.0    czechia
4           0.37500                 19              0.1           1.0    czechia
5           0.46875                 26              0.1           1.0    czechia
6           0.56250                 32              0.1           1.0    czechia
7           0.65625                 38              0.1           1.0    czechia
8           0.75000                 63              0.1           1.0    czechia
9           0.84375                 94              0.1           1.0    czechia
10          0.93750                116              0.1           1.0    czechia
11          1.03125                141              0.1           1.0    czechia
12          1.12500                189              0.1           1.0    czechia
13          1.21875                298              0.1           1.0    czechia
14          1.31250                383              0.1           1.0    czechia
15          1.40625                450              0.1           1.0    czechia
16          1.50000                560              0.1           1.0    czechia
17          1.59375                765              0.1           1.0    czechia
18          1.68750                889              0.1           1.0    czechia
19          1.78125               1047              0.1           1.0    czechia
20          1.87500               1161              0.1           1.0    czechia
21          1.96875               1287              0.1           1.0    czechia
22          2.06250               1472              0.1           1.0    czechia
23          2.15625               1763              0.1           1.0    czechia
24          2.25000               2022              0.1           1.0    czechia
25          2.34375               2395              0.1           1.0    czechia
26          2.43750               2657              0.1           1.0    czechia
27          2.53125               2817              0.1           1.0    czechia
28          2.62500               3001              0.1           1.0    czechia
29          2.71875               3308              0.1           1.0    czechia
30          2.81250               3589              0.1           1.0    czechia
31          2.90625               3858              0.1           1.0    czechia
32          3.00000               4190              0.1           1.0    czechia
alpha: [2.20389855]
beta: [2.47338824]
p: [5352.08537721]

I am not sure what those parameters alpha, beta, and p are. Is 5352 the prediction? That is disappointing so far. I tried to tweak those parameters to understand them, but no luck.

I would expect a prediction like 4500 or so.

exander77 commented 4 years ago

I managed to do some predictions by calling predict on the continuation of independent_var series:

    independent_var  measurement_value  measurement_std  constant_one data_group
0           0.00000                  0              0.1           1.0    czechia
1           0.09375                  3              0.1           1.0    czechia
2           0.18750                  5              0.1           1.0    czechia
3           0.28125                  8              0.1           1.0    czechia
4           0.37500                 19              0.1           1.0    czechia
5           0.46875                 26              0.1           1.0    czechia
6           0.56250                 32              0.1           1.0    czechia
7           0.65625                 38              0.1           1.0    czechia
8           0.75000                 63              0.1           1.0    czechia
9           0.84375                 94              0.1           1.0    czechia
10          0.93750                116              0.1           1.0    czechia
11          1.03125                141              0.1           1.0    czechia
12          1.12500                189              0.1           1.0    czechia
13          1.21875                298              0.1           1.0    czechia
14          1.31250                383              0.1           1.0    czechia
15          1.40625                450              0.1           1.0    czechia
16          1.50000                560              0.1           1.0    czechia
17          1.59375                765              0.1           1.0    czechia
18          1.68750                889              0.1           1.0    czechia
19          1.78125               1047              0.1           1.0    czechia
20          1.87500               1161              0.1           1.0    czechia
21          1.96875               1287              0.1           1.0    czechia
22          2.06250               1472              0.1           1.0    czechia
23          2.15625               1763              0.1           1.0    czechia
24          2.25000               2022              0.1           1.0    czechia
25          2.34375               2395              0.1           1.0    czechia
26          2.43750               2657              0.1           1.0    czechia
27          2.53125               2817              0.1           1.0    czechia
28          2.62500               3001              0.1           1.0    czechia
29          2.71875               3308              0.1           1.0    czechia
30          2.81250               3589              0.1           1.0    czechia
31          2.90625               3858              0.1           1.0    czechia
32          3.00000               4190              0.1           1.0    czechia
array([0.66666667, 1.        , 1.33333333])
alpha: [2.20389855]
beta: [2.47338824]
p: [5352.08537721]
array([4265.23591878, 4433.28700621, 4580.05696837, 4706.79430815,
       4815.1652726 , 4907.05709468, 4984.42281497, 5049.169184  ,
       5103.08314254, 5147.78958555, 5184.73256054, 5215.17278241,
       5240.19564488, 5260.72531385, 5277.54175749, 5291.29860217,
       5302.54048831, 5311.71916453, 5319.20794429, 5325.31440039,
       5330.29132715, 5334.34608776, 5337.64850755, 5340.33748911,
       5342.52652345, 5344.30825965, 5345.7582799 , 5346.9382086 ,
       5347.89826693, 5348.67936767, 5349.31483043, 5349.83178419])
exander77 commented 4 years ago

I basically fed my own data:

measurement_value = [
        0, 3, 5, 8, 19, 26, 32, 38, 63, 94, 116, 141, 189, 298, 383, 450, 560, 765, 889, 1047,
        1161, 1287, 1472, 1763, 2022, 2395, 2657, 2817, 3001, 3308, 3589, 3858, 4190]
n_data       = len(measurement_value)

And then called predict on the fitted model:

predictions = curve_model.predict(t=independent_var+beta_true)
import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(predictions[1::])
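
A slightly more explicit version of that forecasting step, in case it helps others (a sketch only; curve_model, independent_var, and the time scaling are the ones from the example discussed above, and the forecast horizon is arbitrary):

import numpy as np

# Continue the time axis beyond the last observed point and predict on it,
# mirroring the curve_model.predict(t=...) call above.
t_step = independent_var[1] - independent_var[0]
t_future = independent_var[-1] + t_step * np.arange(1, 33)
forecast = curve_model.predict(t=t_future)
print(forecast)
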
dnola commented 4 years ago

How are we applying the covariates here? It looks like in the example we are just feeding in a constant one.

Additionally, how are we linking between data groups once we have those covariates determined per group?

Am I correct in thinking that we can just set that “constant one” field for each data group to the “time from threshold to social distancing” feature for that particular group?

Last, should “t” always be relative to the first detected death? Or the first death rate past a threshold? I.e. if we wanted to look at another country, would it be as simple as adding another set of rows starting from t=0 for that location for a new data group? Or does the timing across all series need to be aligned?

I.e., if we wanted to look at both China and the USA, would we start both from t=0 at the time of their first cases, or does t start from the first case in China, with the first case in the US starting at some much later t?

dnola commented 4 years ago

@exander77 With respect to the parameters alpha, beta, and p: those are the three parameters of a logistic curve. The p you are asking about is the “carrying capacity” of the logistic model, i.e. the maximum. That isn't your prediction; it is what your predictions will ultimately taper off at.

(Alpha is the growth rate and beta determines the inflection point)

That said, if we don’t figure out the covariate linking as well as the errors from fixed and random effects we lose what makes this approach unique, and are basically just doing a simple logistic regression like you could get out of the box in sklearn. So we should work on that next

dnola commented 4 years ago

Also, one last question: has there been, or will there be, any code or example released for the simulation that goes from death rate to hospital resource utilization?

philippemiron commented 4 years ago

From their publication.

A covariate of days with expected exponential growth in the cumulative death rate was created using information on the number of days after the death rate exceeded 0.31 per million to the day when 4 different social distancing measures were mandated by local and national government: school closures, non-essential business closures including bars and restaurants, stay-at-home recommendations, and travel restrictions including public transport closures. Days with 1 measure were counted as 0.67 equivalents, days with 2 measures as 0.334 equivalents and with 3 or 4 measures as 0.

I think I get what they did, but I haven't obtained similar results yet. If I understand correctly, as an example:

The covariate would be: covariates = [0, 0, 1, 2, 2.66, 3.32, 3.98, 4.31, 4.64, 4.97, 4.97, 4.97, 4.97, 4.97, 4.97].

Here is a little code to generate this:

import numpy as np

# fictional data: day index when the death rate crossed the threshold,
# and the days on which the number of social distancing measures changed
death_rate_over_threshold = 1
timeline_measure = {
  3: 1,
  6: 2,
  9: 4,
}

# 0 measures = 1, 1 measure = 2/3, 2 measures = 1/3, 3-4 measures = 0
day_count_as = [1, 0.66, 0.33, 0, 0]

# construct the covariates for the 15 days
covariates = np.zeros(15)
nb_measures = 0
for day in range(0, len(covariates)):
    if day > death_rate_over_threshold:
        covariates[day] = covariates[day-1] + day_count_as[nb_measures]

    # adjust the number of social distancing measures
    if day in timeline_measure.keys():
        nb_measures = timeline_measure[day]
print(covariates)

PS: this is my best understanding so far!

7ayushgupta commented 4 years ago

We did the same, but could not obtain good predictions.

They would have used a covariate model for Wuhan as well, do you know about that?

philippemiron commented 4 years ago

Sadly that's where I am right now.

thewanderer41 commented 4 years ago

@philippemiron the covariate that they have used in the paper is the "duration between when the threshold of the death rate (1e-15 in their paper) was crossed, and the day social distancing was implemented by the government (let's say a lockdown)". They have given only one covariate, but we can add more to the model.

Do you understand why the death rate threshold is 1e-15? The number in the paper is 0.31 per million, which is about 3.1 x 10^{-7}.

The number 1e-15 is used because it is just above double-precision machine epsilon (about 2.2e-16), i.e. the smallest relative spacing between 64-bit floating-point numbers. Basically, differences at or below that scale can't be represented precisely in 64 bits; anything smaller would be within numerical error and therefore meaningless.
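
For reference, the float64 precision being referred to (numpy only):

import numpy as np

print(np.finfo(np.float64).eps)   # ~2.22e-16, the relative spacing of 64-bit floats
print(1.0 + 1e-15 == 1.0)         # False: a 1e-15 offset next to 1.0 is still representable
print(1.0 + 1e-17 == 1.0)         # True: below eps the offset is lost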

exander77 commented 4 years ago

I am kind of stumped that the authors can't release their complete workflow so we can verify and reuse it. This is tedious reverse-engineering work.

kheedanonymous commented 4 years ago

Hold up... so this reverse engineering you guys are doing... the program already exists??


HiroakiMachida commented 4 years ago

I got data following @philippemiron's approach and ran main.py, @7ayushgupta. It still doesn't give an appropriate prediction.

https://github.com/HiroakiMachida/CurveFit/blob/master/main.py

(base) Hiroaki-no-MacBook:CurveFit hiroakimachida$ python main.py
0     Japan
      ...  
74    Japan
Name: State/UnionTerritory, Length: 75, dtype: object
Model pipeline setting up...
Model setup. Running fit...
Model fitted. Saving model...
Model saved.
Running PV for Japan
//anaconda3/lib/python3.7/site-packages/pandas/core/reshape/merge.py:938: UserWarning: You are merging on int and float columns where the float values are not equal to their int representation
  'representation', UserWarning)
[0.5        0.92250345 0.99777404 0.99999006 0.99999999 1.
 1.         1.         1.         1.         ... (all remaining values are 1.0) ]
7ayushgupta commented 4 years ago

@HiroakiMachida @philippemiron @thewanderer41 we can discuss and find out a solution based on the understanding of the code that we've got. Any suitable time and platform would be good for me. Let's do it urgently, and get some predictions.

dhruvparamhans commented 4 years ago


Hi dhruvparamhans, could it be np.exp(-15) ≈ 3.06e-07 rather than 1e-15?

I think you are quite right. Now I feel quite stupid; the notation in the paper didn't help things. I remember reading 1e-15.

saravkin commented 4 years ago

Hi everyone!

We are working as fast as we can to support the analyses and update the methodology. As we go, we are also picking up speed on documentation and examples. We expect to have an updated paper that documents major changes soon. Please keep checking the following websites: 1) Main projections: https://covid19.healthdata.org/projections 2) Updates and explanations of what's new: http://www.healthdata.org/covid/updates

We will post a link to the updated paper in the main README file when it is published; we are expecting end of day April 7th.

For specific locations and analyses please contact covid19@healthdata.org, so you can coordinate with the broader IHME team. The purpose of this repository is to share the program that does the estimation. The broader team at IHME processes the data, does age standardization, defines covariates, and runs all analyses, which are then released online. The pipeline will be documented in the updated paper, and we will continue our work on documenting the CurveFit program.