lisphilar / covid19-sir

CovsirPhy: Python library for COVID-19 analysis with phase-dependent SIR-derived ODE models.
https://lisphilar.github.io/covid19-sir/
Apache License 2.0

[New] Relationship of OxCGRT index and parameter values (short-term prediction) #280

Closed Inglezos closed 3 years ago

Inglezos commented 3 years ago

What we need to document?

I am referring to https://lisphilar.github.io/covid19-sir/usage_policy.html, specifically to the (Experimental): Relationship of OxCGRT index and parameter values section. What do all these results actually mean? Could you provide more detailed documentation and analysis of how the OxCGRT index is used and how it affects the parameter values for each country, with examples and practical explanations?

lisphilar commented 3 years ago

This is just an experimental analysis to find the relationship between the parameters and government responses. This is related to the discussion with @ilyasst and @joydisette in #3. They are the authors of https://ilylabs.github.io/projects/COVID-trackers/

We need to perform machine/deep learning. #204 and #205 are also related. We should have the dataset of parameter values to enhance this experimental analysis.

However, parameter estimation for all countries is a very time-consuming task. We have too many countries and the number of phases is increasing every day.

#225 must be solved in advance.

Inglezos commented 3 years ago

Yes, I think #225 must definitely be solved in advance. You mean machine learning for pattern recognition of the relationships? We don't have to do this for all the countries, but for a few at first.

I am referring to https://lisphilar.github.io/covid19-sir/usage_policy.html, specifically to the (Experimental): Relationship of OxCGRT index and parameter values section. What do all these results actually mean? Could you provide more detailed documentation and analysis of how the OxCGRT index is used and how it affects the parameter values for each country, with examples and practical explanations?

  • For example, what does that correlation table mean in practice, how should these results be interpreted, and what is their significance/impact on the analysis of countries in general?
  • The scatter plot at the bottom for every country depicts the relationship between the reproduction number (Rt) and the OxCGRT stringency index. What does it mean in practice? Does it mean, for example, that Rt is lower for higher index values? The plots need to be rescaled so that all the points in the 0-100 zone are displayed properly and the high outliers are ignored.
  • So, what do those results mean in practice?
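
To make the outlier question concrete, here is a minimal sketch in plain Python of how a single extreme Rt value can hide a negative correlation between the stringency index and Rt. The data points, the `RT_MAX` threshold and the `pearson` helper are all hypothetical and are not part of CovsirPhy.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical (stringency index, Rt) pairs, one per phase; not real data
records = [(20, 2.4), (35, 1.9), (50, 1.4), (65, 1.1), (80, 0.8), (40, 19.5)]

# Drop extreme Rt values before computing the correlation
# (the threshold is an arbitrary assumption for this sketch)
RT_MAX = 5.0
kept = [(s, rt) for s, rt in records if rt <= RT_MAX]

r_all = pearson(*zip(*records))
r_kept = pearson(*zip(*kept))
print(f"all points: r = {r_all:.2f}; without outliers: r = {r_kept:.2f}")
```

With these made-up values, the correlation is strongly negative once the outlier is removed, which would correspond to "higher index, lower Rt". Whether real data behaves like this is exactly the open question.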
lisphilar commented 3 years ago

As it stands, this is just an experimental analysis to find a way to predict the parameter values from government responses with deep learning.

They are for feature selection.

I do not know which indices are necessary in the solution. The correlation table, scatter plot and deep learning are helpful for finding the useful indices.

Inglezos commented 3 years ago

For example, if a country applies quarantine measures on day X, we know that the daily new cases will most probably decrease around 2-3 weeks after that day. But the estimation analysis will give us increased cases for that period, since the parameters will be different and we simply cannot forecast this, unless we somehow insert this index into the simulator analysis as an extra input parameter that affects the simulated cases. Otherwise the simulated cases are not realistic and do not reflect our current knowledge that extra measures are in effect. We need to let the model know about that, and the best way would be that index. Do you have any ideas how something like this could be implemented soon? Could we start with a simple solution?

lisphilar commented 3 years ago

I intended to analyse the relationship with PolicyMeasures class, but solving difficult issue #225 is necessary. One solution with Scenario class is here. Predict parameter values and simulate the number of cases with these predicted parameter values.

What do you think about these steps? Sentences in bold will be the most difficult part.

Inglezos commented 3 years ago

I intended to analyse the relationship with PolicyMeasures class, but solving difficult issue #225 is necessary.

Why is it necessary to have a web service/RESTful API for such a relationship? Can't we run these once in advance for a specific set of countries and then find this relationship on-the-fly?

Regarding the above algorithm, I think this would suffice as a standalone solution and would enable the model to consider the various government measures in effect. I think it is vital to implement this soon. If a complete solution is not possible, as a starting point it would be enough to apply this to a single future phase (one predicted set of parameters) or to the next month, for short-term impact.

Another, more general question: what is the physical meaning of the estimated parameters? Do they make sense, are the parameters logical? For example, for Greece the Rt is now 19.5. What does this mean? Is it logical that one individual can infect 19 other people? Or is it just a value with no realistic meaning that serves only to fit the data to the model?

lisphilar commented 3 years ago

Which do you want to use for this analysis, PolicyMeasures class or Scenario class? Does "Standalone solution" mean that we will create a new class?

If the PolicyMeasures class, we can use a small number of countries with the .countries setter (i.e. a property users can change), but it would be helpful if we could run many countries. I have not tried it, but machine learning needs a lot of data to predict the results while avoiding over-fitting. (However, we can try it. If you agree, please move forward to a discussion of the detailed code or algorithms.)

If the Scenario class, we can implement the function with the steps I mentioned in the previous comment. Please discuss the code to implement.

If another class, please explain the detailed steps of your idea.

Another, more general question: what is the physical meaning of the estimated parameters?

The reproduction number is an index that tells us whether the outbreak is growing (Rt > 1) or not. The parameter values have physical/logical meanings and have units of [1/min]. E.g. rho is the effective contact rate. Please refer to the model description in my Kaggle notebook. https://www.kaggle.com/lisphilar/covid-19-data-with-sir-model#SIR-to-SIR-F

Rho, sigma and kappa are functions of control factors as explained in Factors of model parameters section of my Kaggle Notebook. https://www.kaggle.com/lisphilar/covid-19-data-with-sir-model#Factors-of-model-parameters
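
For illustration, the reproduction number of the SIR-F model can be computed from the non-dimensional parameters as Rt = rho * (1 - theta) / (sigma + kappa). The sketch below uses hypothetical parameter values (not estimates for any country) and a hypothetical helper name; it only shows how a small estimated sigma + kappa pushes Rt to large values.

```python
def reproduction_number(theta, kappa, rho, sigma):
    """Rt of the SIR-F model from non-dimensional parameter values."""
    return rho * (1 - theta) / (sigma + kappa)


# Hypothetical parameter values, for illustration only
rt_moderate = reproduction_number(theta=0.002, kappa=0.005, rho=0.2, sigma=0.075)

# Same rho and theta, but much smaller recovery and mortality rates
rt_large = reproduction_number(theta=0.002, kappa=0.0002, rho=0.2, sigma=0.01)

print(f"moderate: {rt_moderate:.2f}, large: {rt_large:.2f}")
```

Under this formula, a very high Rt for a phase can come from a very small estimated sigma + kappa in that phase as well as from a large rho.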

Inglezos commented 3 years ago

I think for a first solution implementation a Scenario class/method would suffice. A simple pattern recognition or even trend analysis in the {Rt or parameters set} - response_index plane could probably be enough, in order to predict short-term future model parameters after some measures were applied, per some representative and specific countries analysis.

lisphilar commented 3 years ago

[MEMO] pre-test: https://gist.github.com/lisphilar/637d248376eb9fb7511ba9c037aae9b2

Updated idea

  1. User specification or time-series prediction of OxCGRT scores in the future phases
  2. Linear regression: X = OxCGRT scores, y = rho values etc. (theta, kappa, sigma, rho)
  3. Evaluation of linear regression (RMSLE etc.)
  4. Predict rho values in the future phases with linear regression above
  5. Set future phases using the predicted parameter values
  6. Simulate the number of cases
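
Steps 2 to 4 can be sketched with a single feature as follows. This is only a plain-Python illustration with made-up numbers: one stringency score per phase as X, rho as y, and hypothetical helper names (`fit_line`, `rmsle`). It is not the code in the gist, and a real implementation would use several OxCGRT indicators and all four parameters.

```python
from math import log1p, sqrt


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error."""
    squared = [(log1p(p) - log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical past phases: stringency score (X) and estimated rho (y)
stringency = [20.0, 40.0, 60.0, 80.0]
rho = [0.20, 0.17, 0.11, 0.08]

# Step 2: linear regression
slope, intercept = fit_line(stringency, rho)

# Step 3: evaluation on the data used for fitting
fitted = [slope * x + intercept for x in stringency]
score = rmsle(rho, fitted)

# Step 4: predict rho for a future phase with a user-specified score
future_stringency = 70.0
rho_next = slope * future_stringency + intercept
print(f"slope={slope:.4f}, RMSLE={score:.4f}, predicted rho={rho_next:.3f}")
```

The predicted value would then be used to set the future phase (step 5) before simulating the number of cases (step 6).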
Inglezos commented 3 years ago

May I suggest another way to predict future values? What if, for a moment, we forget the OxCGRT index and focus solely on Rt (and the other parameters)? Essentially we need to find a function that fits the estimated values of Rt (and the remaining parameters). If we find such a fitting function, then we can extrapolate the next values. We could use the index only in case we want to estimate the delay period (if this is needed). What do you think?

lisphilar commented 3 years ago

Yes, time series forecasting with parameter values only is an alternative. However, I tried a prototype of this solution at the bottom of the notebook I mentioned in the previous comment and the forecasting failed, as shown in line 97. How can we improve it?

Inglezos commented 3 years ago

Which solution do you mean a prototype of: the alternative I described or the one you had in mind with the index? As a first attempt I think it would be easier to try the alternative one. Have you tried other values for the delay, or other parameters? I think the major problem is that you used a linear regressor. I wouldn't expect the values to follow such a distribution.

Inglezos commented 3 years ago

Perhaps a time varying autoregressive model would be more appropriate for fitting https://arxiv.org/pdf/1711.05204.pdf https://icasas.github.io/tvReg/reference/tvAR.html (I haven't searched into this yet)

lisphilar commented 3 years ago

The linear regression part was for the idea with OxCGRT scores. It is not related to the alternative you mentioned. The bottom lines with the Darts package are for the alternative (time series forecasting of parameter values).

lisphilar commented 3 years ago

MEMO: https://gist.github.com/lisphilar/8f492770cd4c306b081873ca71b7871d It will be required to predict OxCGRT scores using time series forecasting, but this is the next step.

lisphilar commented 3 years ago

I tried time series forecasting without OxCGRT scores, but it seems difficult to forecast parameter values with this solution because parameter values show wild ups and downs. https://gist.github.com/lisphilar/30cb8d615659948334fb3aa5faa20aca

Inglezos commented 3 years ago

MEMO: https://gist.github.com/lisphilar/8f492770cd4c306b081873ca71b7871d It will be required to predict OxCGRT scores using time series forecasting, but this is the next step.

This is very good I think!

I tried time series forecasting without OxCGRT scores, but it seems difficult to forecast parameter values with this solution because parameter values show wild ups and downs. https://gist.github.com/lisphilar/30cb8d615659948334fb3aa5faa20aca

It is a nice first approach I think. Also try AutoARIMA in the first model selection (it has the same score as exponential smoothing).

A general note: I think the RMSLE by itself is not very reliable, because the parameter values are very small. These ups and downs may not be forecastable with good accuracy. They probably depend on the index. Also, I don't think there is any point in predicting long-term. We should aim to predict the parameters for the next phase only, short-term, i.e. for 2-6 weeks max into the future.
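
A small sketch of why RMSLE can look good for small values: log(1 + x) is approximately x when x is close to zero, so RMSLE then behaves like an absolute error. The numbers below are hypothetical, and the mean absolute percentage error is used only as one possible relative metric for comparison, not as a recommendation from this thread.

```python
from math import log1p, sqrt


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error."""
    squared = [(log1p(p) - log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared) / len(squared))


def mape(y_true, y_pred):
    """Mean absolute percentage error, as a fraction (1.0 means 100%)."""
    return sum(abs(p - t) / t for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical parameter values: every forecast is off by a factor of two
rho_true = [0.010, 0.020, 0.015]
rho_pred = [0.020, 0.040, 0.030]

print(f"RMSLE = {rmsle(rho_true, rho_pred):.4f}")
print(f"MAPE  = {mape(rho_true, rho_pred):.2f}")
```

Here the RMSLE is small even though every forecast is wrong by 100%, so a relative metric or a comparison of the simulated cases may say more about forecast quality.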

Inglezos commented 3 years ago

How is the OxCGRT index combined and used in forecasting?

lisphilar commented 3 years ago

Like this: https://gist.github.com/lisphilar/21d251e40822186a9c6490dac82ce988

Inglezos commented 3 years ago

This seems very promising indeed!!

lisphilar commented 3 years ago

I am trying to use the recovery period (= 17 days) rather than 14 days as the delay. Do you have any ideas?

Inglezos commented 3 years ago

To calculate the delay? Hmm... what if you compared, per country, the dates when the index changed rapidly or critical measures were imposed with the dates of the phases (from S-R trend analysis) or the dates when the parameters changed rapidly?

Like applying change point analysis, but in the parameters-index plane instead of S-R.

Averaging the durations between these change points could then indicate such a delay period.
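
A rough sketch of this averaging idea, assuming the change dates of the index and of the parameters have already been detected somehow. All dates and the `mean_delay` helper are hypothetical, and pairing each index change with the next parameter change is just one simple choice.

```python
from datetime import date


def mean_delay(index_change_dates, param_change_dates):
    """Average days from each index change to the next parameter change."""
    delays = []
    for changed in index_change_dates:
        later = [d for d in param_change_dates if d >= changed]
        if later:
            delays.append((min(later) - changed).days)
    # None when no index change is followed by a parameter change
    return sum(delays) / len(delays) if delays else None


# Hypothetical change dates for one country
index_changes = [date(2020, 3, 10), date(2020, 7, 1), date(2020, 11, 5)]
param_changes = [date(2020, 3, 24), date(2020, 7, 19), date(2020, 11, 17)]

delay_days = mean_delay(index_changes, param_changes)
print(f"estimated delay: {delay_days:.1f} days")
```

The result would be computed per country, so each country could get its own delay instead of a fixed value.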

lisphilar commented 3 years ago

It seems to be a difficult issue and it will be solved in future versions...

I created pull request #471 as the first step. I will check the outputs for some countries tomorrow (UTC).

Usage:

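import covsirphy as cs
# jhu_data, population_data and oxcgrt_data are assumed to be loaded beforehand
snl = cs.Scenario(jhu_data, population_data, "Japan")
# S-R trend analysis to split the records into phases
snl.trend()
# Estimate the SIR-F parameters of each phase
snl.estimate(cs.SIRF)
# Predict parameter values of future phases from the OxCGRT data
snl.predict(oxcgrt_data)
snl.summary()
# Simulate the number of cases and show the history of Rt
snl.simulate()
snl.history("Rt")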

I may rename .predict() to .fit_predict() and create .fit() and .predict().

lisphilar commented 3 years ago

#471 was merged and the tutorial of .fit_predict() etc. was documented.

https://lisphilar.github.io/covid19-sir/usage_quick.html#Short-term-prediction-of-parameter-values

rebeccadavidsson commented 3 years ago

In this example notebook, this code was used to include the delay period of 14 days:

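from datetime import timedelta

# Assume OxCGRT score impact on parameter values with 14 days delay
delay = 14
df = oxcgrt_df.set_index("Date")
# Shift the OxCGRT dates forward so each score aligns with the later parameter values
df.index += timedelta(days=delay)
merged_df = param_df.join(df, how="right")
merged_df.tail()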

However, this delay is different for each country and the 'end' of the effects from Policy Measures is also very different. I made a short overview at the bottom of this notebook to identify the 'ending' effect of policy measures: https://github.com/rebeccadavidsson/SIR_LSTM/blob/main/corr_oxf.ipynb

Just wanted to share this for any new ideas of implementations.

Inglezos commented 3 years ago

Yes, as far as I know the delay should not be a fixed value, but calculated dynamically for each country. In this first implementation the delay is set to the recovery period just to have a first working functionality. This will have to be revised. I refer you to my previous comment:

To calculate the delay? Hmm... what if you compared, per country, the dates when the index changed rapidly or critical measures were imposed with the dates of the phases (from S-R trend analysis) or the dates when the parameters changed rapidly? Like applying change point analysis, but in the parameters-index plane instead of S-R. Averaging the durations between these change points could then indicate such a delay period.

Inglezos commented 3 years ago

The delay period will be reworked with issue #513.

lisphilar commented 3 years ago

Very very interesting. We will move forward to the new issue.