lisphilar / covid19-sir

CovsirPhy: Python library for COVID-19 analysis with phase-dependent SIR-derived ODE models.
https://lisphilar.github.io/covid19-sir/
Apache License 2.0

[Question/Docs] how to access S-R trend analysis statistical details #894

Closed lisphilar closed 2 years ago

lisphilar commented 3 years ago

Summary of question

@AnujTiwari wrote in issue #851:

Which trend analysis technique does S-R trend use? Is it possible to access the trend type, p-value, z-value, slope, and other parameters related to the trend results? Also, is it possible to collect the resulting parameters for homogeneity analysis?

lisphilar commented 3 years ago

Dear @AnujTiwari, could you give me the details of the trend analysis technique and homogeneity analysis you have in mind?

lisphilar commented 3 years ago

We can get MSE, MAPE, RMSLE scores etc. with the TrendDetector class.

import covsirphy as cs

# Load the JHU dataset and take the records for one country
data_loader = cs.DataLoader()
jhu_data = data_loader.jhu()
country = "Italy"
record_df, _ = jhu_data.records(country=country)

# Run S-R trend analysis, then summarize the result with the MSE metric
detector = cs.TrendDetector(record_df, min_size=7)
_ = detector.sr(algo="Binseg-normal")
df = detector.summary(metrics="MSE")
print(df)

[Update] simplified the script.

Please also refer to #670.

AnujTiwari commented 3 years ago

A change point usually refers to an abrupt or structural change in the distributional properties of data, whereas trend detection is an analysis that looks for a gradual departure of the data from its past. Change point and trend detection are both long-standing research questions that have been raised frequently in statistical and non-statistical communities for decades. We have parametric (assumption: data follows a normal distribution) and non-parametric (assumption: data does not follow a normal distribution) trend assessment techniques. In the case of COVID-19, there are many articles in favor of one or the other, considering their pros and cons.

We can discuss them further, but my current understanding of CovsirPhy is that you have implemented a change point detection algorithm (the ruptures package) to find abrupt changes, and the data between two change points naturally represents a trend. However, no trend assessment technique is implemented for assessing the nature of that trend. So we are using these short time series segments to compute the reproduction number and other SIR-specific results.

lisphilar commented 3 years ago

Yes, change point detection is a challenging task, and it is time to bring our trend analysis up to an academic level. Because the accuracy of trend analysis affects the outcome of the subsequent analysis, improvements and detailed assessments are necessary.

However, please note the background of the implementation of our trend analysis. We treat the following series of steps as one workflow.

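# snl is a cs.Scenario instance
snl.trend()  # S-R trend analysis: split the records into phases
snl.estimate(cs.SIRF)  # estimate the parameter values of each phase
snl.score()  # assess the accuracy of trend analysis + estimation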

S-R trend analysis was created only to improve the accuracy of parameter estimation by splitting the time series data into phases. Its purpose is to find change points such that the records between change points follow a SIR-derived model with stable parameter values.
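To illustrate the idea of a phase, here is a toy sketch of a single change point search on the S-R plane. It is not the ruptures algorithm and not CovsirPhy's implementation. It assumes the SIR property that log S is linear in R while the parameters are constant, so a change of slope marks a phase boundary. The function names and the synthetic data are made up for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b; returns (a, b, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def best_single_split(xs, ys, min_size=3):
    """Index k minimising the summed squared error of two separate line fits."""
    candidates = range(min_size, len(xs) - min_size + 1)
    return min(
        candidates,
        key=lambda k: fit_line(xs[:k], ys[:k])[2] + fit_line(xs[k:], ys[k:])[2],
    )


# Synthetic (made-up) S-R data: the slope changes between R = 9 and R = 10
recovered = [float(r) for r in range(20)]
log10_s = [
    6.0 - 0.01 * r if r < 10 else 5.905 - 0.05 * (r - 9.5)
    for r in recovered
]

split = best_single_split(recovered, log10_s, min_size=3)
left_slope = fit_line(recovered[:split], log10_s[:split])[0]
right_slope = fit_line(recovered[split:], log10_s[split:])[0]
print(split, round(left_slope, 3), round(right_slope, 3))
```

Real data is noisy and has several phases, which is why a dedicated package with a penalty or a fixed number of breakpoints is used in practice.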

(I think the "trend of the outbreak" can be read from the history of model parameter values. S-R trend analysis is not a tool for determining the "trend of the outbreak"; is this naming confusing for experts?)

The accuracy of trend analysis plus parameter estimation is assessed with Scenario.score(), and I thought the accuracy of trend analysis alone could be assessed with TrendDetector.summary() using MSE etc., as I mentioned in the previous comment.
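For reference, the metrics named above can be written out as follows. This is a minimal sketch using the standard textbook definitions (with log(1 + x) for RMSLE). How TrendDetector and Scenario apply them internally is not shown in this thread, and the sample values are made up.

```python
import math


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction (actual values must be non-zero)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def rmsle(actual, predicted):
    """Root mean squared logarithmic error, using log(1 + x)."""
    squared = [(math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


# Made-up observed and fitted values
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(mse(actual, predicted), mape(actual, predicted), rmsle(actual, predicted))
```

MSE depends on the scale of the data, while MAPE and RMSLE are relative measures, so they are easier to compare across countries with very different case counts.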

I'm not familiar with the scientific background of change point analysis (I have only studied it through the ruptures documentation and a quick reading of its papers). I have two questions.

  1. It is possible to get the slope by updating TrendDetector, but what information will we get from the slope?
  2. How can we calculate the p-value and z-score with S-R trend analysis?

AnujTiwari commented 3 years ago

Yes, it is possible to compute the slope and the other statistical parameters (p-value and z-score) using non-parametric trend analysis techniques such as the Mann-Kendall test and Sen's slope estimator. I can try this if I can access the S-R time series. Is that possible?
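As a rough sketch of what these two techniques compute, the following uses only the Python standard library. It assumes equally spaced observations and applies no tie correction to the variance. The function names and the sample series are hypothetical, and this is not part of CovsirPhy.

```python
import math
from itertools import combinations
from statistics import median


def mann_kendall(values):
    """Two-sided Mann-Kendall test without tie correction; returns (s, z, p)."""
    n = len(values)
    # S statistic: sum of the signs of all later-minus-earlier differences
    s = sum((b > a) - (b < a) for a, b in combinations(values, 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the normal approximation
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


def sen_slope(values):
    """Median of all pairwise slopes, assuming unit spacing between points."""
    indices = range(len(values))
    return median((values[j] - values[i]) / (j - i) for i, j in combinations(indices, 2))


# Made-up series for one phase (for example, log10(S) sampled daily)
series = [5.0, 4.8, 4.9, 4.5, 4.4, 4.1, 4.0, 3.6, 3.5, 3.1]
s, z, p = mann_kendall(series)
slope = sen_slope(series)
if p >= 0.05:
    trend = "no trend"
elif z < 0:
    trend = "decreasing"
else:
    trend = "increasing"
print(trend, s, round(z, 3), round(p, 5), round(slope, 3))
```

The sign of z gives the trend direction, p its significance, and Sen's slope a robust estimate of the rate of change per step.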

lisphilar commented 3 years ago

S-R time series data has Date/log10(Susceptible)/Recovered columns, and we can create it (sr_df) with JHUData as follows.

import covsirphy as cs
import numpy as np
import pandas as pd

# Load the JHU dataset and select one area (replace the placeholder names)
loader = cs.DataLoader()
jhu_data = loader.jhu()
subset_df, _ = jhu_data.records(country="country name", province="province name")

# Build the S-R time series: Recovered and log10(Susceptible), indexed by Date
sr_dict = {
    "Date": subset_df["Date"],
    "R": subset_df["Recovered"],
    "log10S": np.log10(subset_df["Susceptible"].astype(np.float64)),
}
sr_df = pd.DataFrame(sr_dict).set_index("Date")

AnujTiwari commented 3 years ago

Thanks, Lisphilar, for the code. I will definitely update you on the trend analysis soon.

lisphilar commented 3 years ago

This may be off topic, but time series clustering (finding patterns) of S-R trends could also be a new subject of study. (This is also related to #396.)

Time Series Clustering — Deriving Trends and Archetypes from Sequential Data https://towardsdatascience.com/time-series-clustering-deriving-trends-and-archetypes-from-sequential-data-bb87783312b4

lisphilar commented 2 years ago

I'm preparing for the version 3 release after some more 2.x versions, and I am revising the class structures and analysis methods. We use the ruptures package to find structural changes in ODE parameter value sets indirectly, by detecting change points of logS and R. We have called this "S-R trend analysis," but should "S-R change point analysis" be used instead?

lisphilar commented 2 years ago

"S-R change point analysis" will be used from the next stable version 2.25.0.