Closed lisphilar closed 2 years ago
Dear @AnujTiwari, could you give me the details of the trend analysis technique and homogeneity analysis?
We can get scores such as MSE, MAPE, and RMSLE with the TrendDetector class.
import covsirphy as cs
data_loader = cs.DataLoader()
jhu_data = data_loader.jhu()
country = "Italy"
record_df, _ = jhu_data.records(country=country)
detector = cs.TrendDetector(record_df, min_size=7)
_ = detector.sr(algo="Binseg-normal")
df = detector.summary(metrics="MSE")
print(df)
[Update] simplified the script.
Please also refer to #670.
A change point usually corresponds to an abrupt or structural change in the distributional properties of the data, whereas trend detection looks for a gradual departure of the data from its past behavior. Both change-point detection and trend detection are long-standing research questions that statistical and non-statistical communities have studied for decades. Trend assessment techniques are either parametric (assuming the data follow a specific distribution, typically the normal distribution) or non-parametric (making no assumption about the distribution). For COVID-19, many articles favor one or the other, weighing their pros and cons.
We can discuss these further, but my current understanding of CovSirPhy is that you have implemented a change point detection algorithm (the ruptures package) to find abrupt changes, and the data between two change points then represents a trend. However, no trend assessment technique is implemented for assessing the nature of that trend. So we are using these short time series segments to compute the reproduction number and other SIR-specific results.
Yes, change point detection is a challenging task, and it is time to bring our trend analysis up to an academic level. Because the accuracy of trend analysis affects the outcome of the subsequent analysis, improvements and detailed assessments are necessary.
However, please confirm the background of the implementation of our trend analysis. We consider the following series of steps as one workflow.
snl.trend()
snl.estimate(cs.SIRF)
snl.score()
S-R trend analysis was created only to improve the accuracy of parameter estimation by splitting the time series data into phases. Its purpose is to find change points such that the records between two change points follow a SIR-derived model with stable parameter values.
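To illustrate the idea (only a sketch, not how TrendDetector or ruptures works internally): in a basic SIR model, log10(S) is linear in R as long as the parameters stay constant, so a parameter change shows up as a kink in the S-R plane. The code below simulates one such kink with made-up parameter values and locates it with a brute-force search over a single split point, fitting a straight line on each side. All function names and values here are hypothetical.

```python
import math


def simulate_sir(n_days, population, params_by_phase, change_day, i0=100.0):
    """Euler-integrate a basic SIR model; (beta, gamma) switch at change_day.

    Returns one (R, log10(S)) point per day.
    """
    s, i, r = population - i0, i0, 0.0
    points = []
    for day in range(n_days):
        beta, gamma = params_by_phase[0] if day < change_day else params_by_phase[1]
        points.append((r, math.log10(s)))
        new_infected = beta * s * i / population
        new_recovered = gamma * i
        s, i, r = s - new_infected, i + new_infected - new_recovered, r + new_recovered
    return points


def linear_sse(points):
    """Sum of squared residuals of the least-squares line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx > 0 else 0.0
    return sum((y - mean_y - slope * (x - mean_x)) ** 2 for x, y in points)


def best_single_change_point(points, min_size=7):
    """Index k minimizing the total line-fit error of points[:k] and points[k:]."""
    best_index, best_cost = None, float("inf")
    for k in range(min_size, len(points) - min_size + 1):
        cost = linear_sse(points[:k]) + linear_sse(points[k:])
        if cost < best_cost:
            best_index, best_cost = k, cost
    return best_index


CHANGE_DAY = 30  # assumed day on which the (hypothetical) parameters change
sr_points = simulate_sir(
    n_days=60,
    population=1_000_000,
    params_by_phase=[(0.2, 0.1), (0.05, 0.1)],  # made-up (beta, gamma) pairs
    change_day=CHANGE_DAY,
)
detected = best_single_change_point(sr_points, min_size=7)
print(f"simulated change day: {CHANGE_DAY}, detected split index: {detected}")
```

With these values the detected split should land at or next to the simulated change day. Real records are noisy and contain several phases, which is why a dedicated package such as ruptures is used instead of this single-split search.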
(I think we can learn the "trend of the outbreak" from the history of model parameter values. S-R trend analysis is not a tool for learning the "trend of the outbreak". Is this confusing for experts?)
The accuracy of trend analysis + parameter estimation is assessed with Scenario.score(), and I thought that the accuracy of trend analysis alone is assessed with TrendDetector.summary() with MSE etc., as I mentioned in the previous comment.
I'm not familiar with the scientific background of change point analysis (I just studied it with the documentation of ruptures and a quick reading of its papers). I have two questions.
TrendDetector, but what information will we get with the slope?
Yes. It is possible to compute the slope and other statistics (p-value and z-score) using non-parametric trend analysis techniques such as the Mann-Kendall test and Sen's slope estimator. I can try this if I can get access to the S-R time series.
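For reference, here is a minimal sketch of what these two techniques compute, using only the Python standard library. It assumes a series without tied values (so the simple variance formula applies) and equally spaced observations. The function names and the toy series are hypothetical, not part of CovSirPhy.

```python
import math
from itertools import combinations
from statistics import median


def mann_kendall(values):
    """Mann-Kendall trend test: return (S, z, two-sided p-value).

    Assumes no tied values, so Var(S) = n(n-1)(2n+5)/18.
    """
    n = len(values)
    # S counts increasing pairs minus decreasing pairs (earlier vs. later).
    s = sum(
        (later > earlier) - (later < earlier)
        for earlier, later in combinations(values, 2)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity correction: shift S by one unit toward zero.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value under the standard normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p_value


def sen_slope(values):
    """Sen's slope: median of all pairwise slopes (unit spacing assumed)."""
    slopes = [
        (values[j] - values[i]) / (j - i)
        for i, j in combinations(range(len(values)), 2)
    ]
    return median(slopes)


# Toy series with a steady upward trend (made-up values).
series = [3.0 + 2.0 * t for t in range(10)]
s_stat, z_score, p_value = mann_kendall(series)
print(f"S={s_stat}, z={z_score:.3f}, p={p_value:.5f}, slope={sen_slope(series)}")
```

A positive S with a small p-value suggests a monotonic upward trend, and Sen's slope gives a robust estimate of its rate. The same functions could be applied to the records between two change points, provided ties are handled first.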
S-R time series data has Date/log10(Susceptible)/Recovered, and we can create it (sr_df) with JHUData as follows.
import covsirphy as cs
import numpy as np
import pandas as pd
loader = cs.DataLoader()
jhu_data = loader.jhu()
subset_df, _ = jhu_data.records(country="country name", province="province name")
sr_dict = {
    "Date": subset_df["Date"],
    "R": subset_df["Recovered"],
    "log10S": np.log10(subset_df["Susceptible"].astype(np.float64)),
}
sr_df = pd.DataFrame(sr_dict).set_index("Date")
Thanks, Lisphilar, for the code. I will definitely provide you with an update on the trend analysis soon.
This may be off topic, but time series clustering (find patterns) of S-R trend can also be a new object of study. (This is also related to #396.)
Time Series Clustering — Deriving Trends and Archetypes from Sequential Data https://towardsdatascience.com/time-series-clustering-deriving-trends-and-archetypes-from-sequential-data-bb87783312b4
I'm preparing for the version 3 release after some 2.x versions and revising class structures and methods of analysis. We are using the ruptures package to find structural changes of ODE parameter value sets indirectly by detecting change points of logS and R. We called it "S-R trend analysis," but should "S-R change point analysis" be used instead?
"S-R change point analysis" will be used from the next stable version 2.25.0.
Summary of question
@AnujTiwari wrote at issue #851.