Open Blackandwhite23 opened 2 months ago
Here is some data in the attachment, if needed: ADF_Test.csv
import pandas as pd
filename = 'ADF_Test.csv'
data = pd.read_csv(filename, sep=',', decimal='.', parse_dates=["time"], index_col="time")
display(data)
import ydata_profiling
profile = ydata_profiling.ProfileReport(data, title="ADF Test", explorative=True, tsmode=True)
profile.to_notebook_iframe()
profile.to_file("ADF_Test.html")
description = profile.get_description()
for col in data:
    var1 = description.variables.get(col)
    stat = var1.get('stationary')
    p = var1.get('addfuller')
    display("Column: " + col + " ; Stationary: " + str(stat) + " ; P: " + str(p))
And then I get: Column: Col1 ; Stationary: False ; P: 8.367848162944989e-15
Hi @Blackandwhite23!
I looked at the file describe_timeseries_pandas.py, specifically the function pandas_describe_timeseries_1d, which returns the stationary flag and the p_value. The function has a check for seasonal (if the series is seasonal, stationary is set to False; row 214):
stats["stationary"] = is_stationary and not stats["seasonal"]
My knowledge of statistics is modest. From everything I've seen and read, I know that the trend has to be removed for a series to be stationary. It is written here that stationary series have neither a trend nor seasonality. There is also a discussion here.
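To make the effect of the check quoted from pandas_describe_timeseries_1d concrete, here is a minimal sketch of the same logic. The function name combined_stationary_flag and the 0.05 threshold are assumptions for illustration, not the library's actual code; only the combination "is_stationary and not seasonal" comes from the line quoted above.

```python
def combined_stationary_flag(p_value, seasonal, threshold=0.05):
    """Hypothetical restatement of the quoted line, not the library's code.

    threshold=0.05 is an assumed significance level.
    """
    # ADF null hypothesis: the series has a unit root (non-stationary).
    # A small p-value rejects that null hypothesis.
    is_stationary = p_value < threshold
    # The quoted line additionally requires the series not to be seasonal.
    return is_stationary and not seasonal


# Same tiny p-value as Col1 in the report; only the seasonal flag differs.
print(combined_stationary_flag(8.367848162944989e-15, seasonal=True))   # False
print(combined_stationary_flag(8.367848162944989e-15, seasonal=False))  # True
```

Under these assumptions, a column with a very small ADF p-value is still reported as not stationary whenever it is flagged as seasonal, which would match the output in this issue.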
Current Behaviour
I made a report of a time series and then used the following code:
description = profile.get_description()
for col in df:
    var1 = description.variables.get(col)
    stat = var1.get('stationary')
    p = var1.get('addfuller')
    display("Column: " + col + " ; Stationary: " + str(stat) + " ; P: " + str(p))
I analysed a data set with some columns and got the following result:
Column: Column 1 ; Stationary: False ; P: 8.367848162951153e-15
Column: Column 2 ; Stationary: False ; P: 1.0170622187220445e-11
Column: Column 3 ; Stationary: False ; P: 2.555609761088582e-05
Column: Column 4 ; Stationary: False ; P: 7.172269761903138e-08
Column: Column 5 ; Stationary: False ; P: 9.321131415426812e-18
Column: Column 6 ; Stationary: False ; P: 9.027089348108759e-15
Column: Column 7 ; Stationary: False ; P: 0.02133819126759494
Column: Column 8 ; Stationary: False ; P: 4.406120572138344e-12
Column: Column 9 ; Stationary: False ; P: 0.0028888647417244155
Column: Column 10 ; Stationary: False ; P: 0.00044090523969600784
Column: Column 11 ; Stationary: False ; P: 0.00286260675205775
Column: Column 12 ; Stationary: False ; P: 0.0001708455587419074
Column: Column 13 ; Stationary: False ; P: 9.472249697294651e-30
Column: Column 14 ; Stationary: False ; P: 2.526552913384979e-12
Column: Column 15 ; Stationary: False ; P: 0.000455609981090904
Column: Column 16 ; Stationary: False ; P: 0.0004254554235795494
Column: Column 17 ; Stationary: None ; P: None
Column: Column 18 ; Stationary: None ; P: None
Column: Column 19 ; Stationary: False ; P: 1.2239118466383953e-16
Column: Column 20 ; Stationary: True ; P: 9.06748511005521e-29
Column: Column 21 ; Stationary: True ; P: 0.005396832629069178
Column: Column 22 ; Stationary: True ; P: 1.850847639853015e-11
Expected Behaviour
I would expect every column with a p-value < 0.05 to be marked as "stationary".
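To compare the expected behaviour with what the report returns, one option is to list the p-value-only decision next to the reported flags. The sketch below is only an illustration: summarize_stationarity is a hypothetical helper, the 0.05 threshold is an assumption, and it assumes each variable's statistics behave like a dict that exposes the keys 'addfuller', 'stationary' and 'seasonal'. The sample values for col_a, col_b and col_c are invented placeholders, not data from the private dataset.

```python
def summarize_stationarity(variables, threshold=0.05):
    """Hypothetical helper: one summary row per column.

    `variables` is assumed to map column name -> dict-like statistics.
    """
    rows = []
    for name, stats in variables.items():
        p = stats.get("addfuller")
        rows.append({
            "column": name,
            "p_value": p,
            # Decision based on the ADF p-value alone (the expected behaviour).
            "adf_only": None if p is None else p < threshold,
            # Flags as reported by the profile description.
            "seasonal": stats.get("seasonal"),
            "stationary": stats.get("stationary"),
        })
    return rows


# Invented placeholder statistics, for illustration only.
example = {
    "col_a": {"addfuller": 1e-10, "seasonal": True, "stationary": False},
    "col_b": {"addfuller": 1e-10, "seasonal": False, "stationary": True},
    "col_c": {},  # a column without time-series statistics
}
for row in summarize_stationarity(example):
    print(row)
```

On a real report this might be called as summarize_stationarity(description.variables), assuming that object supports .items() the same way it supports .get() in the code above. Rows where adf_only is True but stationary is False would then point at the seasonal check rather than at the ADF test.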
Data Description
I used a private dataset
Code that reproduces the bug
pandas-profiling version
v4.9.0
Dependencies
OS
Linux 6.1.85+
Checklist