Closed knwyne20 closed 1 year ago
Hi! I was looking for an update on this.
Hi @knwyne20,
At the moment there is no way to tell the profiling which variables are time-variant. The automation decides based on each variable's autocorrelation level: if it is below a certain threshold, the variable is considered numerical.
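To illustrate the kind of rule described above, here is a minimal sketch of an autocorrelation threshold check. It is not the library's actual code: the function names, the lag of 1, and the 0.7 threshold are all assumptions chosen for illustration, and it works on plain Python lists rather than on a DataFrame.

```python
import math
import random


def lag_autocorrelation(values, lag=1):
    """Pearson correlation between a series and itself shifted by `lag`."""
    x, y = values[:-lag], values[lag:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant series: treat as having no autocorrelation
    return cov / math.sqrt(var_x * var_y)


def guess_type(values, threshold=0.7, lag=1):
    # Hypothetical rule: at or above the threshold the variable is treated
    # as a time series, otherwise as a plain numeric variable.
    if lag_autocorrelation(values, lag) >= threshold:
        return "timeseries"
    return "numeric"


rng = random.Random(0)
sine = [math.sin(t * math.pi / 180) for t in range(1000)]
noise = [rng.gauss(0, 1) for _ in range(1000)]
# A time-dependent signal buried in heavy noise.
noisy_sine = [s + rng.gauss(0, 2.0) for s in sine]

for name, series in [("sine", sine), ("noise", noise), ("noisy_sine", noisy_sine)]:
    print(name, round(lag_autocorrelation(series), 3), guess_type(series))
```

With a rule like this, a clean sine wave passes the check, while a time-dependent variable with a lot of noise can fall below the threshold and end up classified as numeric, which is the kind of misidentification reported here.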
Unfortunately this logic is causing some variables to be misidentified in your case.
I've updated this issue as a feature request.
@knwyne20, in version 4.1.0 (PR https://github.com/ydataai/ydata-profiling/pull/1274) a new feature was introduced that allows you to define the data types manually, bypassing the type inference:
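import numpy as np
import pandas as pd
from ydata_profiling import ProfileReport


def create_dataframe(size=1000, alt=False):
    # `alt` is not used in this snippet; it is kept from the original example.
    time_steps = np.arange(size)
    return pd.DataFrame(
        {
            "ascending_sequence": time_steps,
            # Sine and cosine of the time step (in degrees), rounded to 2 decimals.
            "sin": np.round(np.sin(time_steps * np.pi / 180), 2),
            "cos": np.round(np.cos(time_steps * np.pi / 180), 2),
            "cat": np.random.choice([0, 1, 2], size=size, replace=True),
        }
    )


df = create_dataframe()

# type_schema overrides the inferred type of each listed column.
prof = ProfileReport(
    df,
    tsmode=True,
    type_schema={
        "ascending_sequence": "categorical",
        "sin": "timeseries",
        "cos": "numeric",
        "cat": "numeric",
    },
)
prof.to_file("profile.html")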
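If you only want to force the time-dependent columns, a small helper along these lines may be convenient. This is only a sketch: `build_type_schema` is a hypothetical name and not part of ydata-profiling, and it assumes that columns left out of `type_schema` still go through the normal type inference.

```python
def build_type_schema(columns, timeseries_columns):
    """Return a type_schema dict marking the chosen columns as "timeseries".

    `columns` is the full list of column names (e.g. list(df.columns));
    `timeseries_columns` are the ones you know to be time dependent.
    """
    unknown = set(timeseries_columns) - set(columns)
    if unknown:
        # Fail early on typos instead of silently ignoring a column.
        raise KeyError(f"Columns not found: {sorted(unknown)}")
    wanted = set(timeseries_columns)
    # Keep the original column order in the resulting dict.
    return {name: "timeseries" for name in columns if name in wanted}


schema = build_type_schema(
    ["ascending_sequence", "sin", "cos", "cat"],
    ["sin", "cos"],
)
print(schema)
```

The resulting dict can then be passed as `ProfileReport(df, tsmode=True, type_schema=schema)`.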
Hi @knwyne20,
As the feature that you requested is already available (see https://github.com/ydataai/ydata-profiling/issues/1292#issuecomment-1544552358), I am closing this issue.
Feel free to reopen it if the solution is not satisfactory.
Current Behaviour
I am running a dataset through pandas profiling using tsmode=True; however, some of the time-dependent variables are coming out as real numbers in the profile. Under the "VARIABLE" tab of the profile, I see histograms for these variables instead of line graphs. Also, since some of these variables are automatically identified as real numbers, I don't get their autocorrelation graphs with the ACF and PACF information either. According to this:
"To enable a time-series report to be generated ts_mode needs to be set to “True”. If “True” the variables that have temporal dependence will be automatically identified based on the presence of autocorrelation."
Is there any way I can tell pandas profiling which variables are time dependent, rather than having it identify them automatically?
Expected Behaviour
I would need pandas profiling to correctly identify all the time-dependent variables as time series, not as real numbers.
Data Description
My dataset is not publicly available.
Code that reproduces the bug
No response
pandas-profiling version
ydata profiling 4.1.0
Dependencies
OS
windows 10
Checklist