Why do we need this change?

pandas performs poorly, and `TSDataset` would be easier to use with fewer cascaded calls.

How can we modify it?

1. `roll` + `to_numpy`: This is a classic combination that has to be called almost every time. Users probably don't need to know what `roll` is, so we can probably keep only `to_numpy` (or `numpy`). By the way, these changes should not affect the use of `to_torch_data_loader`.

```python
# API change
from bigdl.chronos.data import TSDataset
tsdata = TSDataset.from_pandas(..., lookback=48, horizon=1, with_split=False)
x, y = tsdata.to_numpy()  # like to_torch_data_loader
```
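
To make the merged behaviour concrete, here is a minimal sketch of the windowing that `roll` followed by `to_numpy` is assumed to perform on a single series: each sample takes `lookback` past steps as `x` and the next `horizon` steps as `y`. The helper name `roll_to_numpy` and the output shapes are assumptions for illustration, not the Chronos implementation.

```python
import numpy as np

def roll_to_numpy(values, lookback, horizon):
    """Hypothetical helper: slice a (time, features) array into (x, y) windows."""
    values = np.asarray(values)
    if values.ndim == 1:
        values = values[:, None]  # treat a 1-D series as a single feature
    n_samples = values.shape[0] - lookback - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is shorter than lookback + horizon")
    # x holds the lookback window, y the horizon that immediately follows it.
    x = np.stack([values[i:i + lookback] for i in range(n_samples)])
    y = np.stack([values[i + lookback:i + lookback + horizon]
                  for i in range(n_samples)])
    return x, y

series = np.arange(10, dtype=float)
x, y = roll_to_numpy(series, lookback=4, horizon=1)
print(x.shape, y.shape)  # (6, 4, 1) (6, 1, 1)
```

With `lookback` and `horizon` supplied once at construction, a single call like this is all the user would need to see.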

2. Optimize some existing APIs: Many of the cascaded calls are probably unnecessary, so we can turn some of them into properties. The methods are classified by framework below, with some usage given.

| Category | pandas | tsfresh | scikit-learn | other |
| --- | --- | --- | --- | --- |
| Method | `deduplicate`/`impute`/`resample` | `gen_dt_feature`/`gen_global_feature`/`gen_rolling_feature` | `scale`/`unscale`/`unscale_numpy` | `to_tf_dataset`/`to_numpy`/`to_torch_data_loader`/`to_pandas` |
| Advice | Change to attributes | No change | Calling `scale` will change the source data. Can we leave the original data unchanged so we don't need `unscale` and `unscale_numpy` either? | Merge `roll` (exclude `to_pandas`/`to_torch_data_loader`) |
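
As one way to read the scikit-learn advice, the sketch below keeps the source array untouched and exposes a scaled copy, so the raw values never have to be recovered afterwards. The class `ScaledView` and its attributes are hypothetical, and plain NumPy standardization stands in for a scikit-learn scaler.

```python
import numpy as np

class ScaledView:
    """Hypothetical container: holds raw data and exposes a scaled copy."""

    def __init__(self, raw):
        self.raw = np.asarray(raw, dtype=float)
        self.mean_ = self.raw.mean(axis=0)
        std = self.raw.std(axis=0)
        # Avoid division by zero for constant columns.
        self.std_ = np.where(std == 0, 1.0, std)

    @property
    def scaled(self):
        # A new array is computed on each access; self.raw is never modified.
        return (self.raw - self.mean_) / self.std_

data = np.array([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]])
view = ScaledView(data)
scaled = view.scaled
```

Recomputing on every access trades speed for simplicity; caching the scaled array would be a natural refinement.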

```python
# Change pandas-related methods to attributes.
tsdata = TSDataset.from_pandas(..., impute=True, impute_mode="const",
                               const_num=0, deduplicate=True,
                               resample=True, interval='s', start_time=None,
                               end_time=None, merge_mode='mean', with_split=False)
```
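
A rough sketch of how such flags could be applied once at construction time, using plain pandas. The function `preprocess`, the order of the steps, and the mapping of each flag to a pandas call are assumptions for illustration; only the parameter names come from the proposal above, and `start_time`/`end_time` are left out.

```python
import pandas as pd

def preprocess(df, dt_col, deduplicate=False, resample=False, interval="s",
               merge_mode="mean", impute=False, impute_mode="const",
               const_num=0):
    """Hypothetical helper: apply the requested steps in one pass."""
    out = df.copy()  # leave the caller's DataFrame untouched
    if deduplicate:
        out = out.drop_duplicates()
    if resample:
        # Aggregate rows that fall into the same interval with merge_mode.
        out = (out.set_index(dt_col)
                  .resample(interval)
                  .agg(merge_mode)
                  .reset_index())
    if impute:
        if impute_mode != "const":
            raise ValueError("this sketch only covers impute_mode='const'")
        out = out.fillna(const_num)
    return out

df = pd.DataFrame({
    "ts": pd.to_datetime(["2023-01-01 00:00:00",
                          "2023-01-01 00:00:00",
                          "2023-01-01 00:00:02"]),
    "value": [1.0, 1.0, 3.0],
})
clean = preprocess(df, "ts", deduplicate=True, resample=True, impute=True)
```

Here the duplicated first row is dropped, the missing one-second slot is created by resampling, and that slot is then filled with `const_num`.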

3. We can use descriptors and properties to manage attributes and methods. For more information, please refer to #5656.
```python
@property
def get_cycle_length(self):
    cycle_length = (...)
    return cycle_length

@get_cycle_length.setter
def get_cycle_length(self, value):
    # Check for illegal input.
    if not isinstance(value, str):
        raise TypeError("the cycle_length mode must be a str")
    # Assumption: the mode is kept on a private attribute.
    self._cycle_length_mode = value

# Usage
tsdataset.get_cycle_length = 'min'  # Set the mode of cycle_length.
```
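
Where several attributes need the same kind of validation, a descriptor avoids repeating the property boilerplate. The following is a minimal sketch; `ChoiceField`, the stand-in `Dataset` class, and the allowed mode values are all hypothetical and not taken from Chronos.

```python
class ChoiceField:
    """Hypothetical descriptor: accepts only one of the allowed strings."""

    def __init__(self, *choices):
        self.choices = choices

    def __set_name__(self, owner, name):
        # Store the value on the instance under a private name.
        self.private_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself
        return getattr(instance, self.private_name, None)

    def __set__(self, instance, value):
        # Check for illegal input.
        if not isinstance(value, str):
            raise TypeError("expected a str, got " + type(value).__name__)
        if value not in self.choices:
            raise ValueError("expected one of " + repr(self.choices))
        setattr(instance, self.private_name, value)


class Dataset:
    # Stand-in for TSDataset; the allowed modes are made up for the example.
    cycle_length_mode = ChoiceField("min", "max", "mean")
    impute_mode = ChoiceField("const", "last", "linear")


ds = Dataset()
ds.cycle_length_mode = "min"
```

Assigning a non-string or an unknown mode raises immediately, so invalid settings are caught at assignment rather than inside a later method call.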


4. Because of the poor performance of pandas, we can add `polars` as a new backend. `polars` has good parallel performance and supports a lazy API.
```python
tsdata = TSDataset.from_pandas(df, ..., use_polars=True)
```

pandas and polars performance comparison: https://h2oai.github.io/db-benchmark/

Differences between pandas and polars:

- polars does not have indexes.
- groupby can only return a single data column.

intel-analytics / ipex-llm

Chronos: Some new API suggestions for `TSDataset` #6054
