timeseriesAI / tsai

State-of-the-art Deep Learning library for Time Series and Sequences in Pytorch / fastai
https://timeseriesai.github.io/tsai/
Apache License 2.0

Please provide some suggestions for multi timeseries training #815

Closed eromoe closed 10 months ago

eromoe commented 11 months ago

Hello,

I am facing some challenges with time-series classification on stock data. Some of them are resolvable, but I'm not sure whether there are better solutions.

  1. Many different series: around 4,000 stocks
    • Some form of group training is needed, for example by company type or industry (using the Dataset API).
    • Training also needs to be split by time (using get_walk_forward_splits), but the category values vary over time: some values disappear and new ones appear.
  2. Unequal Time Lengths:
    • Almost no two stocks share the same time length; each stock's data starts on a different date (if the data is split by time, each range contains a different set of stocks).
  3. Handling Missing Values:
    • Some data points are missing due to a suspension in trading (no trading took place on these days for these particular stocks, although other stocks may have been active).
    • There are also genuine instances of data missing, like some fields in the financial reports. It is not feasible to simply fill in with zeros or the mean value. A dynamic missing value filling method that adjusts over time might be necessary, which I currently don't have a good solution for.
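
A minimal sketch of points 1 and 2 above, assuming the data is kept in long format (one row per stock per day). This is a hand-rolled illustration with pandas, not tsai's get_walk_forward_splits; the names walk_forward_date_splits and encode_with_unknown are hypothetical. Splitting on calendar dates lets stocks with different start dates share the same folds, and fitting the category vocabulary on the training fold only maps categories that appear later to a reserved "unknown" code.

```python
import numpy as np
import pandas as pd

def walk_forward_date_splits(dates, n_splits=3, test_size=2):
    """Yield (train_mask, test_mask) boolean arrays over rows, split by date.

    A stock that lists late simply contributes no rows to earlier folds.
    """
    dates = pd.Series(pd.to_datetime(dates)).reset_index(drop=True)
    unique_days = np.sort(dates.unique())
    for k in range(n_splits, 0, -1):
        test_end = len(unique_days) - (k - 1) * test_size
        test_days = unique_days[test_end - test_size:test_end]
        train_mask = (dates < test_days[0]).to_numpy()  # strictly before the test window
        test_mask = dates.isin(test_days).to_numpy()
        yield train_mask, test_mask

def encode_with_unknown(train_values, values):
    """Integer-encode categories seen in training; unseen ones get code 0."""
    vocab = {v: i + 1 for i, v in enumerate(sorted(set(train_values)))}
    return np.array([vocab.get(v, 0) for v in values])

# Toy panel: stock A has 6 days of history, stock B starts 3 days later.
panel = pd.DataFrame({
    "stock": ["A"] * 6 + ["B"] * 3,
    "date": list(pd.date_range("2024-01-01", periods=6))
            + list(pd.date_range("2024-01-04", periods=3)),
})
splits = list(walk_forward_date_splits(panel["date"], n_splits=2, test_size=2))
codes = encode_with_unknown(["bank", "tech"], ["tech", "energy", "bank"])
```

Each fold trains on every row dated before its test window, so the training set grows over time while the test window moves forward.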

After thinking through these problems, I'm unsure how to get started.

madsh0402 commented 11 months ago

> There are also genuine instances of data missing, like some fields in the financial reports. It is not feasible to simply fill in with zeros or the mean value. A dynamic missing value filling method that adjusts over time might be necessary, which I currently don't have a good solution for.

This issue might be solvable through interpolation. There are various types of interpolations (such as linear and polynomial) that can handle different scenarios for filling out missing values.

I used linear interpolation in a paper I wrote about forecasting stock movements with ARIMA models, as the method I used could not handle missing values. Here is a great article on interpolations and their various types. Also, here is the documentation for the function I used.
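
As an illustration of linear interpolation, here is a minimal sketch that assumes pandas' Series.interpolate; the function actually referred to above is not named in this thread, so this choice is an assumption. The toy prices are made up. limit_area="inside" fills only gaps that have an observed value on both sides, so the trailing gap stays NaN.

```python
import numpy as np
import pandas as pd

# Toy daily closing prices with two gaps (NaN = no observation).
close = pd.Series(
    [10.0, np.nan, np.nan, 13.0, np.nan],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# Remember which points were missing before filling.
was_missing = close.isna()

# Straight line between the neighbouring observed values;
# gaps without a later observation are left untouched.
filled = close.interpolate(method="linear", limit_area="inside")
```

Note that the filled values between 10.0 and 13.0 depend on the later observation, so in a forecasting setup this kind of fill uses information from the future.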

I hope this is helpful.

-Mads

eromoe commented 11 months ago

For the NaN case, I currently use fbprophet (forward-predict and fill the values), but this is a bit complicated and only learns patterns from a single stock. The model needs to learn the missing-data pattern across multiple stocks, which makes the whole pipeline much heavier. Interpolation has too many problems.

The ideal approach, I think, would be to fill NaNs during model training and determine the time-series pattern from each window, but I haven't seen similar research. (Maybe my thinking is wrong; I am not a researcher who has read many papers.)
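
One common way to approximate this idea is to fill the gaps with a simple causal rule and pass a missing-value mask to the model as extra channels, so the network can learn from the missingness pattern itself. The sketch below is an assumption-laden illustration with numpy, not a tsai feature; add_missing_mask is a hypothetical helper, and the array layout (samples, variables, time steps) follows the convention tsai uses for its inputs.

```python
import numpy as np

def add_missing_mask(x):
    """Forward-fill NaNs along time and append a 0/1 missing mask per variable.

    x: float array of shape (n_samples, n_vars, seq_len).
    Returns an array of shape (n_samples, 2 * n_vars, seq_len).
    """
    mask = np.isnan(x)
    steps = np.arange(x.shape[-1])
    # Index of the most recent observed step at each position (0 if none yet).
    last_seen = np.maximum.accumulate(np.where(~mask, steps, 0), axis=-1)
    filled = np.take_along_axis(x, last_seen, axis=-1)
    # Leading gaps have no past value to copy; fall back to 0.
    filled = np.nan_to_num(filled, nan=0.0)
    return np.concatenate([filled, mask.astype(x.dtype)], axis=1)

x = np.array([[[1.0, np.nan, np.nan, 4.0],
               [np.nan, 2.0, np.nan, 3.0]]])
out = add_missing_mask(x)
```

Forward fill only looks backwards in time, which suits trading suspensions where the last traded price carries over; the mask channels keep the information that those points were not observed.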

Maybe one way is to create an embedding for each category (to mimic clustering of seasonality or trends across stocks) and train with many for loops over a hierarchical structure, but that is too heavy again. I am trying to find a state-of-the-art balance between performance and efficiency.
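
A minimal sketch of the category-embedding idea in plain PyTorch, assuming a single shared model for all stocks rather than one model per group. The class SeriesWithCategory, its layer sizes, and the tiny convolutional encoder are all hypothetical choices for illustration; this is not a tsai architecture.

```python
import torch
from torch import nn

class SeriesWithCategory(nn.Module):
    """Classify a window from its values plus a learned category embedding."""

    def __init__(self, n_categories, n_vars, emb_dim=4, hidden=16, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(n_categories, emb_dim)   # one vector per industry/type
        self.conv = nn.Conv1d(n_vars, hidden, kernel_size=3, padding=1)
        self.head = nn.Linear(hidden + emb_dim, n_classes)

    def forward(self, x, cat):
        # x: (batch, n_vars, seq_len), cat: (batch,) integer category codes
        h = torch.relu(self.conv(x)).mean(dim=-1)        # (batch, hidden)
        e = self.emb(cat)                                # (batch, emb_dim)
        return self.head(torch.cat([h, e], dim=1))       # (batch, n_classes)

model = SeriesWithCategory(n_categories=5, n_vars=3)
logits = model(torch.randn(8, 3, 20), torch.randint(0, 5, (8,)))
```

Because the embedding is trained jointly with the shared encoder, windows from all stocks go through one training loop, which avoids nested loops over groups.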

oguiza commented 10 months ago

I'll move this issue to discussions, since there doesn't seem to be any specific issue related to the tsai library, but rather a general discussion on how to approach a certain type of task.