Is AutoTimes truly leveraging LLM's language capabilities?

SkyKingL commented 4 days ago

I'm very interested in using LLMs for time series forecasting. However, I noticed that in AutoTimes, the temporal information is only incorporated through timestamp embeddings in the language semantic space. There seems to be no actual interaction between time series patterns and language understanding/reasoning capabilities of the LLM.

Isn't this essentially just using LLM as a powerful transformer for feature extraction, rather than truly leveraging its language understanding abilities? I'm curious about your thoughts on this distinction, as I believe the potential of combining LLMs with time series analysis could be much deeper.

WenWeiTHU commented 4 days ago

Hello, thanks for your interest and insightful questions.

I cannot agree more that the key step in using LLM for time series is the "interaction between time series patterns and language understanding/reasoning capabilities", but I believe there are several great difficulties in current stage:

Lack of General Feature Backbone for Time Series: Till now, no works like CLIP and Pre-trained ViT are available in the time series community (the alignment/ language labeling of time series is very underexploited). They play an essential role in VLM (such as LLaVA) and enable the "actual interactive" between two modalities. To cope with it, we have to convert off-the-shelf LLM as the feature backbone of time series based on the general-purpose token transition. In fact, LLM works simultaneously for natural language and time series in AutoTimes(Figure 3).

WenWeiTHU commented 4 days ago

How to Enable the Knowledge Transfer in the Two Modalities (Models): Let's imagine that we already have two large models for time series and natural language. We can easily achieve the transfer by introducing a projector to align their features. However, a more important question is the formulation we align them. Concretely, the sequence-wise approach v.s. token-wise approach. Based on the same sequential formulation, we use the latter instead of the former which is prevalent in VLM. Perhaps now, this approach is inconclusive, but I think it is necessary to consider the differences between the pair of language+X modalities.

WenWeiTHU commented 4 days ago

![Uploading image.png…]()

WenWeiTHU commented 4 days ago

Since the LLM is frozen in our model, we have the desire that LLM can still understand the textual information (e.g., timestamps). Meanwhile, the parameters used for time series only include the embedding and projection layer. As you mentioned, making time series and natural language both housed in the same model is an constrained choice, which omits the deeper combining of LLM and TSModel (e.g., multi-stage fine-tuning in VLM). We also look forward to see further explorations in this direction

SkyKingL commented 3 days ago

Thank you for your comprehensive explanation about the current challenges in combining LLMs with time series analysis. Your insights about the lack of general feature backbone for time series and the complexities in enabling knowledge transfer between modalities have greatly deepened my understanding of the design choices in AutoTimes.

I noticed your comment on the "Are Language Models Actually Useful for Time Series Forecasting" repository regarding your ablation studies. Your findings about the effectiveness of large language models as autoregressive forecasters are particularly interesting. I completely agree with your perspective on the inconsistency between non-autoregressive forecasting approaches and autoregressive LLMs, and how AutoTimes addresses this fundamental issue.

In fact, I see a parallel between your work and the historical development of Transformer-based models in time series forecasting. When DLinear challenged the effectiveness of Transformers in time series forecasting by showing that a simple linear model could outperform existing approaches, PatchTST responded by introducing key designs like patching and channel-independence, demonstrating that Transformers could indeed be highly effective with the right framework. Similarly, while "Are Language Models Actually Useful for Time Series Forecasting" raises important questions about previous LLM4TS approaches, your work with AutoTimes introduces a fundamentally different autoregressive framework that shows the true potential of LLMs in this domain, as the authors themselves acknowledged in their repository response.

Given the significance of your ablation studies in demonstrating the potential of LLMs in time series forecasting, I was wondering if you would be willing to share more details about these experimental results? This could be incredibly valuable for the research community's deeper exploration of LLM4TS methods.

Looking forward to your response.

WenWeiTHU commented 3 days ago

Thank you again for your acknowledgment and insightful questions :) For ablation studies, the implementation of the code is very simple (including w/o LLM: the LLM part is removed, and only the embedding and projector are trained; LLM4Attn: Replaces LLM with an MHA Layer. LLM2Trsf: the LLM is replaced by a randomly initialized Transformer), we rigorously follow the paper: "Are Language Models Actually Useful for Time Series Forecasting" and their code implementation: https://github.com/BennyTMT/LLMsForTimeSeries. If you are interested, we will provide you with the detailed experimental code (of course, this may take a little time).

WenWeiTHU commented 3 days ago

As you mentioned, there is still a lot of debate in the field of time series, such as Linear vs. Transformer, AR vs. non-AR. It has led us to rethink the question of whether the Benchmark, which is widely used in these papers, is reliable. For example, recent studies have shown that DLinear under the evaluation of comprehensive datasets, achieves unimpressive performance (https://arxiv.org/pdf/2410.10393, Table 11). To be honest, a good study will objectively prove its strengths/weaknesses rather than arguing who has done the best work at Benchmark. I think it's better to enlighten practitioners about when (the scenario) these models are appropriate and when they are not.

WenWeiTHU commented 3 days ago

With the recent revival of autoregressive models, I think it has the strength in versatility and compatibility, but it indeed has error accumulation and data-hungry training challenges. There are really a lot of good questions to explore here!

SkyKingL commented 3 days ago

Thank you so much for your detailed and thoughtful response! I deeply appreciate you taking the time to explain the ablation study implementation details and sharing those valuable insights about the experimental methodology. Your reference to following the rigorous approach from "Are Language Models Actually Useful for Time Series Forecasting" and their implementation is very helpful. I can handle the ablation studies implementation following the approach you've outlined.

Your perspective on the ongoing debates in time series forecasting is particularly enlightening. I strongly agree with your point about the importance of understanding when and where different approaches are most appropriate, rather than focusing solely on benchmark performance. The example you shared about DLinear's performance across comprehensive datasets really highlights this crucial point.

I truly look forward to potential future discussions and collaboration opportunities in this exciting field. Wishing you continued success in your research!

Best regards!

thuml / AutoTimes

Is AutoTimes truly leveraging LLM's language capabilities? #16