huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Error During Training with PatchTSMixerForTimeSeriesClassification for Time Series Classification #30614

Open tdg2088 opened 2 months ago

tdg2088 commented 2 months ago

System Info

Who can help?

@kashif

Information

Tasks

Reproduction

I am attempting to use PatchTSMixerForTimeSeriesClassification for time series classification. For each trading day, my input data has 24 feature columns plus one timestamp column and one label column. The full dataset covers 1430 trading days, split 6:2:2 into training, validation, and test sets.

I am training the model on 30-trading-day windows (context_length=30). During training, I encounter the following error: `mat1 and mat2 shapes cannot be multiplied (1x18000 and 17280x3)`. I would like to determine whether this issue lies with PatchTSMixerForTimeSeriesClassification itself, with an error in dataset construction, or with a misconfiguration of the model. If it is the dataset construction or the model configuration, how should I adjust it?
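For reference, the split sizes and window counts implied by the description (a rough check on my part, assuming one training window per start index that leaves room for context_length + prediction_length rows; the resulting 828 windows per epoch matches the 82,800 total steps over 100 epochs shown in the console output further down):

    total_days = 1430
    train_days = int(total_days * 0.6)                # 858
    valid_days = int(total_days * 0.8) - train_days   # 286
    test_days = total_days - int(total_days * 0.8)    # 286
    train_windows = train_days - (30 + 1) + 1         # 828 windows per epoch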

Here is the code snippet:

# -*- coding: UTF-8 -*-
# Import required libraries
import logging
from transformers import (
    EarlyStoppingCallback, 
    PatchTSMixerConfig,
    Trainer,
    TrainingArguments,
)
import numpy as np
import pandas as pd
import cx_Oracle
from sqlalchemy import create_engine
from sklearn.preprocessing import StandardScaler, LabelEncoder
from transformers import PatchTSMixerForTimeSeriesClassification
from tsfm_public.toolkit.dataset import ForecastDFDataset
from tsfm_public.toolkit.time_series_preprocessor import TimeSeriesPreprocessor

context_length = 30
label = 'label_class'
prediction_length=1
timestamp_column = 'trade_date'
num_classes = 3
num_workers = 8  
batch_size = 1
num_input_channels = 24
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

def connect_to_database():
    ...  # implementation omitted

# Function to calculate the maximum price over the next three days
def calculate_future_high(row, df, look_forward_days=3):
    ...  # implementation omitted

# Function to fetch single-stock data
def get_single_stock_data(ts_code):
    ...  # implementation omitted

# Function to build dataset
def load_data(ts_code):
    df = get_single_stock_data(ts_code)

    # Label encoding
    le = LabelEncoder()
    df[label] = le.fit_transform(df[label])
    # Split the dataset chronologically to preserve time-series continuity
    total_samples = len(df)
    train_end = int(total_samples * 0.6)  # training set: 60%
    valid_end = int(total_samples * 0.8)  # validation set: 20%, test set: 20%

    train_df = df[:train_end]
    valid_df = df[train_end:valid_end]
    test_df = df[valid_end:]

    input_columns = [col for col in df.columns if col != label and col != timestamp_column]

    # Create the time-series preprocessor
    time_series_processor = TimeSeriesPreprocessor(
        context_length=context_length,
        id_columns=[], 
        timestamp_column=timestamp_column,
        input_columns=input_columns,  # exclude the label and timestamp columns
        target_columns=[label],  # output columns include only the target column
        prediction_length=prediction_length,
        scaling=False,
        time_series_task='classification',  # set the task type to classification
    )

    # Fit the preprocessor on the training data
    time_series_processor = time_series_processor.train(train_df)
    # Preprocess the data
    train_processed = time_series_processor.preprocess(train_df)
    valid_processed = time_series_processor.preprocess(valid_df)
    test_processed = time_series_processor.preprocess(test_df)

    # Create ForecastDFDataset objects
    train_dataset = ForecastDFDataset(
        data=train_processed,
        conditional_columns=input_columns,
        timestamp_column=timestamp_column,
        context_length=context_length,
        target_columns=[label], 
        prediction_length=prediction_length
    )
    valid_dataset = ForecastDFDataset(
        data=valid_processed,
        conditional_columns=input_columns,
        timestamp_column=timestamp_column,
        context_length=context_length,
        target_columns=[label], 
        prediction_length=prediction_length
    )
    test_dataset = ForecastDFDataset(
        data=test_processed,
        conditional_columns=input_columns,
        timestamp_column=timestamp_column,
        context_length=context_length,
        target_columns=[label], 
        prediction_length=prediction_length
    )

    return train_dataset, valid_dataset, test_dataset

if __name__ == "__main__":
    try:
        # Model configuration
        config = PatchTSMixerConfig(
            context_length=context_length,
            prediction_length=prediction_length,
            num_classes=num_classes,
            patch_length=1,
            num_input_channels=num_input_channels,
            patch_stride=1,
            d_model=720,
            num_layers=3,
            expansion_factor=3,
            dropout=0.1,
            head_dropout=0.7,
            mode="common_channel",
            scaling="std",
        )

        # Instantiate the model
        model = PatchTSMixerForTimeSeriesClassification(config)

        # Training arguments
        train_args = TrainingArguments(
            output_dir="./model_output",
            overwrite_output_dir=True,
            learning_rate=0.0001,
            num_train_epochs=100,
            do_eval=True,
            evaluation_strategy="epoch",
            per_device_train_batch_size=1,
            per_device_eval_batch_size=1,
            dataloader_num_workers=num_workers,
            report_to="tensorboard",
            save_strategy="epoch",
            logging_strategy="epoch",
            save_total_limit=3,
            logging_dir="./logs",
            load_best_model_at_end=True,
            metric_for_best_model="f1",
            greater_is_better=False,
            label_names=[label],
        )

        # Load the data
        train_dataset, valid_dataset, test_dataset = load_data('002699.SZ')
        # Configure early stopping
        early_stopping_callback = EarlyStoppingCallback(
            early_stopping_patience=5,
            early_stopping_threshold=0.001,
        )

        # Create the Trainer
        trainer = Trainer(
            model=model,
            args=train_args,
            train_dataset=train_dataset,
            eval_dataset=valid_dataset,
            callbacks=[early_stopping_callback],
        )

        logging.info("Starting model training.")
        trainer.train()
        logging.info("Model training completed.")

        logging.info("Evaluating model on test dataset.")
        results = trainer.evaluate(test_dataset)
        logging.info(f"Evaluation results: {results}")

    except Exception as e:
        print(f"An error occurred during training: {str(e)}")

The console prints:

    2024-05-02 16:07:15,385 - INFO - Starting model training.
      0%|          | 0/82800 [00:00<?, ?it/s]An error occurred during training: mat1 and mat2 shapes cannot be multiplied (1x18000 and 17280x3)
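Both numbers in the error factor cleanly by d_model = 720 (a rough reading on my part, assuming the classification head is a linear layer over num_input_channels × d_model pooled features):

    d_model = 720
    expected_in = 24 * d_model                   # 17280 -> input size the head was built for (num_input_channels=24)
    received_in = 18000                          # input size that actually reached the head
    received_channels = received_in // d_model   # 25 -> one channel more than configured

If that reading is correct, the dataset is feeding 25 channels to the model (for example the label column ending up as an extra input channel), which would also explain why the attempt below with num_input_channels=num_input_channels+1 gets past the matmul but then fails at the loss.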

When I modified the PatchTSMixerConfig as shown below, I encountered a new error.

    # Model configuration
    config = PatchTSMixerConfig(
        context_length=context_length,
        prediction_length=prediction_length,
        num_classes=num_classes,
        patch_length=15,
        num_input_channels=num_input_channels + 1,
        patch_stride=15,
        d_model=30,
        num_layers=3,
        expansion_factor=3,
        dropout=0.1,
        head_dropout=0.7,
        mode="common_channel",
        scaling="std",
    )

The console prints:
2024-05-02 19:28:39,174 - INFO - Starting model training.

  0%|          | 0/82800 [00:00<?, ?it/s]An error occurred during training: The model did not return a loss from the inputs, only the following keys: prediction_outputs,last_hidden_state. For reference, the inputs it received are past_values.
[002699.SZ.csv](https://github.com/huggingface/transformers/files/15187706/002699.SZ.csv)
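The message suggests that the only thing reaching the model is past_values, so whatever key the dataset uses for the label is not being forwarded to (or recognized by) the model, and without its label argument the model cannot compute a loss. A quick way to check which keyword arguments the model's forward actually accepts (and what it calls its label):

    import inspect
    from transformers import PatchTSMixerForTimeSeriesClassification

    # Print the accepted keyword arguments of the classification model's forward;
    # in the release I checked the label argument appears to be target_values,
    # but please verify against the installed version.
    print(inspect.signature(PatchTSMixerForTimeSeriesClassification.forward))

If the batches produced by ForecastDFDataset carry the label under a different key, it would never reach the loss computation.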

### Expected behavior

I would like to determine whether this issue lies with PatchTSMixerForTimeSeriesClassification itself, with an error in dataset construction, or with a misconfiguration of the model. If it is the dataset construction or the model configuration, how should I adjust it?

NielsRogge commented 1 month ago

cc @kashif

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

tdg2088 commented 1 month ago

Thank you for your response and reminder.

To further clarify and provide more details, here is an update on the current situation:

  1. Original Issue: I encountered a matrix dimension mismatch error when using PatchTSMixerForTimeSeriesClassification for time series classification: mat1 and mat2 shapes cannot be multiplied (1x18000 and 17280x3). This occurred with the following configuration:

    config = PatchTSMixerConfig(
        context_length=context_length,
        prediction_length=prediction_length,
        num_classes=num_classes,
        patch_length=1,
        num_input_channels=num_input_channels,
        patch_stride=1,
        d_model=720,
        num_layers=3,
        expansion_factor=3,
        dropout=0.1,
        head_dropout=0.7,
        mode="common_channel",
        scaling="std",
    )
  2. Attempted Solution: I tried modifying the PatchTSMixerConfig as follows:

    config = PatchTSMixerConfig(
        context_length=context_length,
        prediction_length=prediction_length,
        num_classes=num_classes,
        patch_length=15,
        num_input_channels=num_input_channels + 1,
        patch_stride=15,
        d_model=30,
        num_layers=3,
        expansion_factor=3,
        dropout=0.1,
        head_dropout=0.7,
        mode="common_channel",
        scaling="std",
    )

    This resulted in a new error: The model did not return a loss from the inputs, only the following keys: prediction_outputs,last_hidden_state. For reference, the inputs it received are past_values.

  3. Dataset Description: My dataset consists of 1430 trading days, each with 24 feature columns, one timestamp column, and one label column. The dataset is split into training (60%), validation (20%), and test (20%) sets.

  4. Code Snippet: I have provided the complete code snippet and data processing logic in my initial issue description.

  5. Nature of the Problem: I am unsure whether this issue is due to the PatchTSMixerForTimeSeriesClassification model itself, an error in dataset construction, or a misconfiguration in the model setup.

I would greatly appreciate your help in identifying the root cause of this problem and finding a solution. If you need any additional information or have any suggestions, please let me know.

Thank you!

Scott534 commented 3 weeks ago

In the code:

    train_dataset = ForecastDFDataset(
        data=train_processed,
        conditional_columns=input_columns,
        timestamp_column=timestamp_column,
        context_length=context_length,
        target_columns=[label], 
        prediction_length=prediction_length
    )

you should change `ForecastDFDataset` to something else, since the tsfm toolkit does not provide a classification dataset; see https://github.com/ibm-granite/granite-tsfm/issues/17. A rough sketch of a possible replacement is below.
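Something along these lines could work as a starting point (an untested sketch on my end; the key names past_values / target_values are what I believe the classification model expects as input and label, so please double-check them against your installed transformers version):

    import torch
    from torch.utils.data import Dataset

    class SlidingWindowClassificationDataset(Dataset):
        # Yields one context window of features plus a single class label per item,
        # keeping the label out of the input channels.
        def __init__(self, df, input_columns, label_column, context_length):
            self.features = torch.tensor(df[input_columns].to_numpy(), dtype=torch.float32)
            self.labels = torch.tensor(df[label_column].to_numpy(), dtype=torch.long)
            self.context_length = context_length

        def __len__(self):
            return len(self.features) - self.context_length + 1

        def __getitem__(self, idx):
            window = self.features[idx : idx + self.context_length]  # (context_length, num_channels)
            label = self.labels[idx + self.context_length - 1]       # label of the last day in the window
            return {"past_values": window, "target_values": label}

Built from the same train_df / valid_df / test_df splits, this hands the label to the model under its own label argument instead of as an extra channel.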

NielsRogge commented 3 weeks ago

Pinging @kashif here