zezhishao / BasicTS

A Standard and Fair Time Series Forecasting Benchmark and Toolkit.
Apache License 2.0
Cannot run inference.py on VMWare 2016 workstation. #96

Open faizanhakim opened 7 months ago

faizanhakim commented 7 months ago

Cannot run inference.py on VMWare 2016 workstation.

This is the output:

faizab@faizab-virtual-machine:~/Desktop/AML/BasicTS-master$ python3.9 experiments/inference.py -m DLinear -d PEMS08
/home/faizab/.local/lib/python3.9/site-packages/pandas/compat/__init__.py:124: UserWarning: Could not import the lzma module. Your installed Python is incomplete. Attempting to use lzma compression will result in a RuntimeError.
  warnings.warn(msg)
2023-12-04 21:41:31,751 - easytorch-launcher - INFO - Launching EasyTorch runner.
DESCRIPTION: DLinear model configuration RUNNER: <class 'basicts.runners.runner_zoo.simple_tsf_runner.SimpleTimeSeriesForecastingRunner'> DATASET_CLS: <class 'basicts.data.dataset.TimeSeriesForecastingDataset'> DATASET_NAME: PEMS08 DATASET_TYPE: Traffic Flow DATASET_INPUT_LEN: 12 DATASET_OUTPUT_LEN: 12 GPU_NUM: 0 NULL_VAL: 0.0 DEVICE_NUM: 0 ENV: SEED: 1 CUDNN: ENABLED: False MODEL: NAME: DLinear ARCH: <class 'baselines.DLinear.arch.dlinear.DLinear'> PARAM: seq_len: 12 pred_len: 12 individual: False enc_in: 170 FORWARD_FEATURES: [0] TARGET_FEATURES: [0] TRAIN: LOSS: masked_mae OPTIM: TYPE: Adam PARAM: lr: 0.002 weight_decay: 0.0001 LR_SCHEDULER: TYPE: MultiStepLR PARAM: milestones: [1, 25] gamma: 0.5 NUM_EPOCHS: 100 CKPT_SAVE_DIR: checkpoints/DLinear_100 DATA: DIR: datasets/PEMS08 BATCH_SIZE: 64 PREFETCH: False SHUFFLE: True NUM_WORKERS: 2 PIN_MEMORY: False VAL: INTERVAL: 1 DATA: DIR: datasets/PEMS08 BATCH_SIZE: 64 PREFETCH: False SHUFFLE: False NUM_WORKERS: 2 PIN_MEMORY: False USE_GPU: False TEST: INTERVAL: 1 DATA: DIR: datasets/PEMS08 BATCH_SIZE: 64 PREFETCH: False SHUFFLE: False NUM_WORKERS: 2 PIN_MEMORY: False USE_GPU: False EVAL: USE_GPU: False HORIZONS: [12] DEVICE_NUM: 0

2023-12-04 21:41:31,761 - easytorch-env - INFO - Use devices 0.
2023-12-04 21:41:31,761 - easytorch-env - INFO - Disable TF32 mode
2023-12-04 21:41:31,768 - easytorch-env - INFO - Unset cudnn enabled.
2023-12-04 21:41:31,768 - easytorch - INFO - Set ckpt save dir: 'checkpoints/DLinear_100/5d8e45364518a60093378bb83e78aca4'
2023-12-04 21:41:31,769 - easytorch - INFO - Building model.
Traceback (most recent call last):
  File "/home/faizab/Desktop/AML/BasicTS-master/experiments/inference.py", line 44, in <module>
    launch_runner(cfg_path, inference, (ckpt_path, args.batch_size), devices='0')
  File "/home/faizab/Desktop/AML/BasicTS-master/basicts/launcher.py", line 10, in launch_runner
    easytorch.launch_runner(cfg=cfg, fn=fn, args=args, device_type=device_type, devices=devices)
  File "/home/faizab/.local/lib/python3.9/site-packages/easytorch/launcher/launcher.py", line 113, in launch_runner
    runner = cfg['RUNNER'](cfg)
  File "/home/faizab/Desktop/AML/BasicTS-master/basicts/runners/runner_zoo/simple_tsf_runner.py", line 10, in __init__
    super().__init__(cfg)
  File "/home/faizab/Desktop/AML/BasicTS-master/basicts/runners/base_tsf_runner.py", line 29, in __init__
    super().__init__(cfg)
  File "/home/faizab/Desktop/AML/BasicTS-master/basicts/runners/base_runner.py", line 26, in __init__
    super().__init__(cfg)
  File "/home/faizab/.local/lib/python3.9/site-packages/easytorch/core/runner.py", line 51, in __init__
    self.model = self.build_model(cfg)
  File "/home/faizab/.local/lib/python3.9/site-packages/easytorch/core/runner.py", line 186, in build_model
    model = to_device(model)
  File "/home/faizab/.local/lib/python3.9/site-packages/easytorch/device.py", line 54, in to_device
    return src.cuda(**kwargs)
  File "/home/faizab/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 680, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/faizab/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 570, in _apply
    module._apply(fn)
  File "/home/faizab/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 593, in _apply
    param_applied = fn(param)
  File "/home/faizab/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 680, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/home/faizab/.local/lib/python3.9/site-packages/torch/cuda/__init__.py", line 214, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

Can anyone guide me on how to fix this? Thank you.

zezhishao commented 7 months ago

It seems that there is no GPU. Try passing an empty GPU list:

 python3.9 experiments/inference.py -m DLinear -d PEMS08 -g ""

faizanhakim commented 7 months ago

Is there any way to bypass the use of the GPU?

zezhishao commented 7 months ago

Does your virtual machine correctly recognize the GPU? For example, does nvidia-smi produce normal output?
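The same check can be scripted. The sketch below uses only the Python standard library and treats a missing nvidia-smi binary or a non-zero exit status as "no usable GPU". The helper names gpu_visible and gpus_argument are hypothetical, and the "" and "0" values are simply the -g arguments that appear in this thread, not a documented BasicTS interface.

```python
import shutil
import subprocess


def gpu_visible(timeout: float = 10.0) -> bool:
    """Return True if nvidia-smi is on PATH and exits successfully."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The NVIDIA utilities are not installed at all.
        return False
    try:
        result = subprocess.run(
            [exe], capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # A non-zero status covers the "couldn't communicate with the
    # NVIDIA driver" case.
    return result.returncode == 0


def gpus_argument() -> str:
    """Hypothetical helper: value to pass to -g ("0" for one GPU, "" for CPU)."""
    return "0" if gpu_visible() else ""


print("GPU visible:", gpu_visible())
print("suggested -g value:", repr(gpus_argument()))
```

On a machine where the driver is missing, this should suggest the empty value, which matches the CPU-only command shown earlier in the thread.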

faizanhakim commented 7 months ago

faizab@faizab-virtual-machine:~$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

This is the output after I first ran "sudo apt install nvidia-utils-450-server".

My laptop has an NVIDIA 820M, but it does not seem to be recognized in VMware 2016.

zezhishao commented 7 months ago

The NVIDIA driver is not installed correctly; you should install it first. BTW, the NVIDIA 820M is quite old and may not be powerful enough to run the complex baselines in BasicTS.

faizanhakim commented 7 months ago

I am unable to install the NVIDIA driver because the virtual machine doesn't recognize the GPU. Will the model work on Windows 10 with Python 3.9?

zezhishao commented 7 months ago

Honestly, given the limited performance of the NVIDIA 820M, I wouldn't recommend using the GPU: it probably won't speed up training much, and installing the NVIDIA driver can be tedious. If you only want to run simple baselines such as MLP- or Linear-based models, you can use the CPU directly. If you want to try more powerful baselines, such as STGNNs or Transformer-based models, the 820M is too weak to run them.

Back to your question: Windows 10 with Python 3.9 will work. BasicTS itself has no operating system requirements. However, installing the NVIDIA driver and PyTorch on Windows may require additional work compared to installing them on Linux (e.g., Ubuntu).

faizanhakim commented 7 months ago

Hey, I tried running it on another PC with an NVIDIA RTX 3070, and this is the error that shows up:

PS D:\My Work\Fast Semester 7\Applied Machine Learning\ProjectNew\BasicTS-master\BasicTS-master> python experiments/inference.py -m DLinear -d PEMS08 -g "0"
2023-12-05 23:48:56,483 - easytorch-launcher - INFO - Launching EasyTorch runner. DESCRIPTION: DLinear model configuration RUNNER: <class 'basicts.runners.runner_zoo.simple_tsf_runner.SimpleTimeSeriesForecastingRunner'> DATASET_CLS: <class 'basicts.data.dataset.TimeSeriesForecastingDataset'> DATASET_NAME: PEMS08 DATASET_TYPE: Traffic Flow DATASET_INPUT_LEN: 12 DATASET_OUTPUT_LEN: 12 GPU_NUM: 1 NULL_VAL: 0.0 ENV: SEED: 1 CUDNN: ENABLED: True MODEL: NAME: DLinear ARCH: <class 'baselines.DLinear.arch.dlinear.DLinear'> PARAM: seq_len: 12 pred_len: 12 individual: False enc_in: 170 FORWARD_FEATURES: [0] TARGET_FEATURES: [0] TRAIN: LOSS: masked_mae OPTIM: TYPE: Adam PARAM: lr: 0.002 weight_decay: 0.0001 LR_SCHEDULER: TYPE: MultiStepLR PARAM: milestones: [1, 25] gamma: 0.5 NUM_EPOCHS: 100 CKPT_SAVE_DIR: checkpoints\DLinear_100 DATA: DIR: datasets/PEMS08 BATCH_SIZE: 64 PREFETCH: False SHUFFLE: True NUM_WORKERS: 2 PIN_MEMORY: False VAL: INTERVAL: 1 DATA: DIR: datasets/PEMS08 BATCH_SIZE: 64 PREFETCH: False SHUFFLE: False NUM_WORKERS: 2 PIN_MEMORY: False TEST: INTERVAL: 1 DATA: DIR: datasets/PEMS08 BATCH_SIZE: 64 PREFETCH: False SHUFFLE: False NUM_WORKERS: 2 PIN_MEMORY: False EVAL: USE_GPU: False HORIZONS: [12]

2023-12-05 23:48:56,493 - easytorch-env - INFO - Use devices 0.
2023-12-05 23:48:56,493 - easytorch-env - INFO - Disable TF32 mode
2023-12-05 23:48:56,499 - easytorch - INFO - Set ckpt save dir: 'checkpoints\DLinear_100\e3c9d1ca4d372a70afb22e2326370b81'
2023-12-05 23:48:56,499 - easytorch - INFO - Building model.
test len: 3566
2023-12-05 23:49:00,891 - easytorch-inference - INFO - Loading Checkpoint from 'ckpt/DLinear/PEMS08/DLinear_best_val_MAE.pt'
Traceback (most recent call last):
  File "D:\Python 3.9\lib\site-packages\easytorch\core\runner.py", line 289, in load_model
    checkpoint_dict = load_ckpt(self.ckpt_save_dir, ckpt_path=ckpt_path, logger=self.logger)
  File "D:\Python 3.9\lib\site-packages\easytorch\core\checkpoint.py", line 51, in load_ckpt
    return torch.load(ckpt_path, map_location=lambda storage, loc: to_device(storage))
  File "D:\Python 3.9\lib\site-packages\torch\serialization.py", line 594, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "D:\Python 3.9\lib\site-packages\torch\serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "D:\Python 3.9\lib\site-packages\torch\serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'ckpt/DLinear/PEMS08/DLinear_best_val_MAE.pt'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\My Work\Fast Semester 7\Applied Machine Learning\ProjectNew\BasicTS-master\BasicTS-master\experiments\inference.py", line 44, in <module>
    launch_runner(cfg_path, inference, (ckpt_path, args.batch_size), devices=args.gpus)
  File "D:\My Work\Fast Semester 7\Applied Machine Learning\ProjectNew\BasicTS-master\BasicTS-master\basicts\launcher.py", line 10, in launch_runner
    easytorch.launch_runner(cfg=cfg, fn=fn, args=args, device_type=device_type, devices=devices)
  File "D:\Python 3.9\lib\site-packages\easytorch\launcher\launcher.py", line 116, in launch_runner
    fn(cfg, runner, *args)
  File "D:\My Work\Fast Semester 7\Applied Machine Learning\ProjectNew\BasicTS-master\BasicTS-master\experiments\inference.py", line 18, in inference
    runner.load_model(ckpt_path=ckpt)
  File "D:\Python 3.9\lib\site-packages\easytorch\core\runner.py", line 295, in load_model
    raise OSError('Ckpt file does not exist') from e
OSError: Ckpt file does not exist
PS D:\My Work\Fast Semester 7\Applied Machine Learning\ProjectNew\BasicTS-master\BasicTS-master>

Any idea what this is about?

faizanhakim commented 7 months ago

image

I had to create the path 'ckpt/DLinear/PEMS08/DLinear_best_val_MAE.pt' manually. Is this the type of output that I should expect? If so, is there any script or notebook in the project that I can use to visualize the results?

zezhishao commented 7 months ago

ckpt/DLinear/PEMS08/DLinear_best_val_MAE.pt is the checkpoint of the DLinear model; it stores the model's weights. If you want to visualize your predictions and the ground truth, you should modify the test function in basicts/runners/base_tsf_runner.py. Specifically, you can save prediction and real_value before this line. Both are tensors of shape [B, L, N, C], where B is the number of samples, L is the prediction length, N is the number of time series (i.e., variables), and C=1 is the number of target features.
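As a rough illustration of that suggestion, the sketch below saves two arrays of shape [B, L, N, C] and pulls out one curve per series for plotting. It is not code from BasicTS: the random arrays are stand-ins for prediction and real_value, and the file name and the helpers series_at_horizon and plot_node are hypothetical. Inside the runner the values would be PyTorch tensors, which can be converted with tensor.detach().cpu().numpy() before saving.

```python
import os
import tempfile

import numpy as np

# Stand-ins for the two tensors named above (assumed shape [B, L, N, C]).
B, L, N, C = 100, 12, 170, 1
rng = np.random.default_rng(0)
real_value = rng.random((B, L, N, C))
prediction = real_value + 0.1 * rng.standard_normal((B, L, N, C))

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "dlinear_pems08_test.npz")
np.savez(out_path, prediction=prediction, real_value=real_value)


def series_at_horizon(array, node, horizon):
    """Return the length-B curve of one series at one 0-based horizon."""
    return array[:, horizon, node, 0]


def plot_node(path, node=0, horizon=11):
    """Plot ground truth against prediction for one series (needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    data = np.load(path)
    plt.plot(series_at_horizon(data["real_value"], node, horizon), label="ground truth")
    plt.plot(series_at_horizon(data["prediction"], node, horizon), label="prediction")
    plt.xlabel("test sample index")
    plt.ylabel("value")
    plt.legend()
    plt.show()


curve = series_at_horizon(prediction, node=0, horizon=11)
print(curve.shape)
```

Calling plot_node(out_path, node=0, horizon=11) would then draw the 12-step-ahead forecast for the first series against its ground truth.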

faizanhakim commented 6 months ago

Another thing I would like to ask: I trained the Crossformer model on the PEMS08 dataset, and when I run the inference code, all test results are 0.0000. Is there an error in the model, or can this be considered a legitimate result?

image

zezhishao commented 6 months ago

I can't reproduce this error. Can you provide me with more information? For example, what are your modifications and configurations?

image

faizanhakim commented 6 months ago

This is my configuration file for PEMS08


import os
import sys

# TODO: remove it when basicts can be installed by pip
sys.path.append(os.path.abspath(__file__ + "/../../.."))
from easydict import EasyDict
from basicts.losses import masked_mae, masked_mse
from basicts.data import TimeSeriesForecastingDataset
from basicts.runners import SimpleTimeSeriesForecastingRunner

from .arch import Crossformer

CFG = EasyDict()

# ================= general ================= #
CFG.DESCRIPTION = "Crossformer model configuration"
CFG.RUNNER = SimpleTimeSeriesForecastingRunner
CFG.DATASET_CLS = TimeSeriesForecastingDataset
CFG.DATASET_NAME = "PEMS08"
CFG.DATASET_TYPE = "Traffic Flow"
CFG.DATASET_INPUT_LEN = 12
CFG.DATASET_OUTPUT_LEN = 12
CFG.GPU_NUM = 1
CFG.NULL_VAL = 0.0

# ================= environment ================= #
CFG.ENV = EasyDict()
CFG.ENV.SEED = 0
CFG.ENV.CUDNN = EasyDict()
CFG.ENV.CUDNN.ENABLED = True

# ================= model ================= #
CFG.MODEL = EasyDict()
CFG.MODEL.NAME = "Crossformer"
CFG.MODEL.ARCH = Crossformer
NUM_NODES = 170
CFG.MODEL.PARAM = {
    "data_dim": NUM_NODES,
    "in_len": CFG.DATASET_INPUT_LEN,
    "out_len": CFG.DATASET_OUTPUT_LEN,
    "seg_len": 24,
    "win_size": 2,
    # default parameters
    "factor": 10,
    "d_model": 256,
    "d_ff": 512,
    "n_heads": 4,
    "e_layers": 3,
    "dropout": 0.2,
    "baseline": False
}
CFG.MODEL.FORWARD_FEATURES = [0]
CFG.MODEL.TARGET_FEATURES = [0]

# ================= optim ================= #
CFG.TRAIN = EasyDict()
CFG.TRAIN.LOSS = masked_mae
CFG.TRAIN.OPTIM = EasyDict()
CFG.TRAIN.OPTIM.TYPE = "Adam"
CFG.TRAIN.OPTIM.PARAM = {
    "lr": 0.0002,
    "weight_decay": 0.0005,
}
CFG.TRAIN.LR_SCHEDULER = EasyDict()
CFG.TRAIN.LR_SCHEDULER.TYPE = "MultiStepLR"
CFG.TRAIN.LR_SCHEDULER.PARAM = {
    "milestones": [1, 5],
    "gamma": 0.5
}

# ================= train ================= #
CFG.TRAIN.NUM_EPOCHS = 50
CFG.TRAIN.CKPT_SAVE_DIR = os.path.join(
    'checkpoints',
    '_'.join([CFG.MODEL.NAME, str(CFG.TRAIN.NUM_EPOCHS)])
)
# train data
CFG.TRAIN.DATA = EasyDict()
# read data
CFG.TRAIN.DATA.DIR = 'datasets/' + CFG.DATASET_NAME
# dataloader args, optional
CFG.TRAIN.DATA.BATCH_SIZE = 8
CFG.TRAIN.DATA.PREFETCH = False
CFG.TRAIN.DATA.SHUFFLE = True
CFG.TRAIN.DATA.NUM_WORKERS = 2
CFG.TRAIN.DATA.PIN_MEMORY = False

# ================= validate ================= #
CFG.VAL = EasyDict()
CFG.VAL.INTERVAL = 1
# validating data
CFG.VAL.DATA = EasyDict()
# read data
CFG.VAL.DATA.DIR = 'datasets/' + CFG.DATASET_NAME
# dataloader args, optional
CFG.VAL.DATA.BATCH_SIZE = 64
CFG.VAL.DATA.PREFETCH = False
CFG.VAL.DATA.SHUFFLE = False
CFG.VAL.DATA.NUM_WORKERS = 2
CFG.VAL.DATA.PIN_MEMORY = False

# ================= test ================= #
CFG.TEST = EasyDict()
CFG.TEST.INTERVAL = 1
# test data
CFG.TEST.DATA = EasyDict()
# read data
CFG.TEST.DATA.DIR = 'datasets/' + CFG.DATASET_NAME
# dataloader args, optional
CFG.TEST.DATA.BATCH_SIZE = 64
CFG.TEST.DATA.PREFETCH = False
CFG.TEST.DATA.SHUFFLE = False
CFG.TEST.DATA.NUM_WORKERS = 2
CFG.TEST.DATA.PIN_MEMORY = False
# ================= evaluate ================= #
CFG.EVAL = EasyDict()
CFG.EVAL.USE_GPU = True
CFG.EVAL.HORIZONS = [12]

zezhishao commented 6 months ago

Have you read the Crossformer paper [1]? Crossformer is proposed for long-term time series forecasting, which usually requires a long history, such as 336 time steps. Furthermore, the model hyperparameters in CFG.MODEL.PARAM must be set appropriately for your input sequence length, especially the seg_len and win_size parameters.

If you need more information about Crossformer, see [1]. You can also refer to our paper [2] for some help.

[1] Crossformer: Transformer Utilizing Cross-Dimension Dependency for Multivariate Time Series Forecasting. https://openreview.net/pdf?id=vSVLM2j9eie
[2] Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis. https://arxiv.org/pdf/2310.06119.pdf
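To see why the input length matters for these two parameters, here is a small back-of-the-envelope sketch. It assumes the scheme described in [1]: the input is padded up to a multiple of seg_len and cut into segments, and neighbouring segments are merged in groups of win_size between encoder layers. The function segments_per_layer is hypothetical and is not taken from the Crossformer or BasicTS code.

```python
from math import ceil


def segments_per_layer(in_len, seg_len, win_size, e_layers):
    """Estimate how many time segments each encoder layer sees.

    Assumption: the first layer sees ceil(in_len / seg_len) segments and
    every later layer merges win_size neighbouring segments into one.
    """
    n = ceil(in_len / seg_len)
    counts = [n]
    for _ in range(e_layers - 1):
        n = ceil(n / win_size)
        counts.append(n)
    return counts


# Settings from the configuration posted above: 12 input steps, seg_len 24.
print(segments_per_layer(in_len=12, seg_len=24, win_size=2, e_layers=3))   # [1, 1, 1]
# A long-term setting with 336 input steps.
print(segments_per_layer(in_len=336, seg_len=24, win_size=2, e_layers=3))  # [14, 7, 4]
```

Under these assumptions, a 12-step input with seg_len 24 collapses into a single segment that holds more padding than data, so there is nothing for the later layers to merge, whereas a 336-step input yields a real hierarchy of segments.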

faizanhakim commented 6 months ago

Could you share your configuration with me so that I can use it as a reference?

zezhishao commented 6 months ago

The configuration in this repo is appropriate. Kindly note that Crossformer is designed for long-term time series modeling and does not naturally support short-term inputs, e.g., 12 time steps.

faizanhakim commented 6 months ago

Are you using the full, long version of the datasets?

zezhishao commented 6 months ago

What do you mean by "full" and "long"? There are no "long" and "short" versions of the dataset itself. We generate samples from the time series with a sliding window of length $T=P+F$, where $P$ is the length of the historical data and $F$ is the length of the future data. $P$ and $F$ are user-adjustable hyperparameters; see this script. For example, PEMS08 can be used in both long and short settings, as described in our paper [1].

[1] Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis. https://arxiv.org/pdf/2310.06119.pdf
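The sliding-window idea can be sketched in a few lines. This is an illustration only, not the repository's generation script: sliding_window_indices is a hypothetical helper, and the series length of 1000 is made up.

```python
def sliding_window_indices(series_len, history_len, future_len):
    """Return (history_start, history_end, future_end) for every sample.

    Each sample covers T = P + F consecutive steps: steps
    [history_start, history_end) are the input and steps
    [history_end, future_end) are the target.
    """
    window = history_len + future_len
    return [
        (start, start + history_len, start + window)
        for start in range(series_len - window + 1)
    ]


# The same series split in a short setting (P = F = 12)
# and in a long setting (P = F = 336).
short = sliding_window_indices(series_len=1000, history_len=12, future_len=12)
long = sliding_window_indices(series_len=1000, history_len=336, future_len=336)

print(len(short), short[0])  # 977 (0, 12, 24)
print(len(long), long[0])    # 329 (0, 336, 672)
```

The underlying series is identical in both cases; only the window length, and therefore the number and size of the samples, changes.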

faizanhakim commented 6 months ago

Sometimes, after a few epochs of training with the given configuration, all errors become zero. Is there anything I can do to avoid this?

image

zezhishao commented 6 months ago

There could be many reasons and I need more information to reproduce your error.