thuml / Time-Series-Library

A Library for Advanced Deep Time Series Models.
MIT License

Can't run scripts with GPU-Support #163

Closed: gabriead closed this issue 1 year ago

gabriead commented 1 year ago

Hi Time-Series-Library-Team,

I tried to run `bash ./scripts/long_term_forecast/ETT_script/TimesNet_ETTh1.sh` with GPU support, and it failed to use the GPU. I created a conda env with all the requirements specified in your requirements.txt, but run.py won't use my GPU. When I run `torch.cuda.is_available()` separately on the same machine and in the same env, I get True, so I don't understand why GPU support suddenly stops working when I launch through the bash script. Could you please help me with this issue?

This is my system setup:

```
Collecting environment information...
PyTorch version: 1.7.1
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.15.5

Python version: 3.9 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA A100 80GB PCIe
Nvidia driver version: 470.182.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.2.4
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] reformer-pytorch==1.4.4
[pip3] torch==1.7.1
[conda] numpy 1.23.5 pypi_0 pypi
[conda] reformer-pytorch 1.4.4 pypi_0 pypi
[conda] torch 1.7.1 pypi_0 pypi
```

But the logs show that it is not using the GPU:

```
Args in experiment:
Namespace(task_name='long_term_forecast', is_training=1, model_id='ETTh1_96_96', model='TimesNet', data='ETTh1', root_path='./dataset/ETT-small/', data_path='ETTh1.csv', features='M', target='OT', freq='h', checkpoints='./checkpoints/', seq_len=96, label_len=48, pred_len=96, seasonal_patterns='Monthly', mask_rate=0.25, anomaly_ratio=0.25, top_k=5, num_kernels=6, enc_in=7, dec_in=7, c_out=7, d_model=16, n_heads=8, e_layers=2, d_layers=1, d_ff=32, moving_avg=25, factor=3, distil=True, dropout=0.1, embed='timeF', activation='gelu', output_attention=False, num_workers=10, itr=1, train_epochs=10, batch_size=32, patience=3, learning_rate=0.0001, des='Exp', loss='MSE', lradj='type1', use_amp=False, use_gpu=False, gpu=0, use_multi_gpu=False, devices='0,1,2,3', p_hidden_dims=[128, 128], p_hidden_layers=2)
Use CPU

start training : long_term_forecast_ETTh1_96_96_TimesNet_ETTh1_ftM_sl96_ll48_pl96_dm16_nh8_el2_dl1_df32_fc3_ebtimeF_dtTrue_Exp_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 8449
val 2785
test 2785
```
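The printed Namespace shows `use_gpu=False`, immediately followed by `Use CPU`, which suggests the GPU was switched off inside the process that the script started. Below is a minimal sketch of that kind of gating, assuming run.py keeps the GPU enabled only when `torch.cuda.is_available()` is True in the running process. The function `select_device` is hypothetical and is not the library's actual code.

```python
def select_device(use_gpu_flag: bool, cuda_available: bool, gpu: int = 0) -> str:
    """Hypothetical stand-in for the GPU/CPU decision made at startup.

    Assumption (not the library's verbatim code): the GPU is used only if the
    flag is set AND CUDA is available to *this* process.
    """
    use_gpu = use_gpu_flag and cuda_available
    if use_gpu:
        return f"cuda:{gpu}"
    # This branch corresponds to the "Use CPU" line in the log.
    print("Use CPU")
    return "cpu"


# Same flag, different process environments:
print(select_device(use_gpu_flag=True, cuda_available=True))   # prints: cuda:0
print(select_device(use_gpu_flag=True, cuda_available=False))  # prints: Use CPU, then: cpu
```

The point of the sketch is that a passing `torch.cuda.is_available()` in one shell says nothing about the process the script launches; the check has to pass there too.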

wuhaixu2016 commented 1 year ago

Hi, you need to make sure that the environment in which you run the script is the same as the one in which you ran `torch.cuda.is_available()`.
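One way to check this is to print the same diagnostics both from the shell where `torch.cuda.is_available()` returned True and from inside the script run, then compare them. The helper below is a sketch, not part of the library: the function name `collect_env_info` is hypothetical, `CONDA_DEFAULT_ENV` is only set when a conda env has been activated, and the PyTorch fields are reported as `None` if torch cannot be imported.

```python
import os
import sys


def collect_env_info() -> dict:
    """Gather the facts that decide whether this interpreter can see a GPU."""
    info = {
        # Python binary actually running; compare it between the two runs.
        "python_executable": sys.executable,
        # Root of the active environment (the conda env dir when one is active).
        "sys_prefix": sys.prefix,
        # Set by `conda activate`; None if no conda env is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # None means unset (all GPUs visible); "" hides every GPU.
        "cuda_visible_devices": os.environ.get("CUDA_VISIBLE_DEVICES"),
    }
    try:
        import torch
    except ImportError:
        info["torch_version"] = None
        info["cuda_available"] = None
        info["device_count"] = None
    else:
        info["torch_version"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
        info["device_count"] = torch.cuda.device_count()
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value}")
```

Save it under any name (for example a hypothetical `check_env.py`) and run it once interactively and once from a copy of the shell script, placed just before the line that launches run.py. If `python_executable` or `sys_prefix` differ, the script is using a different interpreter; if they match but `cuda_available` differs, compare `cuda_visible_devices`.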

yfprime commented 7 months ago

If you have only one GPU, make sure the shell script selects CUDA device 0 instead of 1. A single-GPU machine has no device 1, so the run ends up with no visible GPU and falls back to the CPU.
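To see why device 1 matters on a one-GPU machine, the sketch below models how CUDA interprets `CUDA_VISIBLE_DEVICES`, assuming the script selects its device through that variable. It is a simplification: only integer indices are handled (GPU UUIDs and MIG identifiers are not), and, following the CUDA documentation, every entry from the first invalid index onward is hidden. The function `visible_gpus` is hypothetical.

```python
from typing import List, Optional


def visible_gpus(cuda_visible_devices: Optional[str], physical_gpus: int) -> List[int]:
    """Return the physical GPU indices a CUDA process would see.

    Simplified model: integer indices only. Parsing stops at the first
    invalid entry, and that entry plus everything after it is hidden.
    """
    if cuda_visible_devices is None:
        # Variable unset: every physical GPU is visible.
        return list(range(physical_gpus))
    visible: List[int] = []
    for token in cuda_visible_devices.split(","):
        try:
            index = int(token.strip())
        except ValueError:
            break  # non-integer entry (or empty string): invalid in this sketch
        if index < 0 or index >= physical_gpus:
            break  # index does not exist on this machine
        visible.append(index)
    return visible


for setting in ("0", "1", ""):
    print(repr(setting), visible_gpus(setting, physical_gpus=1))
```

On the reporter's machine (a single A100, so `physical_gpus=1`), `visible_gpus("1", 1)` returns `[]`, which is consistent with `use_gpu=False` and `Use CPU` in the original log, while `visible_gpus("0", 1)` returns `[0]`.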