Hi, trying to reproduce results from the paper and running into a seemingly trivial error. Would appreciate any help.
I run the following with no issues:
When I try to run the test evaluation, though, it seems as though total_samples = 0 somehow. This is the full error (I added a print statement to print total_samples):
UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
return torch._C._cuda_getDeviceCount() > 0
2023-02-21 19:03:43,404 | INFO : Using device: cpu
2023-02-21 19:03:43,404 | INFO : Loading and preprocessing data ...
66it [00:00, 136.70it/s]
2023-02-21 19:03:43,998 | INFO : 33 samples may be used for training
2023-02-21 19:03:43,998 | INFO : 9 samples will be used for validation
2023-02-21 19:03:43,998 | INFO : 0 samples will be used for testing
2023-02-21 19:03:44,003 | INFO : Creating model ...
2023-02-21 19:03:44,006 | INFO : Model:
TSTransformerEncoderClassiregressor(
  (project_inp): Linear(in_features=24, out_features=128, bias=True)
  (pos_enc): FixedPositionalEncoding(
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (transformer_encoder): TransformerEncoder(
    (layers): ModuleList(
      (0): TransformerBatchNormEncoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): _LinearWithBias(in_features=128, out_features=128, bias=True)
        )
        (linear1): Linear(in_features=128, out_features=512, bias=True)
        (dropout): Dropout(p=0.1, inplace=False)
        (linear2): Linear(in_features=512, out_features=128, bias=True)
        (norm1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (norm2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (dropout1): Dropout(p=0.1, inplace=False)
        (dropout2): Dropout(p=0.1, inplace=False)
      )
      (1): TransformerBatchNormEncoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): _LinearWithBias(in_features=128, out_features=128, bias=True)
        )
        (linear1): Linear(in_features=128, out_features=512, bias=True)
        (dropout): Dropout(p=0.1, inplace=False)
        (linear2): Linear(in_features=512, out_features=128, bias=True)
        (norm1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (norm2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (dropout1): Dropout(p=0.1, inplace=False)
        (dropout2): Dropout(p=0.1, inplace=False)
      )
      (2): TransformerBatchNormEncoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): _LinearWithBias(in_features=128, out_features=128, bias=True)
        )
        (linear1): Linear(in_features=128, out_features=512, bias=True)
        (dropout): Dropout(p=0.1, inplace=False)
        (linear2): Linear(in_features=512, out_features=128, bias=True)
        (norm1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (norm2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (dropout1): Dropout(p=0.1, inplace=False)
        (dropout2): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (dropout1): Dropout(p=0.1, inplace=False)
  (output_layer): Linear(in_features=18432, out_features=1, bias=True)
)
2023-02-21 19:03:44,006 | INFO : Total number of parameters: 616449
2023-02-21 19:03:44,006 | INFO : Trainable parameters: 616449
Loaded model from experiments/finetuned_2023-02-21_18-40-55_2J1/checkpoints/model_best.pth. Epoch: 188
total_samples: 0
Traceback (most recent call last):
  File "src/main.py", line 307, in <module>
    main(config)
  File "src/main.py", line 196, in main
    aggr_metrics_test, per_batch_test = test_evaluator.evaluate(keep_all=True)
  File "/mvts_transformer/src/running.py", line 471, in evaluate
    epoch_loss = epoch_loss / total_samples  # average loss per element for whole epoch
ZeroDivisionError: division by zero
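Note that the log above already reports "0 samples will be used for testing", so the division by zero looks like a symptom of an empty test set rather than a bug in the averaging itself. A guard along these lines would probably have surfaced that sooner. This is only a sketch: `average_loss` is a hypothetical helper, not a function in the repo's running.py.

```python
def average_loss(epoch_loss: float, total_samples: int) -> float:
    """Return the mean loss per sample, failing loudly on an empty dataset.

    Hypothetical helper for illustration; not part of the repository.
    """
    if total_samples == 0:
        # An empty split usually means no data file was selected for it.
        raise ValueError(
            "No samples were evaluated; check that the test split is not empty "
            "(for example, that the test file pattern matches a file)."
        )
    return epoch_loss / total_samples
```

With a check like this, the run would stop with a message pointing at the data selection instead of a bare ZeroDivisionError.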
Fwiw, this is the path to the test file and it is populated with data:
datasets/Multivariate2018_ts/Multivariate_ts/SpokenArabicDigits/SpokenArabicDigits_TEST.ts
EDIT: Solved, silly typo: --pattern should be --test_pattern.
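For anyone who hits the same thing, a quick way to sanity-check a pattern before a long run is to see which file names it actually selects. The sketch below assumes the loader picks files by regex search on the path, which I have not verified against the repo's data-loading code. `select_paths` is a hypothetical helper, and the TRAIN file name is assumed by analogy with the TEST file above.

```python
import re


def select_paths(paths, pattern):
    """Keep only the paths that contain a match for the regex `pattern`.

    Hypothetical helper; assumes files are selected by regex search.
    """
    return [p for p in paths if re.search(pattern, p)]


# Example file names; the TRAIN name is assumed for illustration.
paths = [
    "SpokenArabicDigits_TRAIN.ts",
    "SpokenArabicDigits_TEST.ts",
]

train_files = select_paths(paths, "TRAIN")
test_files = select_paths(paths, "TEST")

print("train:", train_files)
print("test:", test_files)
```

If the list for the test split comes back empty, the evaluator has nothing to average over, which matches the "0 samples will be used for testing" line in the log.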