[Closed] m6129 closed this issue 5 months ago
Hi, this looks like an out-of-memory error. To resolve it, you could either reduce lm_layer_num or remove the electricity dataset from the execute_list/train_all.csv file, since the electricity dataset consumes a lot of GPU memory. Thanks.
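A minimal sketch of that fix from the command line. The file contents created here are hypothetical stand-ins for the real execute_list/train_all.csv; the `--lm_layer_num` flag is the one mentioned in this thread, and any other flags from your original command should be kept as-is.

```shell
# Hypothetical stand-in for execute_list/train_all.csv:
printf 'ETTh1\nelectricity\nweather\n' > train_all.csv

# Drop the electricity entry, which dominates GPU memory use:
grep -v 'electricity' train_all.csv > train_all.tmp && mv train_all.tmp train_all.csv

# Then rerun with a shallower backbone, keeping your other original flags, e.g.:
# python run.py --lm_layer_num 4 ...
```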
Thanks. Removing electricity and passing --lm_layer_num 4 helped; training used about 14 GB of GPU memory on a Kaggle P100.
Dear developer, could you please provide guidance on how to run your model on Kaggle? https://www.kaggle.com/code/mrantonzaitsev/unitime
I get this error:
```
Traceback (most recent call last):
  File "/kaggle/working/UniTime/run.py", line 67, in <module>
    engine.train()
  File "/kaggle/working/UniTime/engines/engine.py", line 133, in train
    loss = self.train_engines[idx].train_batch(batch, self.model, self.optimizer)
  File "/kaggle/working/UniTime/engines/engine_forecasting.py", line 31, in train_batch
    outputs = model(self.info, inp, mask)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/kaggle/working/UniTime/models/unitime.py", line 137, in forward
    x_enc = self.backbone(inputs_embeds=inputs_embeds)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/kaggle/working/UniTime/models/unitimegpt2.py", line 38, in forward
    transformer_outputs = self.transformer(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 900, in forward
    outputs = block(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 391, in forward
    attn_outputs = self.attn(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 332, in forward
    attn_output, attn_weights = self._attn(query, key, value, attention_mask, head_mask)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 184, in _attn
    attn_weights = torch.matmul(query, key.transpose(-1, -2))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 316.00 MiB (GPU 0; 15.89 GiB total capacity; 14.81 GiB already allocated; 82.12 MiB free; 15.08 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```