[Closed] m6129 closed this issue 5 months ago
Hi, this looks like an out-of-memory error. To resolve it, you could either reduce lm_layer_num or remove the electricity dataset from the execute_list/train_all.csv file, since the electricity dataset consumes a lot of GPU memory. Thanks.
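A minimal sketch of that fix from the command line. The file contents created here are hypothetical stand-ins for the real execute_list/train_all.csv; the `--lm_layer_num` flag is the one mentioned in this thread, and any other flags from your original command should be kept as-is.

```shell
# Hypothetical stand-in for execute_list/train_all.csv:
printf 'ETTh1\nelectricity\nweather\n' > train_all.csv

# Drop the electricity entry, which dominates GPU memory use:
grep -v 'electricity' train_all.csv > train_all.tmp && mv train_all.tmp train_all.csv

# Then rerun with a shallower backbone, keeping your other original flags, e.g.:
# python run.py --lm_layer_num 4 ...
```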
Thanks. Removing electricity and passing --lm_layer_num 4 helped; training used about 14 GB of GPU memory on a Kaggle P100.
Dear developer, could you please provide guidance on how to run your model on Kaggle? https://www.kaggle.com/code/mrantonzaitsev/unitime
I get this error:
```
Traceback (most recent call last):
  File "/kaggle/working/UniTime/run.py", line 67, in <module>
    engine.train()
  File "/kaggle/working/UniTime/engines/engine.py", line 133, in train
    loss = self.train_engines[idx].train_batch(batch, self.model, self.optimizer)
  File "/kaggle/working/UniTime/engines/engine_forecasting.py", line 31, in train_batch
    outputs = model(self.info, inp, mask)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/kaggle/working/UniTime/models/unitime.py", line 137, in forward
    x_enc = self.backbone(inputs_embeds=inputs_embeds)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/kaggle/working/UniTime/models/unitimegpt2.py", line 38, in forward
    transformer_outputs = self.transformer(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 900, in forward
    outputs = block(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 391, in forward
    attn_outputs = self.attn(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 332, in forward
    attn_output, attn_weights = self._attn(query, key, value, attention_mask, head_mask)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 184, in _attn
    attn_weights = torch.matmul(query, key.transpose(-1, -2))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 316.00 MiB (GPU 0; 15.89 GiB total capacity; 14.81 GiB already allocated; 82.12 MiB free; 15.08 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```