utterworks / fast-bert

Super easy library for BERT based NLP models

RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory #234

Open veilupt opened 4 years ago

veilupt commented 4 years ago

@kaushaltrivedi Cannot allocate memory error

Error Logs:

06/23/2020 03:00:43 - INFO - root - Num examples = 1000
06/23/2020 03:00:43 - INFO - root - Num Epochs = 6
06/23/2020 03:00:43 - INFO - root - Total train batch size (w. parallel, distributed & accumulation) = 16
06/23/2020 03:00:43 - INFO - root - Gradient Accumulation steps = 1
06/23/2020 03:00:43 - INFO - root - Total optimization steps = 378
Traceback (most recent call last):----------------------------------| 0.00% [0/63 00:00<00:00]
  File "bert/run.py", line 141, in <module>
    learner.fit(args.num_train_epochs, args.learning_rate, validate=True)
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/fast_bert/learner_cls.py", line 397, in fit
    outputs = self.model(**inputs)
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/fast_bert/modeling.py", line 191, in forward
    head_mask=head_mask,
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/transformers/modeling_bert.py", line 734, in forward
    encoder_attention_mask=encoder_extended_attention_mask,
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/transformers/modeling_bert.py", line 408, in forward
    hidden_states, attention_mask, head_mask[i], encoder_hidden_states, encoder_attention_mask
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/transformers/modeling_bert.py", line 369, in forward
    self_attention_outputs = self.attention(hidden_states, attention_mask, head_mask)
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/transformers/modeling_bert.py", line 315, in forward
    hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/transformers/modeling_bert.py", line 246, in forward
    attention_probs = self.dropout(attention_probs)
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/torch/nn/modules/dropout.py", line 54, in forward
    return F.dropout(input, self.p, self.training, self.inplace)
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/torch/nn/functional.py", line 936, in dropout
    else _VF.dropout(input, p, training))
RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 201326592 bytes. Error code 12 (Cannot allocate memory)

I chose the n1-standard-4 (4 vCPUs, 15 GB memory) machine type.

Note: I have set the torch device to 'cpu'.

How do I run multi-label classification without exceeding the available RAM?
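
For reference, the 201,326,592 bytes the allocator failed to reserve equals 16 × 12 × 512 × 512 float32 values, i.e. one attention-probability tensor for a batch of 16 with 12 heads at a max sequence length of 512, so the usual way to fit training into 15 GB of CPU RAM is to shrink batch_size_per_gpu and/or max_seq_length. Below is a minimal sketch, assuming the standard fast-bert classification API; the data paths, file names, and column names are hypothetical placeholders, not values from this thread.

import logging
import torch
from fast_bert.data_cls import BertDataBunch
from fast_bert.learner_cls import BertLearner
from fast_bert.metrics import accuracy_thresh

logger = logging.getLogger()
device = torch.device("cpu")

# Smaller per-device batch size and shorter sequences shrink the
# attention tensors that the CPU allocator could not reserve above.
databunch = BertDataBunch(
    "./data/",                      # hypothetical data directory
    "./labels/",                    # hypothetical label directory
    tokenizer="bert-base-uncased",
    train_file="train.csv",
    val_file="val.csv",
    label_file="labels.csv",
    text_col="text",                # hypothetical column names
    label_col=["label_a", "label_b", "label_c"],
    batch_size_per_gpu=4,           # down from 16
    max_seq_length=128,             # down from 512
    multi_gpu=False,
    multi_label=True,
    model_type="bert",
)

learner = BertLearner.from_pretrained_model(
    databunch,
    pretrained_path="bert-base-uncased",
    metrics=[{"name": "accuracy_thresh", "function": accuracy_thresh}],
    device=device,
    logger=logger,
    output_dir="./output/",
    multi_gpu=False,
    is_fp16=False,                  # fp16 needs a GPU; keep it off on CPU
    multi_label=True,
)

learner.fit(epochs=6, lr=3e-5, validate=True)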

veilupt commented 4 years ago

@kaushaltrivedi It's not a memory issue after all. How do I resolve this error?

After increasing the CPU and RAM, the target size in the error also increased.

This time I chose the n1-standard-4 (6 vCPUs, 26 GB memory) machine type.

Error log:

Traceback (most recent call last):-------------------------------------------------------------| 0.00% [0/63 00:00<00:00]
  File "bert/run.py", line 146, in <module>
    learner.fit(args.num_train_epochs, args.learning_rate, validate=True)
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/fast_bert/learner_cls.py", line 397, in fit
    outputs = self.model(**inputs)
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/fast_bert/modeling.py", line 205, in forward
    logits.view(-1, self.num_labels), labels.view(-1, self.num_labels)
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 617, in forward
    reduction=self.reduction)
  File "/home/pt4_gcp/.local/lib/python3.7/site-packages/torch/nn/functional.py", line 2433, in binary_cross_entropy_with_logits
    raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
ValueError: Target size (torch.Size([64, 3])) must be the same as input size (torch.Size([32, 3]))
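
This second failure is a shape mismatch rather than a memory problem: binary_cross_entropy_with_logits requires the label tensor to have exactly the same shape as the logits, but here the labels arrive with batch dimension 64 while the logits have batch dimension 32 (both with 3 label columns), which suggests the labels are being batched or duplicated differently from the inputs (an assumption, not confirmed in this thread). A minimal PyTorch repro of the constraint:

import torch
import torch.nn.functional as F

logits = torch.randn(32, 3)    # model output: batch of 32, 3 labels
targets = torch.rand(64, 3)    # labels: batch of 64 -> mismatched

# F.binary_cross_entropy_with_logits requires target.size() == input.size();
# this reproduces the ValueError shown in the traceback above.
try:
    F.binary_cross_entropy_with_logits(logits, targets)
except ValueError as err:
    print(err)

# With matching shapes the multi-label loss computes normally.
loss = F.binary_cross_entropy_with_logits(logits, targets[:32])
print(loss.item())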
veilupt commented 4 years ago

@kaushaltrivedi Any update on this?