kamalkraj / BERT-NER

Pytorch-Named-Entity-Recognition-with-BERT
GNU Affero General Public License v3.0

Model training does not work on CPU #99

Open saurabhhssaurabh opened 3 years ago

saurabhhssaurabh commented 3 years ago

I have cloned the code from the dev branch and am executing the following command to fine-tune the model on CPU: python run_ner.py --cache_dir=path_to_cache --data_dir=path_to_data --bert_model=bert-base-uncased --task_name=ner --output_dir=path_to_output --no_cuda --do_train --do_eval --warmup_proportion=0.1

But I am facing the following error:

    Traceback (most recent call last):
      File "run_ner.py", line 611, in <module>
        main()
      File "run_ner.py", line 503, in main
        loss = model(input_ids, segment_ids, input_mask, label_ids, valid_ids, l_mask)
      File "/home/dev01/python_3/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "run_ner.py", line 43, in forward
        logits = self.classifier(sequence_output)
      File "/home/dev01/python_3/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/dev01/python_3/lib64/python3.6/site-packages/torch/nn/modules/linear.py", line 93, in forward
        return F.linear(input, self.weight, self.bias)
      File "/home/dev01/python_3/lib64/python3.6/site-packages/torch/nn/functional.py", line 1692, in linear
        output = input.matmul(weight.t())
    RuntimeError: Tensor for argument #3 'mat2' is on CPU, but expected it to be on GPU (while checking arguments for addmm)

I don't understand why, when I pass the --no_cuda flag, it still expects a tensor to be on the GPU.

brandonrobertz commented 2 years ago

--no_cuda fails on the NER task because the device is still hard-coded to the GPU here:

class Ner(BertForTokenClassification):

    def forward(self, input_ids,
                token_type_ids=None,
                attention_mask=None,
                labels=None,
                valid_ids=None,
                attention_mask_label=None):
        # ... skipping to line 47
        valid_output = torch.zeros(batch_size,
                                   max_len,
                                   feat_dim,
                                   dtype=torch.float32,
                                   device='cuda')

I changed the device argument to 'cpu' when I wasn't using CUDA and everything worked as expected.
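A minimal sketch of that workaround, assuming PyTorch is available and a `no_cuda` variable mirroring the script's `--no_cuda` flag; the helper `make_valid_output` is hypothetical and stands in for the hard-coded `torch.zeros(..., device='cuda')` line in `Ner.forward`:

```python
import torch

# Hypothetical helper standing in for the hard-coded line in Ner.forward:
# the device is taken as a parameter instead of being fixed to 'cuda'.
def make_valid_output(batch_size, max_len, feat_dim, device):
    return torch.zeros(batch_size, max_len, feat_dim,
                       dtype=torch.float32, device=device)

# Pick the device the same way the training script picks it for the model,
# honoring the --no_cuda flag (assumed already parsed into `no_cuda` here).
no_cuda = True
device = torch.device("cuda" if torch.cuda.is_available() and not no_cuda
                      else "cpu")

valid_output = make_valid_output(2, 8, 768, device)
print(valid_output.device)  # cpu when no_cuda is set
```

An alternative that avoids threading the flag through at all is to derive the device inside `forward` from a tensor that is already on the right device, e.g. `device=sequence_output.device`, so the buffer always follows the model.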