lonePatient / BERT-NER-Pytorch

Chinese NER (Named Entity Recognition) using BERT (Softmax, CRF, Span)
MIT License

CRF function raises an error when using multiple GPUs #7

Closed: aliendaniel closed this issue 4 years ago

aliendaniel commented 4 years ago

    02/25/2020 13:50:42 - INFO - root - Running evaluation
    02/25/2020 13:50:42 - INFO - root - Num examples = 1343
    02/25/2020 13:50:42 - INFO - root - Batch size = 48
    Traceback (most recent call last):
      File "run_ner_crf.py", line 517, in <module>
        main()
      File "run_ner_crf.py", line 459, in main
        global_step, tr_loss = train(args, train_dataset, model, tokenizer)
      File "run_ner_crf.py", line 148, in train
        evaluate(args, model, tokenizer)
      File "run_ner_crf.py", line 197, in evaluate
        tags, _ = model.crf._obtain_labels(logits, args.id2label, inputs['input_lens'])
      File "/root/.pyenv/versions/3.7.2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 591, in __getattr__
        type(self).__name__, name))
    AttributeError: 'DataParallel' object has no attribute 'crf'

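After some digging: `crf` is a custom module. With multiple GPUs the model is wrapped in `DataParallel`, and the `DataParallel` wrapper itself has no such custom `crf` attribute, which is what causes the error.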
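A minimal sketch reproducing the mechanism, assuming PyTorch is installed. `TaggerWithCRF` is a hypothetical toy model whose `crf` attribute is a placeholder `nn.Identity`, not the repo's CRF layer. `nn.DataParallel` registers the wrapped model as its only child module, `module`, so attribute lookups on the wrapper never reach `crf`. When no GPU is available, `DataParallel` simply stores the model, so the sketch also runs on CPU.

```python
import torch.nn as nn


class TaggerWithCRF(nn.Module):
    """Hypothetical toy model; `crf` stands in for the repo's custom CRF layer."""

    def __init__(self, hidden=8, num_tags=4):
        super().__init__()
        self.classifier = nn.Linear(hidden, num_tags)
        self.crf = nn.Identity()  # placeholder submodule, not a real CRF

    def forward(self, x):
        return self.classifier(x)


model = TaggerWithCRF()
wrapped = nn.DataParallel(model)  # what a training script does when n_gpu > 1

# nn.Module.__getattr__ only searches the wrapper's own parameters, buffers and
# submodules; the wrapper's only child is `module`, so `crf` is not found.
try:
    wrapped.crf
except AttributeError as err:
    print(err)  # 'DataParallel' object has no attribute 'crf'

# The original model, with all of its custom attributes, lives at `.module`.
print(wrapped.module.crf is model.crf)  # True
```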

lonePatient commented 4 years ago

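@aliendaniel If you want to use multiple GPUs, the change is actually simple. Training supports multiple GPUs, but evaluation does not, so when running on multiple GPUs just add one line of code: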

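    import torch.nn as nn
    # Unwrap DataParallel so custom attributes such as model.crf are reachable
    if isinstance(model, nn.DataParallel):
        model = model.module
    for step, batch in enumerate(eval_dataloader):
        ...  # existing evaluation loop body

rxc205 commented 2 years ago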

How can evaluation here be made to run on multiple GPUs?
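One possible approach, sketched under assumptions rather than taken from the repo: keep the `nn.DataParallel` wrapper for the forward pass, which splits the batch across GPUs and gathers the outputs onto one device, and call custom methods such as CRF decoding through `.module` afterwards. `TaggerWithCRF`, `StubCRF` and `evaluate_batch` are hypothetical names; the stub's `decode` is a per-position argmax rather than real Viterbi decoding, and the repo's actual model returns a different output tuple than this toy.

```python
import torch
import torch.nn as nn


class StubCRF(nn.Module):
    """Hypothetical stand-in for a CRF layer.

    decode() takes a per-position argmax; a real CRF would run Viterbi
    decoding over emission and transition scores instead.
    """

    def decode(self, emissions):
        return emissions.argmax(dim=-1)


class TaggerWithCRF(nn.Module):
    """Hypothetical toy tagger: forward() returns emission scores only."""

    def __init__(self, hidden=8, num_tags=4):
        super().__init__()
        self.classifier = nn.Linear(hidden, num_tags)
        self.crf = StubCRF()

    def forward(self, features):
        return self.classifier(features)


def evaluate_batch(model, features):
    """Run forward() data-parallel, then decode on the unwrapped model."""
    # DataParallel scatters the batch across GPUs and gathers outputs on one device.
    parallel_model = model if isinstance(model, nn.DataParallel) else nn.DataParallel(model)
    # Custom layers such as `crf` are only reachable through `.module`.
    core_model = parallel_model.module
    parallel_model.eval()
    with torch.no_grad():
        emissions = parallel_model(features)
        return core_model.crf.decode(emissions)


device = "cuda" if torch.cuda.is_available() else "cpu"
model = TaggerWithCRF().to(device)  # DataParallel expects parameters on device_ids[0]
features = torch.randn(6, 5, 8, device=device)  # (batch, seq_len, hidden)
tags = evaluate_batch(model, features)
print(tags.shape)  # torch.Size([6, 5])
```

Only the forward pass is parallelized in this sketch; decoding runs once on the gathered emissions on a single device, which is typically much cheaper than the BERT encoder's forward pass.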