The author's code is based on torch 0.4.1. However, many people may have GPUs that are no longer supported by CUDA < 11 and therefore have to use a more recent version such as torch 1.8.
If you stay on CUDA < 11 with such a GPU, you will run into the following error:
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1533672544752/work/aten/src/THC/THCBlas.cu:411
If you use the correct CUDA version instead, a StopIteration error appears when you use multiple GPUs. I believe this issue has existed since torch 1.5; see https://github.com/huggingface/transformers/issues/3936
To fix the bug by hand, change the following line in pytorch_pretrained_bert/modeling.py
extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype) # fp16 compatibility
to
extended_attention_mask = extended_attention_mask.to(dtype=torch.float32) # fp16 compatibility
I won't open a pull request because I don't know what the impact of this change would be on torch < 1.5.