Closed hlynurd closed 2 years ago
Hello! Could you show the script that you're using? Have you tried using accelerate or the Trainer, and does it fix the issue? Thank you, cc @sgugger
Here's my script: https://gist.github.com/hlynurd/d9b43edbb1b318e666ff875258130bb5. I get the same problem if I adapt it for accelerate.
And is the problem specific to ConvBert or do you have the same issue for all other models?
I only get the problem for ConvBert and only on 8 TPU cores. I have tried Electra and RoBERTa and both work well for 1 and 8 cores.
Sounds like a problem specific to ConvBERT then. Not sure anyone on the team will have time to investigate in depth in the coming weeks, but if you manage to find the cause, please let us know.
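One low-effort way to find the cause of a multi-process hang is to dump Python stack traces from the stuck workers. This is a minimal sketch using only the standard library; the helper name and the timeout value are illustrative, not from the gist above:

```python
import faulthandler
import signal


def enable_hang_diagnostics(timeout_s=300):
    """Dump all thread stacks to stderr if this process is still
    running after timeout_s seconds; repeats until cancelled."""
    faulthandler.dump_traceback_later(timeout_s, repeat=True)
    # Also allow an on-demand dump from another terminal (Unix only):
    #   kill -USR1 <pid>
    faulthandler.register(signal.SIGUSR1)
```

Calling this at the top of the per-core entry point and comparing the tracebacks across the eight processes usually shows whether all cores are blocked in the same XLA collective or one core has diverged.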
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
This issue persists for me.
Environment info
Python 3.7.3
torch==1.9.1
torch-xla @ https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.9-cp37-cp37m-linux_x86_64.whl
transformers==4.12.3
Models: ConvBert (ConvBertForQuestionAnswering)
Information
Hi all,
I would like to use ConvBertForQuestionAnswering on 8 TPU cores using pytorch/xla. It works for me on a single core, and swapping the ConvBert model creation for Electra or RoBERTa works fine on both 1 and 8 cores.
This hangs for me when nprocs=8 but not when nprocs=1. It stops at a forward pass of the model:
self.backbone = ConvBertForQuestionAnswering.from_pretrained(model_path, config=config)
outputs = self.backbone.convbert(input_ids, attention_mask, token_type_ids)
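In torch_xla the launch difference is just `xmp.spawn(run_fn, nprocs=1)` versus `nprocs=8`. As a sanity check that the fork-per-core harness itself is healthy (i.e. the hang really is inside the XLA forward pass), the same structure can be mimicked with stdlib multiprocessing; the worker body below is a placeholder, not the real forward pass:

```python
import multiprocessing as mp
import queue as queue_mod


def _worker(rank, out_queue):
    # Placeholder for the per-core work; under torch_xla this would be
    # the ConvBert forward pass on the device from xm.xla_device().
    result = sum(i * i for i in range(1000))
    out_queue.put((rank, result))


def run_workers(nprocs=8, timeout_s=30):
    """Start nprocs workers, wait up to timeout_s for each, and
    return {rank: result} for the workers that finished in time."""
    ctx = mp.get_context("fork")  # fork avoids re-importing __main__ (Linux)
    out_queue = ctx.Queue()
    procs = [ctx.Process(target=_worker, args=(r, out_queue)) for r in range(nprocs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join(timeout_s)
    finished = {}
    try:
        while True:
            rank, result = out_queue.get(timeout=1)
            finished[rank] = result
    except queue_mod.Empty:
        pass
    return finished
```

If all eight stand-in workers return but the real run still hangs at the forward pass, the problem is isolated to the ConvBert graph under multi-core XLA rather than the spawning code.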