graphcore / poptorch

PyTorch interface for the IPU
https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/
MIT License

Failing on training; trainer.train() #12

Open: rohullaa opened this issue 1 year ago

rohullaa commented 1 year ago

Hey,

I am trying to run a simple text classification task on IPUs with PopTorch and Optimum. When I start the training with:

trainer.train()

I get the following error:

Traceback (most recent call last):
  File "IPUs/train.py", line 117, in <module>
    trainer.train()
  File "FOLDER/env/lib/python3.6/site-packages/optimum/graphcore/trainer.py", line 904, in train
    self._compile_model(model, next(iter(train_dataloader)), log=True)
  File "FOLDER/env/lib/python3.6/site-packages/optimum/graphcore/trainer.py", line 375, in _compile_model
    model.compile(**sample_batch)
  File "FOLDER/env/lib/python3.6/site-packages/poptorch/_poplar_executor.py", line 651, in compile
    self._compile(in_tensors)
  File "FOLDER/env/lib/python3.6/site-packages/poptorch/_impl.py", line 259, in wrapper
    return func(self, *args, **kwargs)
  File "FOLDER/env/lib/python3.6/site-packages/poptorch/_poplar_executor.py", line 569, in _compile
    self._executable = poptorch_core.compileWithTrace(*trace_args)
poptorch.poptorch_core.Error: In poptorch/python/poptorch.cpp:1371: 'std::out_of_range': basic_string::replace: __pos (which is 5) > this->size() (which is 0)
Error raised in:
  [0] Compiler::initSession
  [1] LowerToPopart::compile
  [2] compileWithTrace

Can someone please help me with this error?

payoto commented 1 year ago

Hi @rohullaa, sorry for not seeing this earlier. Optimum and Transformers no longer support Python 3.6, so the error might be related to that. If you still encounter it on Python 3.8, I would need to know which model from Optimum you are trying to run when you get the failure.
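
To rule out the interpreter version before sharing more details, a quick check along these lines could be run in the same environment as train.py. This is only a sketch: the 3.8 minimum is taken from the comment above rather than from any package metadata, and the distribution names passed to `installed_version` are assumptions about how the packages are named in the environment.

```python
import sys

# Assumption: 3.8 comes from the reply above, not from the packages' metadata.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


def installed_version(package):
    """Return the installed version of `package`, or None if it cannot be found."""
    try:
        # importlib.metadata is in the standard library from Python 3.8 onwards.
        from importlib import metadata
    except ImportError:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "ok" if python_is_supported() else "too old"
    print("Python {}: {}".format(current, status))
    # Assumed distribution names; adjust them to match the environment.
    for name in ("poptorch", "optimum-graphcore", "transformers"):
        print("{}: {}".format(name, installed_version(name)))
```

Posting the printed versions together with the model name should make the failure easier to reproduce.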