huggingface / optimum-graphcore

Blazing fast training of 🤗 Transformers on Graphcore IPUs
Apache License 2.0

T5 Translation example will not run without replication factor set (even though I set it to be greater than or equal to 1) #482

Open danao413 opened 1 month ago

danao413 commented 1 month ago

09/16/2024 09:02:53 - critical - poptorch::python - ValueError: IPUConfig attribute replication_factor must be >= 1. You provided value=0

Traceback (most recent call last):
  File "run_translation.py", line 671, in <module>
    main()
  File "run_translation.py", line 568, in main
    trainer = IPUSeq2SeqTrainer(
  File "/localdata/u.do100367/poptorch/lib/python3.8/site-packages/optimum/graphcore/trainer_seq2seq.py", line 64, in __init__
    super().__init__(
  File "/localdata/u.do100367/poptorch/lib/python3.8/site-packages/optimum/graphcore/trainer.py", line 282, in __init__
    self.ipu_config.replication_factor = n_ipu // self.ipu_config.ipus_per_replica
  File "/localdata/u.do100367/poptorch/lib/python3.8/site-packages/optimum/graphcore/ipu_configuration.py", line 485, in __setattr__
    vfunc(name, value)
  File "/localdata/u.do100367/poptorch/lib/python3.8/site-packages/optimum/graphcore/ipu_configuration.py", line 238, in _contents_geq_value_validator
    raise ValueError(f"IPUConfig attribute {name} must be >= {floor_value}. You provided {value=}")
ValueError: IPUConfig attribute replication_factor must be >= 1. You provided value=0

I have set replication_factor in the IPUConfig instantiation to 1, 10, and 100, and I get this error each time.
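
For reference, the traceback shows the failing value being assigned inside trainer.py (line 282) as `n_ipu // self.ipu_config.ipus_per_replica`, which may be why the value passed to IPUConfig has no effect. Below is a minimal sketch of that arithmetic, not the library's actual code: `check_geq` and `derived_replication_factor` are hypothetical stand-ins, and the inputs are made-up numbers. It only illustrates that floor division gives 0 whenever `n_ipu` is smaller than `ipus_per_replica`.

```python
def check_geq(name: str, value: int, floor_value: int = 1) -> None:
    """Hypothetical stand-in for the validator named in the traceback."""
    if value < floor_value:
        raise ValueError(
            f"IPUConfig attribute {name} must be >= {floor_value}. You provided {value=}"
        )


def derived_replication_factor(n_ipu: int, ipus_per_replica: int) -> int:
    """Mirror the assignment shown in the traceback (illustrative only)."""
    value = n_ipu // ipus_per_replica  # floor division: 0 when n_ipu < ipus_per_replica
    check_geq("replication_factor", value)
    return value


if __name__ == "__main__":
    # Enough IPUs for two replicas of four IPUs each.
    print(derived_replication_factor(8, 4))

    # Fewer IPUs than a single replica needs: the derived value is 0.
    try:
        derived_replication_factor(2, 4)
    except ValueError as err:
        print(err)
```

If this reading is right, the values of `n_ipu` and `ipus_per_replica` at the point of the assignment would be the things to check, rather than the replication_factor given to IPUConfig.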