huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
Apache License 2.0

Error during evaluation step when fine-tuning bert-base-uncased #174

Closed: samir-souza closed this issue 1 year ago

samir-souza commented 1 year ago

Tested versions: optimum-neuron 0.0.8 and 0.0.9

I'm fine-tuning bert-base-uncased as a binary text classifier on a spam/not-spam dataset (Deysi/spam-detection-dataset). Training works perfectly on a trn1.2xlarge, but the evaluation step raises an exception that points at this line:
input_names = inspect.signature(model.forward).parameters.keys()
ValueError: invalid method signature
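For readers unfamiliar with this message, here is a minimal sketch of one way Python's `inspect` module produces it: a bound method whose underlying function has no positional slot for `self`. This is only an illustration of the error class, not a confirmed diagnosis of this issue. `TinyModel`, `forward_kwargs_only`, and `forward_input_names` are hypothetical names that do not come from optimum-neuron or transformers.

```python
import inspect
import types


def forward_kwargs_only(**kwargs):
    """Hypothetical replacement forward with no positional parameter for `self`."""
    return kwargs


class TinyModel:
    """Stand-in for a model; this is not a transformers class."""

    def forward(self, input_ids=None, attention_mask=None, labels=None):
        return input_ids


def forward_input_names(model):
    """Return the forward() parameter names, or None if the signature is unreadable."""
    try:
        return list(inspect.signature(model.forward).parameters.keys())
    except ValueError:
        # inspect raises ValueError("invalid method signature") for a bound
        # method whose function cannot accept the bound instance positionally.
        return None


healthy = TinyModel()
patched = TinyModel()
# Binding a function whose only parameter is **kwargs leaves no slot for `self`.
patched.forward = types.MethodType(forward_kwargs_only, patched)

print(forward_input_names(healthy))  # names of the regular forward() arguments
print(forward_input_names(patched))  # None, because the signature lookup failed
```

If something replaces `model.forward` between training and evaluation, printing `type(model.forward)` and `getattr(model.forward, "__func__", None)` just before the failing call is a cheap way to see what is actually bound.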

Notebooks used to reproduce the error:

01 data prep
02 model training

aws-neuronx-runtime-discovery 2.9
libneuronxla 0.5.391
neuron-cc 1.17.0.0+1810fd7ed
neuronx-cc 2.8.0.25+a3ad0f342
neuronx-distributed 0.2.0
neuronx-hwm 2.8.0.3+2b7c6da39
torch-neuronx 1.13.1.1.9.0
torch-xla 1.13.1+torchneuron8
transformers-neuronx 0.5.58
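A version list like the one above can be gathered from inside the training environment with the standard library. The sketch below assumes the distribution names match the names shown in this report (plus `optimum-neuron`); `collect_versions` and `NEURON_PACKAGES` are hypothetical names used only for this example.

```python
from importlib import metadata

# Distribution names taken from the report above, plus optimum-neuron itself.
NEURON_PACKAGES = [
    "optimum-neuron",
    "aws-neuronx-runtime-discovery",
    "libneuronxla",
    "neuron-cc",
    "neuronx-cc",
    "neuronx-distributed",
    "neuronx-hwm",
    "torch-neuronx",
    "torch-xla",
    "transformers-neuronx",
]


def collect_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in collect_versions(NEURON_PACKAGES).items():
    print(f"{name} {version or 'not installed'}")
```

Running this in the same container or virtual environment as `train.py` avoids reporting versions from a different Python installation.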

Traceback (most recent call last):
  File "/opt/ml/code/train.py", line 127, in <module>
    trainer.train()
  File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1645, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/utils/patching.py", line 180, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/trainers.py", line 457, in _inner_training_loop
    return super()._inner_training_loop(
  File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 2026, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 2312, in _maybe_log_save_evaluate
    metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
  File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 3043, in evaluate
    output = eval_loop(
  File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 3235, in evaluation_loop
    loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/trainers.py", line 289, in prediction_step
    self.trigger_on_step_middle_for_neuron_cache_callback(model)
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/trainers.py", line 243, in trigger_on_step_middle_for_neuron_cache_callback
    callback.on_step_middle(self.args, self.state, self.control, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/trainer_callback.py", line 288, in on_step_middle
    self.neuron_hash_for_model(args, model, state.last_inputs, try_to_fetch_cached_model=True)
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/trainer_callback.py", line 198, in neuron_hash_for_model
    input_names = inspect.signature(model.forward).parameters.keys()
  File "/usr/local/lib/python3.10/inspect.py", line 3254, in signature
    return Signature.from_callable(obj, follow_wrapped=follow_wrapped,
  File "/usr/local/lib/python3.10/inspect.py", line 3002, in from_callable
    return _signature_from_callable(obj, sigcls=cls,
  File "/usr/local/lib/python3.10/inspect.py", line 2404, in _signature_from_callable
    return _signature_bound_method(sig)
  File "/usr/local/lib/python3.10/inspect.py", line 1967, in _signature_bound_method
    raise ValueError('invalid method signature')
ValueError: invalid method signature

michaelbenayoun commented 1 year ago

Hi @samir-souza, are you able to reproduce this from the main branch?

Also, could you provide a command line to reproduce it, please? I don't observe it myself when running a text-classification example.

samir-souza commented 1 year ago

@michaelbenayoun I just tested with main and it is working well now. I'll close this ticket, since the issue is fixed. (FYI, I used the linked notebooks to run the experiments.)