run_clm.py AttributeError: 'NoneType' object has no attribute 'get'

pawanchk commented 2 weeks ago

Hi,

I am running run_clm.py with the ProtGPT2 model (https://huggingface.co/nferruz/ProtGPT2). Training completes successfully, but evaluation ends with this error: AttributeError: 'NoneType' object has no attribute 'get'

This is the command I am running on a MacBook M3 Max (macOS Sonoma 14.5):

python3 run_clm.py --model_name_or_path nferruz/ProtGPT2 --train_file training.txt --validation_file validation.txt --tokenizer_name nferruz/ProtGPT2 --do_train --do_eval --output_dir output --learning_rate 1e-06 --use_cpu True

How can I resolve this issue?

amyeroberts commented 2 weeks ago

Hi @pawanchk, thanks for raising an issue!

So that we can help you, could you provide the full error traceback and the running environment: run transformers-cli env in the terminal and copy-paste the output?

pawanchk commented 2 weeks ago

Hi @amyeroberts

Thanks so much for your response.

These are the last few lines of the log, including the error traceback:

06/19/2024 15:26:50 - INFO - __main__ - *** Evaluate ***
[INFO|trainer.py:3783] 2024-06-19 15:26:50,645 >> 
***** Running Evaluation *****
[INFO|trainer.py:3785] 2024-06-19 15:26:50,646 >>   Num examples = 0
[INFO|trainer.py:3788] 2024-06-19 15:26:50,646 >>   Batch size = 8
Traceback (most recent call last):
  File "~/ProtGPT2/run_clm.py", line 657, in <module>
    main()
  File "~/ProtGPT2/run_clm.py", line 623, in main
    metrics = trainer.evaluate()
  File "~/ProtGPT2/.protgpt2venv/lib/python3.9/site-packages/transformers/trainer.py", line 3636, in evaluate
    output = eval_loop(
  File "~/ProtGPT2/.protgpt2venv/lib/python3.9/site-packages/transformers/trainer.py", line 3821, in evaluation_loop
    losses, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
  File "~/ProtGPT2/.protgpt2venv/lib/python3.9/site-packages/transformers/trainer.py", line 3988, in prediction_step
    has_labels = False if len(self.label_names) == 0 else all(inputs.get(k) is not None for k in self.label_names)
  File "~/ProtGPT2/.protgpt2venv/lib/python3.9/site-packages/transformers/trainer.py", line 3988, in <genexpr>
    has_labels = False if len(self.label_names) == 0 else all(inputs.get(k) is not None for k in self.label_names)
AttributeError: 'NoneType' object has no attribute 'get'

Also, here are the environment details from transformers-cli env:

- `transformers` version: 4.41.2
- Platform: macOS-14.5-arm64-arm-64bit
- Python version: 3.9.6
- Huggingface_hub version: 0.23.4
- Safetensors version: 0.4.3
- Accelerate version: 0.31.0
- Accelerate config:    not found
- PyTorch version (GPU?): 2.3.1 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: no
- Using distributed or parallel set-up in script?: no

Any suggestions regarding how to resolve the issue will be very helpful.

mishra011 commented 2 weeks ago

I am facing the same issue.

***** Running Evaluation *****
[INFO|trainer.py:3791] 2024-06-20 11:46:49,692 >>   Num examples = 0
[INFO|trainer.py:3794] 2024-06-20 11:46:49,692 >>   Batch size = 8
Traceback (most recent call last):
  File "/Users/deepakmishra/work/llm-scratch/huggingface-llm/run_clm.py", line 657, in <module>
    main()
  File "/Users/deepakmishra/work/llm-scratch/huggingface-llm/run_clm.py", line 623, in main
    metrics = trainer.evaluate()
  File "/Users/deepakmishra/.pyenv/versions/denv/lib/python3.10/site-packages/transformers/trainer.py", line 3642, in evaluate
    output = eval_loop(
  File "/Users/deepakmishra/.pyenv/versions/denv/lib/python3.10/site-packages/transformers/trainer.py", line 3827, in evaluation_loop
    losses, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
  File "/Users/deepakmishra/.pyenv/versions/denv/lib/python3.10/site-packages/transformers/trainer.py", line 3994, in prediction_step
    has_labels = False if len(self.label_names) == 0 else all(inputs.get(k) is not None for k in self.label_names)
  File "/Users/deepakmishra/.pyenv/versions/denv/lib/python3.10/site-packages/transformers/trainer.py", line 3994, in <genexpr>
    has_labels = False if len(self.label_names) == 0 else all(inputs.get(k) is not None for k in self.label_names)
AttributeError: 'NoneType' object has no attribute 'get'

mishra011 commented 2 weeks ago

I found the issue. "Num examples = 0" in the log means there is no data for validation, and my validation data file was empty. After I fixed the validation data file, evaluation works.
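
A small pre-flight check along these lines can surface an empty validation file before a long training run. This is only a sketch: `count_nonempty_lines` and `check_validation_file` are hypothetical helpers, not part of run_clm.py, and a non-zero line count does not by itself guarantee a non-zero number of evaluation examples after tokenization and grouping.

```python
from pathlib import Path


def count_nonempty_lines(path):
    """Return how many lines in a text file contain non-whitespace characters."""
    text = Path(path).read_text(encoding="utf-8")
    return sum(1 for line in text.splitlines() if line.strip())


def check_validation_file(path):
    """Raise early if the validation file has no usable lines (hypothetical helper)."""
    n = count_nonempty_lines(path)
    if n == 0:
        raise ValueError(
            f"{path} contains no non-empty lines; evaluation would see 0 examples"
        )
    return n
```

Calling `check_validation_file("validation.txt")` before launching the script would fail fast on an empty file instead of failing after training has finished.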

pawanchk commented 2 weeks ago

@mishra011 Thanks for the update

My validation.txt file is not empty, though: it contains 4 protein sequences. Is that not enough? Are more sequences needed for validation?

Also, is it related to the batch size? Since the batch size is 8, do I need at least 8 sequences in the validation.txt file?

mishra011 commented 2 weeks ago

Yes @pawanchk, it needs more data.
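
A likely explanation for why more data helps, offered as an assumption rather than a confirmed diagnosis: run_clm.py concatenates the tokenized texts and splits them into fixed-size blocks of `block_size` tokens, dropping the incomplete remainder. If the whole validation file tokenizes to fewer tokens than one block, the result is "Num examples = 0" regardless of the batch size. The sketch below reproduces only that arithmetic; `count_blocks` is a hypothetical helper, and the token counts and block sizes are illustrative values, not measurements from this thread.

```python
def count_blocks(token_counts, block_size):
    """Number of full blocks after concatenating all sequences; the remainder is dropped."""
    total_tokens = sum(token_counts)
    return total_tokens // block_size


# Illustrative numbers only: four short sequences, 420 tokens in total.
few_sequences = [100, 120, 90, 110]
print(count_blocks(few_sequences, block_size=1024))  # 420 // 1024 -> 0 blocks
print(count_blocks(few_sequences, block_size=128))   # 420 // 128 -> 3 blocks
```

Under this assumption, either adding more validation sequences or passing a smaller value through the script's `--block_size` argument should yield at least one evaluation example. The batch size of 8 would then not be the limiting factor.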

pawanchk commented 2 weeks ago

Thanks so much @mishra011

Increasing the number of sequences in validation.txt resolved the AttributeError.

The evaluation ended like this:

[INFO|modelcard.py:449] 2024-06-20 17:08:03,918 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}, 'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.20625610948191594}]}

huggingface/transformers issue #31487