blzheng closed this issue 7 months ago.
Hi @blzheng, thanks for raising this issue!
What performance do you get, running on main with different seeds?
Hi @amyeroberts, we observed an accuracy drop from 0.8131 (transformers==4.18.0) to 0.8033 (transformers==4.28.1), and narrowed it down to this commit with `git bisect`. The issue reproduces consistently with the following command, and it still exists on the latest codebase:

`python transformers/examples/pytorch/image-classification/run_image_classification.py --model_name_or_path google/vit-base-patch16-224 --do_eval --dataset_name imagenet-1k --per_device_eval_batch_size 1 --remove_unused_columns False --output_dir ./`
@blzheng Thanks for confirming.
The reason for the change is that the processing logic in the image classification script was updated to match that of the model's image processor.

Previously, `size` could be an int and was passed directly to `torchvision.transforms.Resize`. If `size` is an int (which it is for many models, e.g. here for a ViT checkpoint), then the shortest edge of the image is resized to `size` and the other edge is rescaled to keep the image's aspect ratio.

However, in the now-deprecated feature extractors (superseded in #19796), the default behaviour when `size` was an int was to resize the image to `(size, size)`. This was the case for ViT.

The script now reflects the behaviour of the image processor, even when using torchvision transforms.
@amyeroberts Thanks for the information. Given that the changes to the image processing logic are reasonable, does that mean the accuracy drop is expected?
@blzheng It depends on what you mean by "expected". The change in logic means that the aspect ratio of the input images is different, so one would expect a performance difference. Even though it's not in line with the processing of the model's image processor, the previous processing might give better performance because it preserves the true aspect ratio of the images, and hence the shape/dimensions of the subjects in them (this is speculation).
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info

`transformers` version: 4.25.0.dev0

Who can help?

@amyeroberts
Information
Tasks
`examples` folder (such as GLUE/SQuAD, ...)

Reproduction
Accuracy regression caused by https://github.com/huggingface/transformers/pull/19796
Reproduce command:

`python transformers/examples/pytorch/image-classification/run_image_classification.py --model_name_or_path google/vit-base-patch16-224 --do_eval --dataset_name imagenet-1k --per_device_eval_batch_size 1 --remove_unused_columns False --output_dir ./`
Expected behavior
Expected results:

eval_accuracy = 0.8131
eval_loss = 0.7107
eval_runtime = 0:43:40.30
eval_samples_per_second = 19.082
eval_steps_per_second = 19.082

Current results:

eval_accuracy = 0.8033
eval_loss = 0.755
eval_runtime = 0:34:05.81
eval_samples_per_second = 24.44
eval_steps_per_second = 0.436