huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

[Nougat] image after NougatProcessor leads to Traceback #33341

Open ehuaa opened 2 weeks ago

ehuaa commented 2 weeks ago

System Info

Who can help?

@amyeroberts

Information

Tasks

Reproduction

Run a single command: python convert_nougat_to_hf.py

When I use the convert_nougat_to_hf script to convert the facebook/nougat 0.1.0-base model to the Hugging Face format, it fails at https://github.com/huggingface/transformers/blob/main/src/transformers/models/nougat/convert_nougat_to_hf.py#L180 with the traceback below:

Traceback (most recent call last):
  File "/data/czh/nougat/convert_nougat_to_hf.py", line 285, in <module>
    convert_nougat_checkpoint(args.model_tag, args.pytorch_dump_folder_path, args.push_to_hub)
  File "/data/czh/nougat/convert_nougat_to_hf.py", line 180, in convert_nougat_checkpoint
    pixel_values = processor(image, return_tensors="pt").pixel_values
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/nougat/processing_nougat.py", line 92, in __call__
    inputs = self.image_processor(
  File "/usr/local/lib/python3.10/dist-packages/transformers/image_processing_utils.py", line 551, in __call__
    return self.preprocess(images, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/nougat/image_processing_nougat.py", line 489, in preprocess
    images = [self.thumbnail(image=image, size=size, input_data_format=input_data_format) for image in images]
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/nougat/image_processing_nougat.py", line 489, in <listcomp>
    images = [self.thumbnail(image=image, size=size, input_data_format=input_data_format) for image in images]
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/nougat/image_processing_nougat.py", line 309, in thumbnail
    return resize(
  File "/usr/local/lib/python3.10/dist-packages/transformers/image_transforms.py", line 330, in resize
    resized_image = image.resize((width, height), resample=resample, reducing_gap=reducing_gap)
  File "/usr/local/lib/python3.10/dist-packages/PIL/Image.py", line 2206, in resize
    factor_y = int((box[3] - box[1]) / size[1] / reducing_gap) or 1
ZeroDivisionError: division by zero

This means that the Transformers version of NougatProcessor fails to process the PNG below, while the prepare_input function in the original facebook nougat repo handles the same PNG without error.

1711 02512_fig_exp_place_general_table_1
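The last frame of the traceback divides by size[1] and by reducing_gap, which suggests that one of them (most plausibly the target height, size[1]) was zero when PIL was asked to resize. The thread does not establish how that happened. The following is only a minimal sketch of one way an aspect-ratio-preserving thumbnail calculation can truncate a dimension to zero; thumbnail_size and safe_thumbnail_size are hypothetical helpers, not the Transformers implementation, and the image dimensions are made up for illustration.

```python
def thumbnail_size(input_height, input_width, max_height, max_width):
    """Hypothetical helper: shrink to fit (max_height, max_width), keeping aspect ratio.

    Uses int() truncation, so a very thin image can end up with a 0-pixel side.
    """
    height = min(input_height, max_height)
    width = min(input_width, max_width)

    # Image already fits: nothing to rescale.
    if height == input_height and width == input_width:
        return height, width

    if input_height > input_width:
        width = int(input_width * height / input_height)
    elif input_width > input_height:
        height = int(input_height * width / input_width)
    return height, width


def safe_thumbnail_size(input_height, input_width, max_height, max_width):
    """Same calculation, but never returns a side smaller than 1 pixel."""
    height, width = thumbnail_size(input_height, input_width, max_height, max_width)
    return max(height, 1), max(width, 1)


if __name__ == "__main__":
    # Assumed example: a 5 x 4000 pixel strip squeezed into an 896 x 672 box.
    print(thumbnail_size(5, 4000, 896, 672))       # height truncates to 0
    print(safe_thumbnail_size(5, 4000, 896, 672))  # height clamped to 1
```

If the failing PNG does hit a case like this, clamping each side to at least one pixel before calling resize would avoid the division by zero, though whether that matches the original nougat prepare_input output would still need checking.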

Expected behavior

The PNG above should be processed without error by the Transformers NougatProcessor, and the processor's output should match original_pixel_values.

amyeroberts commented 2 weeks ago

Hi @ehuaa, thanks for raising this issue!

I'm not able to reproduce the error, either on 4.38.2 or on the development branch of transformers when running python convert_nougat_to_hf.py.

I do hit an assertion error, as the generated output is slightly different from what's being asserted on this line. However, the difference is only in newlines and spaces, so I don't think it's too much of a concern.

Which version of the nougat library and PIL do you have installed?
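A quick way to collect the versions asked for here is sketched below, using only the standard library. The distribution names are assumptions: "Pillow" for PIL and "nougat-ocr" for the facebook nougat package; adjust them if your installation uses different names.

```python
from importlib import metadata

# Assumed PyPI distribution names; change these if yours differ.
PACKAGES = ("transformers", "Pillow", "nougat-ocr")


def installed_versions(packages=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```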

Out of interest, what's the reason for running the conversion script for 0.1.0-base? The checkpoint is already available here: https://huggingface.co/facebook/nougat-base