[nougat] Unable to use nougat models with `image-to-text` pipeline

xenova commented 10 months ago

System Info

transformers version: 4.36.0.dev0
Platform: Linux-5.15.120+-x86_64-with-glibc2.35
Python version: 3.10.12
Huggingface_hub version: 0.17.3
Safetensors version: 0.4.0
Accelerate version: not installed
Accelerate config: not found
PyTorch version (GPU?): 2.1.0+cu118 (False)
Tensorflow version (GPU?): 2.14.0 (False)
Flax version (CPU?/GPU?/TPU?): 0.7.5 (cpu)
Jax version: 0.4.20
JaxLib version: 0.4.20
Using GPU in script?: no
Using distributed or parallel set-up in script?: no

Who can help?

@NielsRogge @Narsil

Information

[ ] The official example scripts
[ ] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)

Reproduction

Running

from transformers import pipeline
pipe = pipeline('image-to-text', 'facebook/nougat-base')
pipe('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/nougat_paper.png')

results in the following error:

ValueError: Unrecognized feature extractor in facebook/nougat-base. Should have a `feature_extractor_type` key in its preprocessor_config.json of config.json, or one of the following `model_type` keys in its config.json: audio-spectrogram-transformer, beit, chinese_clip, clap, clip, clipseg, clvp, conditional_detr, convnext, cvt, data2vec-audio, data2vec-vision, deformable_detr, deit, detr, dinat, donut-swin, dpt, encodec, flava, glpn, groupvit, hubert, imagegpt, layoutlmv2, layoutlmv3, levit, maskformer, mctct, mobilenet_v1, mobilenet_v2, mobilevit, nat, owlvit, perceiver, poolformer, pop2piano, regnet, resnet, seamless_m4t, segformer, sew, sew-d, speech_to_text, speecht5, swiftformer, swin, swinv2, table-transformer, timesformer, tvlt, unispeech, unispeech-sat, van, videomae, vilt, vit, vit_mae, vit_msn, wav2vec2, wav2vec2-conformer, wavlm, whisper, xclip, yolos
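
To make the failure mode in that message concrete, here is a minimal sketch of the two-step lookup it describes: first a `feature_extractor_type` key in the preprocessor config, then a known `model_type` in `config.json`. This is not the library's code. The function `resolve_feature_extractor`, the subset of model types, and the assumed contents of the nougat configs (an `image_processor_type` key and a `vision-encoder-decoder` model type) are all illustrative assumptions.

```python
# Hypothetical stand-in for the lookup described by the error message.
# A small subset of the model types listed in the error, for illustration only.
FEATURE_EXTRACTOR_MODEL_TYPES = {"donut-swin", "swin", "vit", "whisper"}


def resolve_feature_extractor(preprocessor_config: dict, model_config: dict) -> str:
    """Return a label describing how the feature extractor was resolved."""
    # Step 1: an explicit key in the preprocessor config wins.
    if "feature_extractor_type" in preprocessor_config:
        return preprocessor_config["feature_extractor_type"]
    # Step 2: otherwise fall back to the model_type in config.json.
    model_type = model_config.get("model_type")
    if model_type in FEATURE_EXTRACTOR_MODEL_TYPES:
        return f"mapped from model_type={model_type}"
    raise ValueError(f"Unrecognized feature extractor (model_type={model_type!r})")


# Assumed shape of the nougat checkpoint's configs (not copied from the Hub).
nougat_preprocessor = {"image_processor_type": "NougatImageProcessor"}
nougat_model = {"model_type": "vision-encoder-decoder"}

try:
    resolve_feature_extractor(nougat_preprocessor, nougat_model)
    outcome = "resolved"
except ValueError as err:
    outcome = f"failed: {err}"
print(outcome)
```

Under these assumptions neither step matches, which is consistent with the `ValueError` reported above.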

Expected behavior

The model should function properly with the pipeline API.

Pratyush-exe commented 10 months ago

Hi @xenova,

This appears to be a config issue from their side. Here is a quick fix.

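from transformers import AutoFeatureExtractor, pipeline

# Workaround: passing the AutoFeatureExtractor class itself (not an instance)
# seems to bypass the automatic feature-extractor lookup that raises the error.
pipe = pipeline(
    task='image-to-text',
    model='facebook/nougat-base',
    feature_extractor=AutoFeatureExtractor,
)

# Generate a short transcription of the sample page.
response = pipe(
    'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/nougat_paper.png',
    max_new_tokens=20
)
print(response[0].get('generated_text'))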

Please let me know if it works for you :)
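
Another way around the pipeline error, sketched here as an assumption rather than something proposed in this thread, is to skip `pipeline` and call the processor and model classes directly. The helper name `transcribe_page` is hypothetical. The sketch assumes `torch` and `Pillow` are installed and that the checkpoint can be downloaded from the Hub.

```python
def transcribe_page(image_source: str, max_new_tokens: int = 20) -> str:
    """Hypothetical helper: run facebook/nougat-base on one page image without pipeline()."""
    # Imports are local so that defining the helper does not require the heavy dependencies.
    from transformers import NougatProcessor, VisionEncoderDecoderModel
    from transformers.image_utils import load_image

    checkpoint = "facebook/nougat-base"
    processor = NougatProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)

    # load_image accepts a URL or a local path and returns an RGB PIL image.
    image = load_image(image_source)
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # Generate token ids, then decode them back to text.
    generated_ids = model.generate(pixel_values, max_new_tokens=max_new_tokens)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

Calling `transcribe_page(...)` with the image URL from the reproduction should return the first generated tokens as a string. The exact text depends on the checkpoint and generation settings.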

github-actions[bot] commented 9 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

avisinghal6 commented 9 months ago

Hi, I would like to work on this issue. Could you please assign it to me?

amyeroberts commented 8 months ago

Hi @avisinghal6 - we don't normally assign issues. Instead, people say they're working on an issue and open a PR on GitHub or on the Hub directly, then link to the related work in a comment.

In this case - you're more than welcome to tackle this!

avisinghal6 commented 8 months ago

Thanks, I will work on this issue and update the status in a few days.

coolyashas commented 8 months ago

feature_extractor

I have a few questions regarding this issue:

Is it possible to solve the config issue, and if so, are there any leads on how?
Does it involve changing the files here? If yes, which files must be edited?

Any help is highly appreciated!

NielsRogge commented 8 months ago

Hi,

I looked a bit into this issue and found the problematic line. Opened a PR above.
