huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

[nougat] Unable to use nougat models with `image-to-text` pipeline #27475

Closed: xenova closed this issue 7 months ago

xenova commented 10 months ago

Who can help?

@NielsRogge @Narsil

Reproduction

Running

from transformers import pipeline
pipe = pipeline('image-to-text', 'facebook/nougat-base')
pipe('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/nougat_paper.png')

results in the following error:

ValueError: Unrecognized feature extractor in facebook/nougat-base. Should have a `feature_extractor_type` key in its preprocessor_config.json of config.json, or one of the following `model_type` keys in its config.json: audio-spectrogram-transformer, beit, chinese_clip, clap, clip, clipseg, clvp, conditional_detr, convnext, cvt, data2vec-audio, data2vec-vision, deformable_detr, deit, detr, dinat, donut-swin, dpt, encodec, flava, glpn, groupvit, hubert, imagegpt, layoutlmv2, layoutlmv3, levit, maskformer, mctct, mobilenet_v1, mobilenet_v2, mobilevit, nat, owlvit, perceiver, poolformer, pop2piano, regnet, resnet, seamless_m4t, segformer, sew, sew-d, speech_to_text, speecht5, swiftformer, swin, swinv2, table-transformer, timesformer, tvlt, unispeech, unispeech-sat, van, videomae, vilt, vit, vit_mae, vit_msn, wav2vec2, wav2vec2-conformer, wavlm, whisper, xclip, yolos
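The error comes from the automatic feature-extractor lookup. Below is a simplified, hypothetical sketch of that kind of dispatch, not the actual transformers code. It accepts a checkpoint only if its preprocessor config names a `feature_extractor_type` or its `model_type` is in a known list. The sketch assumes the Nougat checkpoint's preprocessor config declares only an `image_processor_type` and that its `model_type` is `vision-encoder-decoder`. Neither matches the list, so the lookup fails the same way the report does. `resolve_feature_extractor` and `FEATURE_EXTRACTOR_BY_MODEL_TYPE` are illustrative names.

```python
# Hypothetical, simplified model of an auto-class lookup; NOT the real transformers code.
FEATURE_EXTRACTOR_BY_MODEL_TYPE = {
    # Abbreviated subset of the model_type keys listed in the error message.
    "vit": "ViTFeatureExtractor",
    "wav2vec2": "Wav2Vec2FeatureExtractor",
    "whisper": "WhisperFeatureExtractor",
}


def resolve_feature_extractor(preprocessor_config: dict, model_config: dict) -> str:
    """Return a feature extractor class name, or raise ValueError if none applies."""
    # 1. An explicit class name in preprocessor_config.json takes priority.
    explicit = preprocessor_config.get("feature_extractor_type")
    if explicit is not None:
        return explicit
    # 2. Otherwise fall back to the model_type recorded in config.json.
    model_type = model_config.get("model_type")
    if model_type in FEATURE_EXTRACTOR_BY_MODEL_TYPE:
        return FEATURE_EXTRACTOR_BY_MODEL_TYPE[model_type]
    raise ValueError(
        "Unrecognized feature extractor: expected a `feature_extractor_type` key "
        f"or a known `model_type`, got {model_type!r}"
    )


# Assumed (not verified) shape of the Nougat checkpoint's configs.
nougat_preprocessor = {"image_processor_type": "NougatImageProcessor"}
nougat_config = {"model_type": "vision-encoder-decoder"}

try:
    resolve_feature_extractor(nougat_preprocessor, nougat_config)
except ValueError as err:
    print(err)
```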

Expected behavior

The model should function properly with the pipeline API.

Pratyush-exe commented 10 months ago

Hi @xenova,

This appears to be a config issue on their side. Here is a quick workaround.

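from transformers import pipeline
from transformers import AutoFeatureExtractor

# Passing a feature extractor explicitly seems to skip the automatic
# feature-extractor lookup that raises the ValueError above.
pipe = pipeline(
    task='image-to-text',
    model='facebook/nougat-base',
    feature_extractor=AutoFeatureExtractor,
)

# Limit generation to a short preview of the transcription.
response = pipe(
    'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/nougat_paper.png',
    max_new_tokens=20
)
print(response[0].get('generated_text'))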

Please let me know if it works for you :)
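A likely reason the workaround gets past the error is that an explicitly supplied component short-circuits the automatic lookup. Below is a hypothetical model of that pattern. It assumes the factory auto-loads a feature extractor only when the `feature_extractor` argument is `None`. The real pipeline may instead check the argument's type (for example, string versus object). `build_pipeline`, `auto_load` and `failing_auto_load` are illustrative names, not transformers APIs.

```python
from typing import Callable, Optional


# Hypothetical sketch of "auto-load only when nothing was supplied"; names are illustrative.
def build_pipeline(model_name: str,
                   feature_extractor: Optional[object] = None,
                   auto_load: Optional[Callable[[str], object]] = None) -> dict:
    """Assemble pipeline parts, auto-loading the feature extractor only if it is missing."""
    if feature_extractor is None and auto_load is not None:
        # Only this branch can hit a lookup failure like the one in the report.
        feature_extractor = auto_load(model_name)
    return {"model": model_name, "feature_extractor": feature_extractor}


def failing_auto_load(name: str) -> object:
    """Stand-in for a lookup that cannot resolve a feature extractor."""
    raise ValueError(f"Unrecognized feature extractor in {name}.")


# With no explicit component, the automatic lookup runs and raises.
try:
    build_pipeline("facebook/nougat-base", auto_load=failing_auto_load)
except ValueError as err:
    print("auto-load failed:", err)

# With any non-None object supplied, the lookup is skipped entirely.
parts = build_pipeline("facebook/nougat-base", feature_extractor=object(),
                       auto_load=failing_auto_load)
print(parts["feature_extractor"] is not None)
```

Under this assumption, the workaround sidesteps the failing lookup rather than fixing it.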

github-actions[bot] commented 9 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

avisinghal6 commented 9 months ago

Hi, I would like to work on this issue. Could you please assign it to me?

amyeroberts commented 8 months ago

Hi @avisinghal6 - we don't normally assign issues. Instead, people say they're working on it and open a PR on GitHub or on the Hub directly, linking to the related work in a comment.

In this case - you're more than welcome to tackle this!

avisinghal6 commented 8 months ago

Thanks, I will work on this issue and update the status in a few days.

coolyashas commented 8 months ago

I have a few questions regarding this issue:

  1. Is it possible to solve the config issue, and if so, are there any leads on how?
  2. Does it involve changing the files here? If so, which files need to be edited?

Any help is highly appreciated!

NielsRogge commented 8 months ago

Hi,

I looked a bit into this issue and found the problematic line. Opened a PR above.