openvinotoolkit / openvino.genai

Run Generative AI models using native OpenVINO C++ API
Apache License 2.0

Convert.py breaks for microsoft/trocr-base-printed #363

Closed: NoushNabi closed this issue 4 months ago

NoushNabi commented 4 months ago

Context

Running convert.py on microsoft/trocr-base-printed fails with the error below:

ValueError: Unrecognized configuration class <class 'transformers.models.vision_encoder_decoder.configuration_vision_encoder_decoder.VisionEncoderDecoderConfig'> for this kind of AutoModel: AutoModelForCausalLM. Model type should be one of BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, LlamaConfig, CodeGenConfig, CohereConfig, CpmAntConfig, CTRLConfig, Data2VecTextConfig, ElectraConfig, ErnieConfig, FalconConfig, FuyuConfig, GemmaConfig, GitConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, LlamaConfig, MambaConfig, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MistralConfig, MixtralConfig, MptConfig, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PersimmonConfig, PhiConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, Speech2Text2Config, StableLmConfig, Starcoder2Config, TransfoXLConfig, TrOCRConfig, WhisperConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig.

As the error message says, "Model type should be one of ..." and TrOCRConfig is in that list, so trocr-base-printed should be supported, yet the conversion fails.

What needs to be done?

N/A

Example Pull Requests

No response

Resources

Contact points

N/A

Ticket

No response

stevn09 commented 4 months ago

.take

The error message indicates that convert.py is attempting to load microsoft/trocr-base-printed as an AutoModelForCausalLM, the class typically used for language modeling. TrOCR, however, is an encoder-decoder model designed for text recognition (OCR), and its top-level checkpoint should not be loaded as a standalone causal language model.
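The mismatch can be checked directly. This is a sketch (assuming a recent transformers install; `MODEL_FOR_CAUSAL_LM_MAPPING_NAMES` is an internal mapping of transformers and may move between versions): the checkpoint's top-level config type is `vision-encoder-decoder`, which is absent from the causal-LM mapping, while the TrOCR *decoder* type alone is present, which is why TrOCRConfig appears in the error's list even though the conversion fails.

```python
# Show why AutoModelForCausalLM rejects the TrOCR checkpoint:
# the combined config's model_type is not in the causal-LM mapping,
# only the decoder-only "trocr" type is.
from transformers import VisionEncoderDecoderConfig, ViTConfig, TrOCRConfig
from transformers.models.auto.modeling_auto import MODEL_FOR_CAUSAL_LM_MAPPING_NAMES

# Build a VisionEncoderDecoderConfig locally (no download needed)
config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(
    ViTConfig(), TrOCRConfig()
)

print(config.model_type)                                      # vision-encoder-decoder
print(config.model_type in MODEL_FOR_CAUSAL_LM_MAPPING_NAMES) # False
print("trocr" in MODEL_FOR_CAUSAL_LM_MAPPING_NAMES)           # True (decoder only)
```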

Instead, you should load the TrOCR model with the appropriate class. The transformers library provides dedicated classes for each model family; for TrOCR, use the VisionEncoderDecoderModel class.

Here's an example of how you might do this:

```python
from transformers import VisionEncoderDecoderModel, VisionEncoderDecoderConfig

# Load the configuration for the TrOCR model
config = VisionEncoderDecoderConfig.from_pretrained("microsoft/trocr-base-printed")

# Load the TrOCR model using the configuration
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-printed", config=config)
```

Make sure you're using the latest version of the transformers library, as the API may change with each update. If convert.py is a script provided by a third party, you may need to check its documentation or source code to see if it supports the TrOCR model or if modifications are required to support this type of model.

If you're trying to use convert.py to convert the model format (e.g., from PyTorch to TensorFlow), you may need to find a conversion tool that specifically supports TrOCR or perform the conversion manually. In some cases, conversion may not be directly supported, especially for non-standard model architectures. In such instances, you may need to contact the model maintainers or search for community-provided solutions.

github-actions[bot] commented 4 months ago

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

eaidova commented 4 months ago

@NoushNabi llm_bench is not intended to support TrOCR. Please use the optimum-intel interface and the optimum-cli tool for model export and inference.
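For reference, the export step suggested above might look roughly like this (a sketch: the output directory name is illustrative, and available flags depend on your installed optimum-intel version; check `optimum-cli export openvino --help`):

```shell
# Export the TrOCR checkpoint to OpenVINO IR via optimum-cli
# (requires optimum-intel with the OpenVINO extras installed;
# downloads the model from the Hugging Face Hub)
optimum-cli export openvino --model microsoft/trocr-base-printed trocr-base-printed-ov
```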