a small bug with trocr example and large model

fsa3z commented 9 months ago

Hi,

Problem: the trocr example works well with the base model, but fails with the large model.

cargo run --example trocr --release --  --which large --cpu --image candle-examples/examples/trocr/assets/trocr.png
model: "/huggingface/hub/models--microsoft--trocr-large-handwritten/snapshots/f07eb3a73a9b06a73141dba2ae1f1671c5c346af/model.safetensors"
Error: shape mismatch for encoder.embeddings.cls_token, expected: [1, 1, 768], got: [1, 1, 1024]

The trouble comes from this match in the example:

let encoder_config = match args.which {
    Which::Base => candle_transformers::models::vit::Config::microsoft_trocr_base_handwritten(),
    Which::Large => {
        candle_transformers::models::vit::Config::microsoft_trocr_base_handwritten()
    }
};

Which::Large is built with the same config as Which::Base, so the encoder expects a hidden size of 768 while the large checkpoint stores tensors of size 1024.

Solution: building encoder_config and decoder_config from https://huggingface.co/microsoft/trocr-large-handwritten/blob/main/config.json solves the problem.
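To illustrate the mismatch, here is a minimal sketch, not the actual candle code. `EncoderConfig`, `for_variant` and `check_cls_token` are hypothetical names, and only the hidden size is modelled (768 and 1024, taken from the error message above). The real ViT config has more fields that would also need to come from config.json.

```rust
// Hypothetical stand-in for the example's model selector.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Which {
    Base,
    Large,
}

// Hypothetical, reduced encoder config: only `hidden_size` is modelled.
#[derive(Debug, Clone, PartialEq)]
pub struct EncoderConfig {
    pub hidden_size: usize,
}

impl EncoderConfig {
    // Each variant gets its own config instead of reusing the base one.
    pub fn for_variant(which: Which) -> Self {
        match which {
            Which::Base => Self { hidden_size: 768 },
            Which::Large => Self { hidden_size: 1024 },
        }
    }

    // Shape the config expects for encoder.embeddings.cls_token.
    pub fn cls_token_shape(&self) -> [usize; 3] {
        [1, 1, self.hidden_size]
    }
}

// Compares the expected shape with the shape found in the checkpoint.
pub fn check_cls_token(cfg: &EncoderConfig, got: [usize; 3]) -> Result<(), String> {
    let expected = cfg.cls_token_shape();
    if expected == got {
        Ok(())
    } else {
        Err(format!(
            "shape mismatch for encoder.embeddings.cls_token, expected: {:?}, got: {:?}",
            expected, got
        ))
    }
}
```

With this sketch, checking a `[1, 1, 1024]` tensor against `EncoderConfig::for_variant(Which::Base)` reproduces the error text above, while `Which::Large` passes.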

katopz commented 9 months ago

Thanks for the heads-up. I've already made a PR that also adds support for the printed models.

Working

cargo run --example trocr --release --  --which base-hand-written --cpu --image candle-examples/examples/trocr/assets/trocr.png
cargo run --example trocr --release --  --which large-hand-written --cpu --image candle-examples/examples/trocr/assets/trocr.png
cargo run --example trocr --release --  --which base-printed --cpu --image candle-examples/examples/trocr/assets/printed-number.jpg

Remaining bug

cargo run --example trocr --release --  --which large-printed --cpu --image candle-examples/examples/trocr/assets/printed-number.jpg

which gives

Error: cannot find tensor decoder.model.decoder.embed_positions.weight

Any idea on this one?

LaurentMazare commented 9 months ago

I've just merged #1689, which gets the config from the HF hub instead of using a hardcoded one. This should make it easier to add more supported models in the future if compatible architectures appear. For the large-printed model, the tricky part is that the position embeddings are not learnt but rather hardcoded in the model, so the checkpoint has no decoder.model.decoder.embed_positions.weight tensor to load. I've made the error message more specific about this and will look at adding support for it.
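For readers wondering what "hardcoded" position embeddings could look like, here is a rough sketch. It assumes the fixed embeddings are of the standard sinusoidal kind, which this thread does not state. The function name `sinusoidal_positions` is hypothetical, and the layout in the actual model (interleaved or concatenated sin/cos, padding offsets) may differ.

```rust
/// Builds an `n_positions x dim` table of fixed sinusoidal position
/// embeddings: even columns hold sin, odd columns hold cos, and each
/// sin/cos pair shares one frequency.
/// Assumption: classic interleaved layout, not necessarily the model's.
pub fn sinusoidal_positions(n_positions: usize, dim: usize) -> Vec<Vec<f32>> {
    (0..n_positions)
        .map(|pos| {
            (0..dim)
                .map(|i| {
                    // Index of the sin/cos pair this column belongs to.
                    let pair = (i / 2) as f32;
                    let angle = pos as f32 / 10000f32.powf(2.0 * pair / dim as f32);
                    if i % 2 == 0 {
                        angle.sin()
                    } else {
                        angle.cos()
                    }
                })
                .collect()
        })
        .collect()
}
```

Since such a table is computed from the position index alone, there is nothing to store in the safetensors file, which would be consistent with the missing-tensor error above.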

LaurentMazare commented 9 months ago

Closing this now as hopefully it's all good, feel free to re-open if you run into further issues.

huggingface / candle

a small bug with trocr example and large model #1645
