Closed VallabhMahajan1 closed 1 year ago
cc @younesbelkada and @amyeroberts
@VallabhMahajan1 Could you share a reproducible code snippet and information about the running environment (run `transformers-cli env` in the terminal and copy-paste the output)?
From the traceback, it seems the issue comes from the metric calculation when using `Trainer`.
I'm able to build and run a small example with the checkpoints you shared on the `main` branch:
from transformers import AutoImageProcessor, AutoTokenizer, VisionEncoderDecoderModel
import requests
from PIL import Image
import torch
encoder_checkpoint = "google/vit-base-patch16-224"
decoder_checkpoint = "bert-base-multilingual-cased"
image_processor = AutoImageProcessor.from_pretrained(encoder_checkpoint)
tokenizer = AutoTokenizer.from_pretrained(decoder_checkpoint)
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    encoder_pretrained_model_name_or_path=encoder_checkpoint,
    decoder_pretrained_model_name_or_path=decoder_checkpoint,
)
# load image from the IAM dataset
url = "https://fki.tic.heia-fr.ch/static/img/a01-122-02.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
# training
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.vocab_size = model.config.decoder.vocab_size
pixel_values = image_processor(image, return_tensors="pt").pixel_values
text = "hello world"
labels = tokenizer(text, return_tensors="pt").input_ids
outputs = model(pixel_values=pixel_values, labels=labels)
loss = outputs.loss
Thanks for the reply. I was trying to train a TrOCR model. Below are my environment details and the code snippet. I'm not sure, but I guess the error is raised in the `compute_metrics` function.
- `transformers` version: 4.28.0
- Platform: Linux-5.15.107+-x86_64-with-glibc2.31
- Python version: 3.10.11
- Huggingface_hub version: 0.14.1
- Safetensors version: not installed
- PyTorch version (GPU?): 2.0.1+cu118 (True)
- Tensorflow version (GPU?): 2.12.0 (True)
- Flax version (CPU?/GPU?/TPU?): 0.6.9 (gpu)
- Jax version: 0.4.10
- JaxLib version: 0.4.10
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no
# load_metric comes from the datasets library
from datasets import load_metric
from transformers import BertTokenizer, TrOCRProcessor, ViTFeatureExtractor, VisionEncoderDecoderModel
tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
feature_extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224-in21k")
processor = TrOCRProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained("google/vit-base-patch16-224", "bert-base-multilingual-cased")
cer_metric = load_metric("cer")
def compute_metrics(pred):
    labels_ids = pred.label_ids
    pred_ids = pred.predictions
    # decode the generated token ids into strings
    pred_str = processor.batch_decode(pred_ids, skip_special_tokens=True)
    # -100 marks ignored label positions; map it back to the pad token before decoding
    labels_ids[labels_ids == -100] = processor.tokenizer.pad_token_id
    label_str = processor.batch_decode(labels_ids, skip_special_tokens=True)
    cer = cer_metric.compute(predictions=pred_str, references=label_str)
    return {"cer": cer}
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments
training_args = Seq2SeqTrainingArguments(
    predict_with_generate=True,
    evaluation_strategy="steps",
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    fp16=True,
    output_dir="./",
    logging_steps=2,
    save_strategy="no",
    eval_steps=100,
)
from transformers import default_data_collator
# instantiate trainer
trainer = Seq2SeqTrainer(
    model=model,
    tokenizer=processor.tokenizer,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=default_data_collator,
)
trainer.train()
@VallabhMahajan1 Thank you for providing a code snippet.
However, the code snippet is incomplete: `train_dataset` and `eval_dataset` are not defined.
If you can't provide these datasets, you can try to use public datasets (for example, on the HF dataset Hub) that are similar to your own. In any case, please use a small dataset (or take a small slice of a large one).
Without a self-contained code snippet that reproduces the error, we are not able to help. Thank you.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@VallabhMahajan1 I got the same error, but with a different language model. I used kkatiz/thai-trocr-thaigov-v2 and found out that the model does not support English uppercase characters, which is why some of the ground-truth strings become "" (i.e. an empty string).
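One way to check whether a tokenizer drops characters like this is to round-trip each label through encode and decode and flag the ones that come back empty. The following is only a minimal sketch and is not taken from the thread: `find_empty_roundtrips` is a hypothetical helper, and the `encode` / `decode` callables stand in for something like `tokenizer(text).input_ids` and `tokenizer.decode(ids, skip_special_tokens=True)`. The toy lowercase-only vocabulary is an assumption used to illustrate the failure mode, not the behaviour of any particular checkpoint.

```python
from typing import Callable, List, Sequence, Tuple


def find_empty_roundtrips(
    texts: Sequence[str],
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
) -> List[Tuple[int, str]]:
    """Return (index, original_text) for non-empty labels that decode to an empty string."""
    empty = []
    for i, text in enumerate(texts):
        decoded = decode(encode(text)).strip()
        if text.strip() and not decoded:
            empty.append((i, text))
    return empty


# Toy stand-in tokenizer (assumption): it knows lowercase ASCII letters and
# the space character only, and silently skips every other character.
VOCAB = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz ")}
INVERSE = {i: ch for ch, i in VOCAB.items()}


def toy_encode(text: str) -> List[int]:
    return [VOCAB[ch] for ch in text if ch in VOCAB]


def toy_decode(ids: List[int]) -> str:
    return "".join(INVERSE[i] for i in ids)


labels = ["hello world", "ABC", "Mixed Case"]
problem_labels = find_empty_roundtrips(labels, toy_encode, toy_decode)
print(problem_labels)
```

With a real tokenizer, running a check of this kind over the training and evaluation labels before calling `trainer.train()` would show which samples end up with an empty reference.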
@takipipo Hi, I got the same error with a TrOCR model. Did you solve this problem?
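If empty reference strings do turn out to be the cause, as the earlier comment suggests, one possible workaround is to drop those pairs before computing the character error rate. This is a sketch under that assumption only; `drop_empty_references` is a hypothetical helper name and not part of `transformers` or `datasets`.

```python
from typing import List, Sequence, Tuple


def drop_empty_references(
    predictions: Sequence[str], references: Sequence[str]
) -> Tuple[List[str], List[str], int]:
    """Keep only pairs whose reference is non-empty after stripping whitespace.

    Returns the filtered predictions, the filtered references, and the
    number of pairs that were dropped.
    """
    kept = [(p, r) for p, r in zip(predictions, references) if r.strip()]
    preds = [p for p, _ in kept]
    refs = [r for _, r in kept]
    dropped = len(references) - len(kept)
    return preds, refs, dropped


preds, refs, dropped = drop_empty_references(
    ["helo", "abc", "x"],
    ["hello", "", "  "],
)
print(preds, refs, dropped)
```

Inside `compute_metrics`, this could be applied to `pred_str` and `label_str` just before `cer_metric.compute(...)`. Dropping samples changes what the metric measures, so it is probably worth logging the `dropped` count and fixing the labels or the tokenizer rather than relying on the filter long term.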
I was trying to train a VisionEncoderDecoderModel and I got the error below. For the decoder I'm using bert-base-multilingual-cased and the encoder is google/vit-base-patch16-224. How can I solve this error? Thanks in advance!