Describe the bug
Model I am using: trocr-base-handwritten
Dataset: IAM handwriting dataset (word and line splits)
The problem arises when using:
[ ] my own modified scripts:
```python
import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

# Initialize processor and model
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten").to(device)

# Prepare the image and move pixel values to the device
image_path = "path/to/iam_image.png"  # placeholder path to a single IAM word or line image
image = Image.open(image_path).convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)

# Generate text
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
A clear and concise description of what the bug is:
When running microsoft/trocr-base-handwritten against the IAM word dataset (single words), I get a CER of about 30%.
When running it against the IAM line dataset, the CER is about 4%.
Is this expected?
Can I fine-tune the model on single-word images to bring it down to around 4% CER on single words, or is it inherently bad on single words? (A sketch of the fine-tuning setup I have in mind is below.)
Is the model having been trained on full lines instead of single words the reason for the 30% CER?
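In case it helps clarify the fine-tuning question, this is roughly the setup I would try. It is only a sketch: `IAMWordDataset`, `train_samples`, and `eval_samples` are placeholder names for my own word-split data loader (lists of `(image_path, transcription)` pairs), the hyperparameters are just illustrative, and I have not verified that this configuration actually reaches 4% CER on single words.

```python
import torch
from PIL import Image
from torch.utils.data import Dataset
from transformers import (
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    TrOCRProcessor,
    VisionEncoderDecoderModel,
)


class IAMWordDataset(Dataset):
    """Wraps a list of (image_path, transcription) pairs from the IAM word split."""

    def __init__(self, samples, processor, max_target_length=32):
        self.samples = samples
        self.processor = processor
        self.max_target_length = max_target_length

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image_path, text = self.samples[idx]
        image = Image.open(image_path).convert("RGB")
        pixel_values = self.processor(image, return_tensors="pt").pixel_values.squeeze(0)
        labels = self.processor.tokenizer(
            text, padding="max_length", truncation=True, max_length=self.max_target_length
        ).input_ids
        # Replace padding token ids with -100 so they are ignored by the loss
        labels = [t if t != self.processor.tokenizer.pad_token_id else -100 for t in labels]
        return {"pixel_values": pixel_values, "labels": torch.tensor(labels)}


processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")
model.config.decoder_start_token_id = processor.tokenizer.cls_token_id
model.config.pad_token_id = processor.tokenizer.pad_token_id

training_args = Seq2SeqTrainingArguments(
    output_dir="./trocr-base-handwritten-words",  # placeholder output path
    per_device_train_batch_size=8,
    num_train_epochs=3,
    fp16=torch.cuda.is_available(),
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=IAMWordDataset(train_samples, processor),  # train_samples: placeholder
    eval_dataset=IAMWordDataset(eval_samples, processor),    # eval_samples: placeholder
    tokenizer=processor.feature_extractor,
)
trainer.train()
```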
To Reproduce
Steps to reproduce the behavior:
Run the sample code above with microsoft/trocr-base-handwritten against the IAM word dataset; the CER will be around 30%. The CER is computed as in the sketch below.
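For reference, this is roughly how I compute the CER over the word split. It is a minimal sketch: `iam_word_samples` and `recognize` are placeholder names for my own data loader and for the inference snippet above wrapped in a function, and I use the `jiwer` package for the CER metric.

```python
import jiwer

# iam_word_samples: placeholder list of (image_path, ground_truth) pairs from the IAM word split
# recognize(): the inference snippet above wrapped in a function that returns generated_text
references, predictions = [], []
for image_path, ground_truth in iam_word_samples:
    references.append(ground_truth)
    predictions.append(recognize(image_path))

cer = jiwer.cer(references, predictions)
print(f"CER on IAM words: {cer:.2%}")  # around 30% for me, vs ~4% on the line split
```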