Describe the bug
Model I am using: trocr-base-handwritten
Dataset: IAM handwriting dataset (word and line splits)
The problem arises when using:
[ ] my own modified scripts:
```python
import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

# Initialize processor and model
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten").to(device)

# Prepare the image and move pixel values to the device
image_path = "path/to/iam_image.png"  # placeholder path to a single IAM word or line image
image = Image.open(image_path).convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)

# Generate text
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
A clear and concise description of what the bug is:
When running microsoft/trocr-base-handwritten against the IAM word dataset (single words), I get a CER of about 30%.
When running it against the IAM line dataset, the CER is about 4%.
Is this expected?
Can I fine-tune the model on single-word images to bring it down to around 4% CER on single words, or is it inherently bad on single words? (A sketch of the fine-tuning setup I have in mind is below.)
Is the model having been trained on full lines instead of single words the reason for the 30% CER?
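In case it helps clarify the fine-tuning question, this is roughly the setup I would try. It is only a sketch: `IAMWordDataset`, `train_samples`, and `eval_samples` are placeholder names for my own word-split data loader (lists of `(image_path, transcription)` pairs), the hyperparameters are just illustrative, and I have not verified that this configuration actually reaches 4% CER on single words.

```python
import torch
from PIL import Image
from torch.utils.data import Dataset
from transformers import (
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    TrOCRProcessor,
    VisionEncoderDecoderModel,
)


class IAMWordDataset(Dataset):
    """Wraps a list of (image_path, transcription) pairs from the IAM word split."""

    def __init__(self, samples, processor, max_target_length=32):
        self.samples = samples
        self.processor = processor
        self.max_target_length = max_target_length

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image_path, text = self.samples[idx]
        image = Image.open(image_path).convert("RGB")
        pixel_values = self.processor(image, return_tensors="pt").pixel_values.squeeze(0)
        labels = self.processor.tokenizer(
            text, padding="max_length", truncation=True, max_length=self.max_target_length
        ).input_ids
        # Replace padding token ids with -100 so they are ignored by the loss
        labels = [t if t != self.processor.tokenizer.pad_token_id else -100 for t in labels]
        return {"pixel_values": pixel_values, "labels": torch.tensor(labels)}


processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")
model.config.decoder_start_token_id = processor.tokenizer.cls_token_id
model.config.pad_token_id = processor.tokenizer.pad_token_id

training_args = Seq2SeqTrainingArguments(
    output_dir="./trocr-base-handwritten-words",  # placeholder output path
    per_device_train_batch_size=8,
    num_train_epochs=3,
    fp16=torch.cuda.is_available(),
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=IAMWordDataset(train_samples, processor),  # train_samples: placeholder
    eval_dataset=IAMWordDataset(eval_samples, processor),    # eval_samples: placeholder
    tokenizer=processor.feature_extractor,
)
trainer.train()
```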
To Reproduce
Steps to reproduce the behavior:
Run the sample code above with microsoft/trocr-base-handwritten against the IAM word dataset; the CER will be around 30%. The CER is computed as in the sketch below.
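For reference, this is roughly how I compute the CER over the word split. It is a minimal sketch: `iam_word_samples` and `recognize` are placeholder names for my own data loader and for the inference snippet above wrapped in a function, and I use the `jiwer` package for the CER metric.

```python
import jiwer

# iam_word_samples: placeholder list of (image_path, ground_truth) pairs from the IAM word split
# recognize(): the inference snippet above wrapped in a function that returns generated_text
references, predictions = [], []
for image_path, ground_truth in iam_word_samples:
    references.append(ground_truth)
    predictions.append(recognize(image_path))

cer = jiwer.cer(references, predictions)
print(f"CER on IAM words: {cer:.2%}")  # around 30% for me, vs ~4% on the line split
```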