Kosmos-2.5 - Image-to-markdown generation for images outside the sample-set provided is almost entirely garbled - output markdown is completely unusable. #1602
[ x ] the official example scripts: Using the exact required custom-libraries and dependencies to run the supplied inference.py script. Same results obtained when run bare-metal or when extended via a simple Flask-API in a containerized environment: https://github.com/abgulati/kosmos-2_5-containerized
Description: Image-to-markdown generation for images outside the sample-set provided is almost entirely garbled - output markdown is completely unusable.
Elaborating in the examples below:
Example 1 - Using the sample in.png example image provided with the model:
On running the inference.py script with --do_md for image-to-markdown generation:
Isolating the results:
Cleaning the results:
Perfect markdown output as rendered via https://markdownlivepreview.com/:
This confirms the model is working correctly!
Example 2 - Table from a Boeing manual:
Output of inference.py script with --do_md for image-to-markdown generation:
Copying, cleaning and generating a markdown preview of the results - completely garbled & unusable output:
Example 3 - Table of network connectors from my notes for the CompTIA Network+ exam:
Output of inference.py script with --do_md for image-to-markdown generation:
Copying, cleaning and generating a markdown preview of the results - completely garbled & unusable output:
Example 4 - Table of common ports and services from my notes for the CompTIA Network+ & Security+ exams:
Output of inference.py script with --do_md for image-to-markdown generation:
Copying, cleaning and generating a markdown preview of the results - completely garbled & unusable output:
As demonstrated by these examples, markdown-generation for images outside the sample (training?) set is completely garbled and unusable. The first example establishes the model itself is working correctly.
Further, --do_ocr works perfectly and outputs high-accuracy, high-quality data.
To Reproduce
Steps to reproduce the behavior:
Run model for markdown generation: python3 inference.py --do_md --image path/image.png --ckpt ckpt.pt
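To make the reproduction easier to repeat across several test images, the step above could be wrapped in a small driver script. This is only a sketch: the helper names `build_command` and `run_all` and the `test_images` folder are hypothetical, and the `--image` / `--ckpt` flag spellings are assumed from the command in this report, so adjust them if your copy of inference.py differs.

```python
import subprocess
from pathlib import Path


def build_command(image, ckpt="ckpt.pt", mode="md"):
    """Build the inference.py command line for one image.

    mode is "md" (image-to-markdown) or "ocr"; the flag names are
    assumptions taken from the command quoted in this issue.
    """
    if mode not in ("md", "ocr"):
        raise ValueError(f"unsupported mode: {mode}")
    return [
        "python3", "inference.py",
        f"--do_{mode}",
        "--image", str(image),
        "--ckpt", str(ckpt),
    ]


def run_all(images, ckpt="ckpt.pt", mode="md"):
    """Run inference.py once per image, stopping on the first failure."""
    for image in images:
        cmd = build_command(image, ckpt, mode)
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "test_images" is a hypothetical folder holding the tables to test.
    run_all(sorted(Path("test_images").glob("*.png")))
```

Passing mode="ocr" runs the same images through --do_ocr, which makes it easy to compare the two modes on identical inputs.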
Expected behavior
Respectably accurate markdown generation
Platform: WSL Ubuntu 22.04
Python version: v3.10.12
PyTorch version (GPU?): 2.5.0.dev20240705+cu124 for RTX 3090
Model I am using: Kosmos-2.5