microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

Kosmos-2.5 - Image-to-markdown generation for images outside the sample-set provided is almost entirely garbled - output markdown is completely unusable. #1602

Open abgulati opened 3 months ago

abgulati commented 3 months ago

Describe the bug

Model I am using: Kosmos-2.5

The problem arises when using:

Description: Image-to-markdown generation for images outside the sample-set provided is almost entirely garbled - output markdown is completely unusable.

Elaborating in the examples below:

Example 1 - Using the sample in.png example image provided with the model:

Sample-1-in

1-in-png-response

2-in-png-response-isolated

3-in-png-response-cleaned

4-in-png-markdown-preview

This confirms the model is working correctly!

Example 2 - Table from a Boeing manual:

Sample-2-Boeing

5-Boeing-Backgrounder-Response

6-Boeing-Backgrounder-Markdown-Preview

Example 3 - Table of network connectors from my notes for the CompTIA Network+ exam:

Sample-3-Connectors

7-Connectors-Reponse

8-Connectors-Markdown-Preview

Example 4 - Table of common ports and services from my notes for the CompTIA Network+ & Security+ exams:

Sample-4-Ports

9-Ports-Response

10-Ports-Markdown-Response

As these examples demonstrate, markdown generation for images outside the sample (training?) set is completely garbled and unusable. The first example establishes that the model itself is working correctly.

Further, --do_ocr works perfectly and outputs high-accuracy, high-quality data.

To Reproduce

Steps to reproduce the behavior:

  1. Run model for markdown generation: python3 inference.py --do_md --image path/image.png --ckpt ckpt.pt
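To make the comparison between the two modes easier to repeat, the command in step 1 can be wrapped in a small script that runs the same image through both `--do_ocr` and `--do_md`. This is only a sketch: the `--image` flag spelling is an assumption reconstructed from the command above (check the argument parser in `inference.py`), and `build_command` and `run_both` are hypothetical helper names, not part of the Kosmos-2.5 code.

```python
import subprocess
import sys


def build_command(image, ckpt, mode="md", script="inference.py"):
    """Build the argument list for one inference run.

    mode is "md" (markdown) or "ocr"; the flag names are assumed
    to match the command quoted in this issue.
    """
    if mode not in ("md", "ocr"):
        raise ValueError(f"unknown mode: {mode!r}")
    return [
        sys.executable, script,
        f"--do_{mode}",
        "--image", str(image),
        "--ckpt", str(ckpt),
    ]


def run_both(image, ckpt):
    """Run OCR and markdown generation on the same image and collect stdout."""
    outputs = {}
    for mode in ("ocr", "md"):
        proc = subprocess.run(
            build_command(image, ckpt, mode),
            capture_output=True,
            text=True,
        )
        outputs[mode] = proc.stdout
    return outputs


# Only runs when invoked as: python compare_modes.py <image> <checkpoint>
if __name__ == "__main__" and len(sys.argv) == 3:
    for mode, text in run_both(sys.argv[1], sys.argv[2]).items():
        print(f"===== {mode} =====")
        print(text)
```

Saving both outputs side by side makes it easy to see whether the text is recognized correctly in OCR mode while the markdown structure is lost.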

Expected behavior

Respectably accurate markdown generation

Intel Core i9 13900KF
Nvidia RTX 3090FE
32GB DDR5 5600MT/s (16x2)
Windows 11 - OS Build 22631.3737
CUDA 12.4

Flash-Attention-2 (v2.5.9.post1)
tiktoken 0.7.0
tqdm 4.66.4
omegaconf 2.0.6 (hydra-core 1.0.7)
boto3 1.34.140
iopath 0.1.10
fairscale 0.4.0
scipy 1.10.0
triton 2.3.1
https://github.com/facebookresearch/xformers.git@04de99bb28aa6de8d48fab3cdbbc9e3874c994b8
https://github.com/Dod-o/kosmos2.5_tools.git@fairseq
https://github.com/Dod-o/kosmos2.5_tools.git@infinibatch
https://github.com/Dod-o/kosmos2.5_tools.git@torchscale
https://github.com/Dod-o/kosmos2.5_tools.git@transformers
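
For anyone trying to reproduce this, a version list like the one above can be gathered with the standard library's `importlib.metadata`. This is a minimal sketch: the distribution names in `PACKAGES` are assumptions (for example, Flash-Attention is assumed to be installed under the name `flash-attn`), and `installed_versions` is a hypothetical helper.

```python
from importlib import metadata

# Distribution names are assumptions; adjust to match your environment.
PACKAGES = [
    "flash-attn", "tiktoken", "tqdm", "omegaconf", "hydra-core",
    "boto3", "iopath", "fairscale", "scipy", "triton", "xformers",
    "fairseq", "infinibatch", "torchscale", "transformers",
]


def installed_versions(names):
    """Map each distribution name to its installed version, if any."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version}")
```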

ntauth commented 2 months ago

Following