Is it possible to share the Kosmos-2.5 script that converts inference outputs to output image

ydshieh commented 1 month ago

Hi! First, thank you for open sourcing the Kosmos-2.5 model.

I have run it and it ran smoothly.

The inference outputs look like

{'model': 'kosmos 2.5', 'task': 'ocr', 'width': 772, 'height': 1000, 'results': [{'text': '9', 'bounding box': {'x0': 702, 'y0': 35, 'x1': 708, 'y1': 48}}, {'text': '0', 'bounding box': {'x0': 103, 'y0': 250, 'x1': 107, 'y1': 262}}, {'text': 'π/2', 'bounding box': {'x0': 162, 'y0': 250, 'x1': 176, 'y1': 262}},

and I couldn't find the script that converts this into an output image like the ones shared in the repository.

Is it possible for the team to share that conversion script? Thank you in advance!

Dod-o commented 1 month ago

Hi @ydshieh, I have uploaded the script we used for drawing bounding boxes, please find it here :)
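
For readers who only have the inference output shown above, here is a minimal sketch of how such a dictionary could be turned into an annotated image. It is not the team's script. It assumes that the `width` and `height` fields describe the coordinate space of the boxes, and that Pillow (a third-party package) is installed for the drawing step. The function names `scaled_boxes` and `draw_boxes` are hypothetical, chosen for illustration.

```python
from typing import Any


def scaled_boxes(
    output: dict[str, Any], target_w: int, target_h: int
) -> list[tuple[str, tuple[float, float, float, float]]]:
    """Return (text, (x0, y0, x1, y1)) pairs rescaled to the target image size.

    Assumption: output['width'] and output['height'] give the coordinate
    space in which the bounding boxes were predicted.
    """
    sx = target_w / output["width"]
    sy = target_h / output["height"]
    boxes = []
    for item in output["results"]:
        b = item["bounding box"]  # key contains a space, as in the output above
        boxes.append(
            (item["text"], (b["x0"] * sx, b["y0"] * sy, b["x1"] * sx, b["y1"] * sy))
        )
    return boxes


def draw_boxes(image_path: str, output: dict[str, Any], save_path: str) -> None:
    """Draw every predicted box on the image and save the result."""
    # Pillow is imported here so scaled_boxes stays usable without it.
    from PIL import Image, ImageDraw

    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    for _text, box in scaled_boxes(output, image.width, image.height):
        draw.rectangle(box, outline="red", width=2)
    image.save(save_path)
```

Calling `draw_boxes("page.png", output, "page_boxes.png")` with the parsed output would write a copy of the image with red rectangles. Text labels are left out on purpose, since strings such as 'π/2' need a Unicode-capable font.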

ydshieh commented 1 month ago

Thank you a lot ^_^ !

atlury commented 1 month ago

What hardware are you running it on, @ydshieh? Can you share some details? I am planning to buy old P40s, but it seems Kosmos-2.5 depends on FlashAttention-2, which only works on RTX 30x0 and RTX 40x0 cards and some high-end cards like the A100 and H100.

ydshieh commented 1 month ago

I tried it once on an A10 machine.
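
As a rough way to check a card before buying, the sketch below tests a GPU's CUDA compute capability against the Ampere-or-newer requirement (8.0 and up). That threshold is an assumption based on the GPU families the FlashAttention-2 project lists as supported, so please verify it against the current flash-attn documentation. The helper name `meets_flash_attention_2_requirement` is hypothetical.

```python
def meets_flash_attention_2_requirement(major: int, minor: int) -> bool:
    """Return True if the compute capability is Ampere (8.0) or newer.

    Assumption: FlashAttention-2 needs compute capability >= 8.0.
    The minor version does not affect the result under that assumption.
    """
    return major >= 8


# Compute capabilities of the cards mentioned in this thread:
# Tesla P40 is Pascal (6.1), A10 is Ampere (8.6).
print(meets_flash_attention_2_requirement(6, 1))
print(meets_flash_attention_2_requirement(8, 6))
```

On a machine with PyTorch and a CUDA device, `torch.cuda.get_device_capability()` returns the `(major, minor)` pair to pass in.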

microsoft/unilm, issue #1547