microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

Inference on my own images #1575

Open unrue opened 2 weeks ago

unrue commented 2 weeks ago

I installed Kosmos-2 and would like to run inference on my own images, as in the Hugging Face demo, i.e. extract the description and the bounding boxes.

I can't work out how to do this from the documentation. Is it the Evaluation step? Could you describe how to proceed? Thanks.

pengzhiliang commented 1 week ago

Hello,

We provide both demo and evaluation code in our repo. If you want to extract the bounding boxes and their corresponding text descriptions, please refer to these files. This file can extract them.
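For readers who only need a caption plus boxes, here is a minimal sketch that uses the Hugging Face `transformers` port of Kosmos-2 rather than the demo code in this repo. It assumes the `microsoft/kosmos-2-patch14-224` checkpoint and a `transformers` release that ships `Kosmos2ForConditionalGeneration`. It also assumes that `post_process_generation` returns entities shaped as `(phrase, (start, end), [(x1, y1, x2, y2), ...])` with coordinates normalized to the 0..1 range. The helper names `describe_image` and `to_pixel_boxes` are hypothetical, introduced only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed entity shape from the Hugging Face processor:
# (phrase, (char_start, char_end), [(x1, y1, x2, y2), ...]), boxes normalized to 0..1.
Entity = Tuple[str, Tuple[int, int], Sequence[Tuple[float, float, float, float]]]

MODEL_ID = "microsoft/kosmos-2-patch14-224"  # Hugging Face checkpoint (assumption)


def to_pixel_boxes(entities: Sequence[Entity], width: int, height: int) -> List[Dict]:
    """Convert normalized entity boxes to integer pixel coordinates."""
    results = []
    for phrase, _span, boxes in entities:
        for x1, y1, x2, y2 in boxes:
            results.append({
                "phrase": phrase,
                "box": (round(x1 * width), round(y1 * height),
                        round(x2 * width), round(y2 * height)),
            })
    return results


def describe_image(image_path: str, prompt: str = "<grounding>An image of"):
    """Return (caption, pixel_boxes) for one image. Hypothetical helper."""
    # Heavy imports stay local so the pure helper above works without them.
    from PIL import Image
    from transformers import AutoProcessor, Kosmos2ForConditionalGeneration

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = Kosmos2ForConditionalGeneration.from_pretrained(MODEL_ID)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    generated_ids = model.generate(
        pixel_values=inputs["pixel_values"],
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        image_embeds=None,
        image_embeds_position_mask=inputs["image_embeds_position_mask"],
        use_cache=True,
        max_new_tokens=128,
    )
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    # Strips the grounding tokens and returns the clean caption plus entities.
    caption, entities = processor.post_process_generation(generated_text)
    return caption, to_pixel_boxes(entities, image.width, image.height)
```

Calling `describe_image("my_image.jpg")` (the path is a placeholder) should give the caption and a list of `{"phrase", "box"}` dictionaries. The first call downloads the checkpoint.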

Another demo/API is hosted by NVIDIA.

Hope these help!