microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

Inference on my own images #1575

Open unrue opened 5 months ago

unrue commented 5 months ago

I installed Kosmos-2 and I would like to run inference on my own images, as in the Hugging Face demo, extracting the description and the bounding boxes.

I couldn't figure out how to do this from the manual. Is it the Evaluation step? Could you describe how to proceed? Thanks.
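One way to get the description and boxes outside the repo's own scripts is the Hugging Face `transformers` port of Kosmos-2 (`Kosmos2ForConditionalGeneration` plus `AutoProcessor`, whose `post_process_generation` returns the cleaned caption and a list of entities with normalized boxes). This is a minimal sketch, not the repo's inference code. It assumes a `transformers` release that includes Kosmos-2, plus `torch` and `Pillow`, and it assumes the `microsoft/kosmos-2-patch14-224` checkpoint; `describe_image` and `to_pixel_boxes` are hypothetical helper names.

```python
"""Minimal Kosmos-2 grounding sketch via the Hugging Face `transformers` port."""

from typing import List, Tuple

# One grounded entity: (phrase, (start, end) char span, list of normalized boxes).
Box = Tuple[float, float, float, float]
Entity = Tuple[str, Tuple[int, int], List[Box]]

# Assumption: the public Hugging Face checkpoint, not the repo's own weights.
MODEL_ID = "microsoft/kosmos-2-patch14-224"


def to_pixel_boxes(entities: List[Entity], width: int, height: int):
    """Scale normalized (x1, y1, x2, y2) boxes in [0, 1] to pixel coordinates."""
    results = []
    for phrase, _span, boxes in entities:
        for x1, y1, x2, y2 in boxes:
            results.append(
                (phrase, (round(x1 * width), round(y1 * height),
                          round(x2 * width), round(y2 * height)))
            )
    return results


def describe_image(image_path: str, max_new_tokens: int = 128):
    """Hypothetical helper: return (caption, pixel boxes) for one local image."""
    # Heavy imports are deferred so to_pixel_boxes works without them installed.
    from PIL import Image
    from transformers import AutoProcessor, Kosmos2ForConditionalGeneration

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = Kosmos2ForConditionalGeneration.from_pretrained(MODEL_ID)

    image = Image.open(image_path).convert("RGB")
    # The <grounding> prefix asks the model to link phrases to image regions.
    inputs = processor(text="<grounding>An image of", images=image, return_tensors="pt")
    generated_ids = model.generate(
        pixel_values=inputs["pixel_values"],
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        image_embeds=None,
        image_embeds_position_mask=inputs["image_embeds_position_mask"],
        max_new_tokens=max_new_tokens,
    )
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    # Strips the location markup; entities carry boxes normalized to [0, 1].
    caption, entities = processor.post_process_generation(generated_text)
    # PIL's image.size is (width, height).
    return caption, to_pixel_boxes(entities, *image.size)
```

Usage would look like `caption, boxes = describe_image("my_photo.jpg")` (the filename is a placeholder). Scaling to pixels is kept in a separate pure function so you can draw the boxes on the original image regardless of the resolution the model saw internally.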

pengzhiliang commented 5 months ago

Hello,

We provide both the demo and the evaluation code in our repo. If you want to extract the bboxes and the corresponding text descriptions, please refer to these files. This file in particular can extract them.
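To make the extraction step concrete: in raw Kosmos-2 output, a grounded phrase is wrapped in `<phrase>` tags and followed by an `<object>` block of `<patch_index_NNNN>` location tokens, read as (top-left, bottom-right) bins on a 32×32 grid. The sketch below parses that markup with plain regular expressions. It is not the repo's extraction script; the tag format and grid size are assumptions based on the Kosmos-2 paper and the Hugging Face model card, and `parse_grounded_text` / `bin_to_box` are hypothetical names.

```python
import re
from typing import List, Tuple

Box = Tuple[float, float, float, float]

# Assumption: Kosmos-2 quantizes the image into a 32 x 32 grid of location bins.
NUM_BINS = 32

# Matches "<phrase>text</phrase><object>...</object>" spans in raw model output.
_ENTITY_RE = re.compile(r"<phrase>(.*?)</phrase><object>(.*?)</object>")
_PATCH_RE = re.compile(r"<patch_index_(\d+)>")


def bin_to_box(top_left: int, bottom_right: int, num_bins: int = NUM_BINS) -> Box:
    """Map a pair of bin indices to a normalized (x1, y1, x2, y2) box.

    Uses the outer edges of the two cells; the official post-processing
    may use a slightly different convention (e.g. cell centers).
    """
    cell = 1.0 / num_bins
    x1 = (top_left % num_bins) * cell
    y1 = (top_left // num_bins) * cell
    x2 = (bottom_right % num_bins + 1) * cell
    y2 = (bottom_right // num_bins + 1) * cell
    return (x1, y1, x2, y2)


def parse_grounded_text(raw: str) -> Tuple[str, List[Tuple[str, List[Box]]]]:
    """Split raw output into a plain caption and (phrase, boxes) pairs."""
    entities = []
    for phrase, obj in _ENTITY_RE.findall(raw):
        indices = [int(i) for i in _PATCH_RE.findall(obj)]
        # Indices come in (top-left, bottom-right) pairs; several pairs = several boxes.
        boxes = [bin_to_box(a, b) for a, b in zip(indices[0::2], indices[1::2])]
        entities.append((phrase.strip(), boxes))
    # Drop the <object> blocks entirely, then strip any remaining tags.
    caption = re.sub(r"<object>.*?</object>", "", raw)
    caption = re.sub(r"<[^>]+>", "", caption)
    # Collapse the whitespace left behind by removed tags.
    return " ".join(caption.split()), entities
```

For an illustrative input such as `<grounding> An image of<phrase> a snowman</phrase><object><patch_index_0044><patch_index_0863></object> warming himself by<phrase> a fire</phrase><object><patch_index_0005><patch_index_0911></object>.`, the caption comes back as `An image of a snowman warming himself by a fire.` and each phrase is paired with its normalized box, which you can then scale by the image width and height.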

Another demo/API is hosted by NVIDIA.

Hope those can help you!