microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License
19.21k stars 2.45k forks

[Kosmos-2]: colab tutorial or notebook for single-image inference of Visual Grounding task or VQA task #1200

Open xyliugo opened 1 year ago

xyliugo commented 1 year ago

Thanks for your impressive work! Do you provide a Colab or notebook for testing a single in-the-wild image, rather than evaluating on existing datasets such as RefCOCO? Specifically, we are interested in obtaining bounding-box-level outputs. I have not been able to find a similar question or issue. If an existing issue already discusses this, could you please share the link?

BIGBALLON commented 1 year ago

Hi @xyliugo, these issues may help you: