Open hewenbin opened 1 year ago
Hi, @hewenbin
Thank you for the quick turnaround. I'm wondering if there is any code example that doesn't use a GUI. For example, I'd like a bash script I can run on a bunch of images.
I would like to echo the same thing.
It would be very helpful to have a simple notebook tutorial with a few lines of code that lets us evaluate KOSMOS-2 given a single image, text, and bounding boxes as input. Without such a tutorial, it is time-consuming to figure out how to adapt KOSMOS-2 (rather than an interactive app) to other research projects. I sincerely hope this impactful work can be recognized and used by more and more people, so that it can significantly benefit the visual grounding research community. However, the learning curve of adapting this model to other research projects currently seems to be an obstacle.
Thank you so much for your great efforts in developing this amazing work!
Thank you for the quick turnaround. I'm wondering if there is any code example that doesn't use a GUI.
@yolandalalala @hewenbin KOSMOS-2 is now supported by the Hugging Face team.
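For anyone landing here, a minimal non-GUI sketch using the `transformers` integration, roughly following the `microsoft/kosmos-2-patch14-224` model card. The script name and the `ground_image` / `phrase_grounding_prompt` helpers are my own naming, not part of the library; treat this as a starting point, not an official example.

```python
# Minimal non-GUI sketch: KOSMOS-2 grounding on images from the command line
# via the Hugging Face `transformers` integration.
# Assumes `transformers`, `torch`, and `Pillow` are installed.

def phrase_grounding_prompt(phrase: str) -> str:
    """Build the phrase-grounding prompt format from the KOSMOS-2 model card."""
    return f"<grounding><phrase> {phrase}</phrase>"

def ground_image(image_path, prompt="<grounding>An image of"):
    """Run KOSMOS-2 on one image; returns (caption, entities).

    Each entity is (phrase, (start, end), [bounding boxes]) with boxes in
    normalized (x1, y1, x2, y2) coordinates.
    """
    # Heavy dependencies are imported lazily so the prompt helper above
    # works even without torch/transformers installed.
    from PIL import Image
    from transformers import AutoProcessor, AutoModelForVision2Seq

    model_id = "microsoft/kosmos-2-patch14-224"
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForVision2Seq.from_pretrained(model_id)

    inputs = processor(text=prompt, images=Image.open(image_path),
                       return_tensors="pt")
    generated_ids = model.generate(
        pixel_values=inputs["pixel_values"],
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        image_embeds_position_mask=inputs["image_embeds_position_mask"],
        max_new_tokens=128,
    )
    text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    # post_process_generation strips the location tokens and extracts entities
    return processor.post_process_generation(text)

if __name__ == "__main__":
    import sys
    # Batch mode, per the bash-script request: python kosmos2_ground.py images/*.png
    for path in sys.argv[1:]:
        caption, entities = ground_image(path)
        print(path, caption, entities)
```

For phrase grounding on specific noun phrases, pass e.g. `prompt=phrase_grounding_prompt("a snowman")` instead of the default captioning prompt.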
Hi,
Thank you so much for developing this impactful and impressive work! It really bridges the gap between multimodal models and grounding in the visual world.
May I kindly ask if you could provide a minimal code snippet for the phrase grounding task? Ideally, it would let us experience the phrase grounding capability of KOSMOS-2 given a single image and several noun phrases. I sincerely appreciate your time and help, and I look forward to hearing back from you.