microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License
19.49k stars · 2.48k forks

More functions in demo #1170

Open · ErrorMelody opened this issue 1 year ago

ErrorMelody commented 1 year ago

What an exciting work! However, the functions in the online demo and the locally hosted demo are the same: only an image can be given as input, and the model returns boxes and a caption. The paper, however, mentions many more functions, such as providing a bounding box to generate a caption for that region. When will these functions be released?

pengzhiliang commented 1 year ago

@ErrorMelody Thank you for your attention! We will release these functions in the future.

You can also unlock them now with a few changes (see the sketch below this list):

  1. Change the [gr.Radio](https://github.com/microsoft/unilm/blob/874dfed8008ecf6bfc077e161b3fdced8c4fbf8c/kosmos-2/demo/gradio_app.py#L490) component into a gr.Text component.
  2. Set `inputs = f"[image]{user_image_path}{text_input}"` [here](https://github.com/microsoft/unilm/blob/874dfed8008ecf6bfc077e161b3fdced8c4fbf8c/kosmos-2/demo/gradio_app.py#L348).
  3. Host the demo.
  4. Enable sampling, and then enjoy it!
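
A minimal, self-contained sketch of those steps, assuming a stripped-down Gradio app (the function name `build_model_input` and the component wiring here are hypothetical; the real code lives in kosmos-2/demo/gradio_app.py):

```python
import gradio as gr

def build_model_input(user_image_path, text_input):
    # Step 2: prepend the special [image] token and the uploaded image path
    # to the free-form prompt before it is handed to the model.
    return f"[image]{user_image_path}{text_input}"

with gr.Blocks() as demo:
    image = gr.Image(type="filepath", label="Input image")
    # Step 1: a free-form gr.Text box in place of the original gr.Radio,
    # so arbitrary prompts (including grounding tokens) can be entered.
    prompt = gr.Text(label="Prompt",
                     placeholder="<grounding> Describe this image:")
    preview = gr.Text(label="Assembled model input")
    gr.Button("Run").click(build_model_input, [image, prompt], preview)

demo.launch()  # Step 3: host it (step 4, sampling, is a generation-time flag)
```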
mu-cai commented 1 year ago

Can you explain specifically how to put a bounding box, i.e. grounding tokens, into the text prompt in the demo/test script? Cannot wait to see the object-description capability. Thank you!

wanghao-cst commented 1 year ago

> @ErrorMelody Thank you for your attention! We will release these functions in the future.
>
> You can also unlock them now with a few changes:
>
>   1. Change the gr.Radio component into a gr.Text component.
>   2. Set `inputs = f"[image]{user_image_path}{text_input}"` here.
>   3. Host the demo.
>   4. Enable sampling, and then enjoy it!

Hi, may I know what the arguments to gr.Text() in step 1 should be? Could you please share them with us?

BIGBALLON commented 1 year ago

Hi @pengzhiliang, can you describe how to change the code for "Grounded question answering"?

sheldonchiu commented 1 year ago

I have implemented this free-form prompt input in my own fork: https://github.com/sheldonchiu/unilm

> Can you explain specifically how to put a bounding box, i.e. grounding tokens, into the text prompt in the demo/test script? Cannot wait to see the object-description capability. Thank you!

I have made a small tool to easily test this function: https://sheldonchiu.github.io/kosmos2-prompt-tool/

Below is a quick demo (a hedged sketch of the underlying token format follows at the end of this comment):

  1. Create a bounding box using my tool (screenshot: demo2).
  2. Embed the tool's output in your prompt (screenshot: demo1).

Example 2 (screenshot: demo3).

The accuracy of a response is profoundly influenced by the phrasing of the question. With a well-crafted prompt, the results can be astounding. Thanks for releasing this great model!
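
For anyone who wants to reproduce the prompt format without the tool, here is an illustrative sketch of the mapping it performs, assuming the 32×32 grid of location tokens described in the Kosmos-2 paper; the binning arithmetic and the function name are assumptions, not the official implementation:

```python
# Illustrative sketch: convert a normalized bounding box into Kosmos-2-style
# grounding tokens (assumes a 32x32 grid of <patch_index_XXXX> location
# tokens, per the Kosmos-2 paper; not the official code).

def bbox_to_grounding_tokens(x1, y1, x2, y2, grid=32):
    """Map a box with corner coordinates in [0, 1] to <object>...</object> tokens."""
    def patch_index(x, y):
        col = min(int(x * grid), grid - 1)
        row = min(int(y * grid), grid - 1)
        return row * grid + col
    top_left = patch_index(x1, y1)          # token for the top-left corner
    bottom_right = patch_index(x2, y2)      # token for the bottom-right corner
    return f"<object><patch_index_{top_left:04d}><patch_index_{bottom_right:04d}></object>"

# Example: ask about an object occupying the left half of the image.
box = bbox_to_grounding_tokens(0.05, 0.10, 0.45, 0.90)
prompt = f"<grounding><phrase>this object</phrase>{box} Describe this object in detail:"
print(prompt)
# <grounding><phrase>this object</phrase><object><patch_index_0097><patch_index_0910></object> ...
```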

APPLE-XMT commented 1 year ago

@sheldonchiu Hello, may I ask whether your demo can extract the aspects mentioned in a sentence?

donglixp commented 1 year ago

@sheldonchiu The prompt tool is super useful!