microsoft / JARVIS

JARVIS, a system to connect LLMs with the ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
MIT License

Gradio. Built-in example - Zebras. TypeError: expected string or bytes-like object #59

Closed ekiwi111 closed 1 year ago

ekiwi111 commented 1 year ago

Server is up and running. Launched the Gradio demo with `python run_gradio_demo.py --config config.gradio.yaml`:

Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.

Running the built-in example "Given a collection of image A: /examples/a.jpg, B: /examples/b.jpg, C: /examples/c.jpg, please tell me how many zebras in these picture?". Gradio terminal output:

2023-04-06 15:41:17,757 - awesome_chat - INFO - ********************************************************************************
2023-04-06 15:41:17,758 - awesome_chat - INFO - input: Given a collection of image A: /examples/a.jpg, B: /examples/b.jpg, C: /examples/c.jpg, please tell me how many zebras in these picture?
2023-04-06 15:41:30,058 - awesome_chat - INFO - [{"task": "image-to-text", "id": 0, "dep": [-1], "args": {"image": "/examples/a.jpg" }}, {"task": "object-detection", "id": 1, "dep": [-1], "args": {"image": "/examples/a.jpg" }}, {"task": "visual-question-answering", "id": 2, "dep": [1], "args": {"image": "<GENERATED>-1", "text": "How many zebras in the picture?" }}, {"task": "image-to-text", "id": 3, "dep": [-1], "args": {"image": "/examples/b.jpg" }}, {"task": "object-detection", "id": 4, "dep": [-1], "args": {"image": "/examples/b.jpg" }}, {"task": "visual-question-answering", "id": 5, "dep": [4], "args": {"image": "<GENERATED>-4", "text": "How many zebras in the picture?" }}, {"task": "image-to-text", "id": 6, "dep": [-1], "args": {"image": "/examples/c.jpg" }}, {"task": "object-detection", "id": 7, "dep": [-1], "args": {"image": "/examples/c.jpg" }}, {"task": "visual-question-answering", "id": 8, "dep": [7], "args": {"image": "<GENERATED>-7", "text": "How many zebras in the picture?" }}]
2023-04-06 15:41:50,233 - awesome_chat - INFO - response: Based on the inference results, there are two zebras in the picture. 

My workflow for your request is as follows: 

1. I used the image-to-text model nlpconnect/vit-gpt2-image-captioning to generate a text description for each image. 

2. Then I used the object-detection model facebook/detr-resnet-50 to detect the objects in the image and generate an image with predicted boxes. 

3. Finally, I used the visual-question-answering model dandelin/vilt-b32-finetuned-vqa to answer your question. 

For the image A: /examples/a.jpg, the object-detection model detected a cat and a potted plant in the image. The visual-question-answering model predicted that there are 0 zebras in the picture. 

For the image B: /examples/b.jpg, the object-detection model detected a zebra in the image. The visual-question-answering model predicted that there is 1 zebra in the picture. 

For the image C: /examples/c.jpg, the object-detection model detected three zebras in the image. The visual-question-answering model predicted that there are 2 zebras in the picture. 

Therefore, there are two zebras in the picture.
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/jarvis/lib/python3.8/site-packages/gradio/routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/user/anaconda3/envs/jarvis/lib/python3.8/site-packages/gradio/blocks.py", line 1108, in process_api
    result = await self.call_function(
  File "/home/user/anaconda3/envs/jarvis/lib/python3.8/site-packages/gradio/blocks.py", line 915, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/user/anaconda3/envs/jarvis/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/user/anaconda3/envs/jarvis/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/user/anaconda3/envs/jarvis/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "run_gradio_demo.py", line 82, in bot
    image_urls, audio_urls, video_urls = extract_medias(message)
  File "run_gradio_demo.py", line 18, in extract_medias
    for match in image_pattern.finditer(message):
TypeError: expected string or bytes-like object
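For context, here is a minimal sketch of what appears to trigger the crash. It assumes, based on the traceback, that `extract_medias` in `run_gradio_demo.py` compiles a regex (`image_pattern`) and calls `finditer()` on the chatbot message; the regex below and the `extract_medias_safe` guard are stand-ins for illustration, not the repository's actual code. `re.Pattern.finditer()` accepts only `str` or `bytes`, so if Gradio hands the callback `None` (or some other non-string component value) instead of the message text, it raises exactly this `TypeError`:

```python
import re

# Stand-in for the image_pattern compiled in run_gradio_demo.py (hypothetical regex).
image_pattern = re.compile(r"\S+\.(?:jpg|jpeg|png|gif)")

def extract_medias(message):
    # re.Pattern.finditer() accepts only str/bytes; anything else raises
    # "TypeError: expected string or bytes-like object".
    return [m.group(0) for m in image_pattern.finditer(message)]

print(extract_medias("how many zebras in /examples/b.jpg?"))  # ['/examples/b.jpg']

try:
    extract_medias(None)  # what happens if Gradio passes a non-string message
except TypeError as e:
    print(e)  # expected string or bytes-like object

# A defensive guard (an assumption for illustration, not necessarily the project's fix):
def extract_medias_safe(message):
    if not isinstance(message, (str, bytes)):
        return []
    return [m.group(0) for m in image_pattern.finditer(message)]
```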

The same error is thrown in the following built-in examples:

tricktreat commented 1 year ago

The issue has been fixed in the latest commit.