The server is up and running, launched via Gradio with `python run_gradio_demo.py --config config.gradio.yaml`:
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
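(As an aside, getting the public link mentioned above is presumably just a matter of passing `share=True` to `launch()` wherever run_gradio_demo.py builds the demo; a minimal sketch, not the actual demo code:)

```python
import gradio as gr

# Minimal sketch; the real UI is built inside run_gradio_demo.py.
with gr.Blocks() as demo:
    gr.Markdown("placeholder")

# share=True asks Gradio to tunnel the app to a temporary public URL.
demo.launch(share=True)
```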
Running the built-in example "Given a collection of image A: /examples/a.jpg, B: /examples/b.jpg, C: /examples/c.jpg, please tell me how many zebras in these picture?". Gradio terminal output:
2023-04-06 15:41:17,757 - awesome_chat - INFO - ********************************************************************************
2023-04-06 15:41:17,758 - awesome_chat - INFO - input: Given a collection of image A: /examples/a.jpg, B: /examples/b.jpg, C: /examples/c.jpg, please tell me how many zebras in these picture?
2023-04-06 15:41:30,058 - awesome_chat - INFO - [{"task": "image-to-text", "id": 0, "dep": [-1], "args": {"image": "/examples/a.jpg" }}, {"task": "object-detection", "id": 1, "dep": [-1], "args": {"image": "/examples/a.jpg" }}, {"task": "visual-question-answering", "id": 2, "dep": [1], "args": {"image": "<GENERATED>-1", "text": "How many zebras in the picture?" }}, {"task": "image-to-text", "id": 3, "dep": [-1], "args": {"image": "/examples/b.jpg" }}, {"task": "object-detection", "id": 4, "dep": [-1], "args": {"image": "/examples/b.jpg" }}, {"task": "visual-question-answering", "id": 5, "dep": [4], "args": {"image": "<GENERATED>-4", "text": "How many zebras in the picture?" }}, {"task": "image-to-text", "id": 6, "dep": [-1], "args": {"image": "/examples/c.jpg" }}, {"task": "object-detection", "id": 7, "dep": [-1], "args": {"image": "/examples/c.jpg" }}, {"task": "visual-question-answering", "id": 8, "dep": [7], "args": {"image": "<GENERATED>-7", "text": "How many zebras in the picture?" }}]
2023-04-06 15:41:50,233 - awesome_chat - INFO - response: Based on the inference results, there are two zebras in the picture.
My workflow for your request is as follows:
1. I used the image-to-text model nlpconnect/vit-gpt2-image-captioning to generate a text description for each image.
2. Then I used the object-detection model facebook/detr-resnet-50 to detect the objects in the image and generate an image with predicted boxes.
3. Finally, I used the visual-question-answering model dandelin/vilt-b32-finetuned-vqa to answer your question.
For the image A: /examples/a.jpg, the object-detection model detected a cat and a potted plant in the image. The visual-question-answering model predicted that there are 0 zebras in the picture.
For the image B: /examples/b.jpg, the object-detection model detected a zebra in the image. The visual-question-answering model predicted that there is 1 zebra in the picture.
For the image C: /examples/c.jpg, the object-detection model detected three zebras in the image. The visual-question-answering model predicted that there are 2 zebras in the picture.
Therefore, there are two zebras in the picture.
Traceback (most recent call last):
File "/home/user/anaconda3/envs/jarvis/lib/python3.8/site-packages/gradio/routes.py", line 393, in run_predict
output = await app.get_blocks().process_api(
File "/home/user/anaconda3/envs/jarvis/lib/python3.8/site-packages/gradio/blocks.py", line 1108, in process_api
result = await self.call_function(
File "/home/user/anaconda3/envs/jarvis/lib/python3.8/site-packages/gradio/blocks.py", line 915, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/user/anaconda3/envs/jarvis/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/user/anaconda3/envs/jarvis/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/user/anaconda3/envs/jarvis/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "run_gradio_demo.py", line 82, in bot
image_urls, audio_urls, video_urls = extract_medias(message)
File "run_gradio_demo.py", line 18, in extract_medias
for match in image_pattern.finditer(message):
TypeError: expected string or bytes-like object
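The TypeError is raised by `re.Pattern.finditer`, which accepts only `str` or `bytes`; it fires with exactly this message whenever `message` is `None` or some other non-string (for instance a `(text, file)` tuple, which newer Gradio versions may pass through the chat history). Below is a minimal repro plus a defensive guard; the name mirrors `image_pattern` from run_gradio_demo.py, but the regex itself is a hypothetical stand-in:

```python
import re

# Minimal repro of the crash: finditer() validates its argument immediately.
pattern = re.compile(r"\S+\.jpg")
try:
    pattern.finditer(None)
except TypeError as e:
    print(e)  # expected string or bytes-like object

# Hypothetical pattern; the real image_pattern lives in run_gradio_demo.py.
image_pattern = re.compile(r"(?:/|https?://)?\S+\.(?:jpg|jpeg|png|gif)")

def extract_medias(message):
    # Guard: coerce whatever Gradio hands the bot (None, or a (text, file)
    # tuple in newer versions) into a plain string before matching.
    if isinstance(message, (list, tuple)):
        message = " ".join(str(part) for part in message if part is not None)
    elif not isinstance(message, str):
        message = "" if message is None else str(message)
    return [m.group(0) for m in image_pattern.finditer(message)]
```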
The same error is thrown for the following built-in examples (a quick way to run them all through `extract_medias` is sketched after the list):
Please generate a canny image based on /examples/f.jpg
what is in the examples/a.jpg
generate a video and audio about a dog is running on the grass
based on the /examples/a.jpg, please generate a video and audio
based on pose of /examples/d.jpg and content of /examples/e.jpg, please show me a new image
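All of these prompts funnel through the same `bot` → `extract_medias(message)` path in run_gradio_demo.py, so a single guard like the one sketched above should presumably cover every case. They can be checked in isolation, without the UI:

```python
# Quick offline check of the guarded extract_medias against the failing prompts.
examples = [
    "Please generate a canny image based on /examples/f.jpg",
    "what is in the examples/a.jpg",
    "generate a video and audio about a dog is running on the grass",
    "based on the /examples/a.jpg, please generate a video and audio",
    "based on pose of /examples/d.jpg and content of /examples/e.jpg, please show me a new image",
]
for msg in examples:
    print(extract_medias(msg))  # matched media paths, or [] where there are none

print(extract_medias(None))  # no longer raises; returns []
```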