Closed tensorboy closed 1 year ago
Copy that. Could you please tell me whether you can run main.py successfully?
It gets stuck when I run python main.py --image_src examples/3.jpg --out_image_name 3_out.jpg
warnings.warn(
/mnt/bd/data-tns-algo-us-audio-multimodal/environment/anaconda3/envs/python39/lib/python3.9/site-packages/transformers/generation/utils.py:1313: UserWarning: Using max_length's default (20) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using max_new_tokens to control the maximum length of the generation.
warnings.warn(
Step1, BLIP2 caption: a dog sitting on a porch with a bike
/mnt/bd/data-tns-algo-us-audio-multimodal/environment/anaconda3/envs/python39/lib/python3.9/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1659484809662/work/aten/src/ATen/native/TensorShape.cpp:2894.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Step2, Dense Caption:
black and white dog sitting on the floor: [61, 106, 149, 259]; red bike that is parked: [59, 58, 272, 202]; white car parked on the street: [224, 35, 330, 81]; red bike parked on side of road: [26, 34, 53, 59];
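The Step2 output above is a semicolon-separated list of "description: [four integers]" entries. Here is a minimal sketch of splitting such a string into pairs, assuming the format is exactly as printed in this log. The helper name parse_dense_caption is hypothetical and not part of the repository, and the thread does not say what the four numbers mean (they look like box coordinates).

```python
import re

# Hypothetical helper, not part of the repository.
# Matches one entry such as "red bike that is parked: [59, 58, 272, 202]".
_ENTRY = re.compile(r"\s*(?P<label>[^:;]+?)\s*:\s*\[(?P<box>[^\]]*)\]\s*")


def parse_dense_caption(text):
    """Split a dense-caption string into (description, [int, int, int, int]) pairs."""
    entries = []
    for chunk in text.split(";"):
        if not chunk.strip():
            continue  # skip the empty piece after the trailing ';'
        match = _ENTRY.fullmatch(chunk)
        if match is None:
            raise ValueError(f"Unrecognized dense-caption entry: {chunk!r}")
        box = [int(value) for value in match.group("box").split(",")]
        entries.append((match.group("label"), box))
    return entries


# Usage with two entries copied from the log above.
sample = (
    "black and white dog sitting on the floor: [61, 106, 149, 259]; "
    "red bike that is parked: [59, 58, 272, 202];"
)
for label, box in parse_dense_caption(sample):
    print(label, box)
```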
Very strange: it ran yesterday, but it does not run with today's code.
Hi tensorboy, I introduced semantic segmentation models, which are very slow on CPU (about 8 minutes). Try running on GPU by following the GPU instructions in readme.md. I will make this model optional in the next version.
I see, that is probably the cause. I will try running on GPU now.
The prompt I gave to GPT-4: Unexpected token '<', " <!DOCTYPE "... is not valid JSON when I run gradio apps
And the answer: It seems like you're experiencing an issue with Gradio, a Python library for creating machine learning demos with a UI. The error message you're encountering indicates that an HTML file or a response from an API is being treated as JSON, causing a parsing issue.
To help resolve this issue, I need more information about your specific use case. However, I can offer some general suggestions that may help:
Ensure that the data you're working with is valid JSON. Use a JSON validator, such as jsonlint.com, to check the format.
Check the response from any external API you're using. If the API returns an HTML error page instead of the expected JSON, this could cause the parsing error. Verify the API's endpoint, authentication, and request parameters.
If you're using an API, make sure you set the Accept header to application/json in your request. This tells the server that you expect a JSON response.
If you're loading a local JSON file, ensure that the file path is correct and the file is properly formatted as JSON.
If the issue persists, please provide more context or code snippets, and I'll be happy to help you further.
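The "Unexpected token '<'" message is what a JSON parser reports when the body it receives starts with HTML. Below is a minimal Python sketch of detecting that case before parsing, using only the standard library. The function name parse_json_body and the sample bodies are hypothetical. It illustrates the general failure mode only, not how Gradio handles responses internally.

```python
import json


def parse_json_body(body, content_type=""):
    """Parse body as JSON, raising a descriptive error if it looks like HTML."""
    stripped = body.lstrip()
    looks_like_html = stripped.startswith("<") or "text/html" in content_type.lower()
    if looks_like_html:
        preview = stripped[:40]
        raise ValueError(
            f"Expected JSON but received HTML-like content starting with {preview!r}"
        )
    return json.loads(body)


# A well-formed JSON body parses normally.
print(parse_json_body('{"status": "ok"}', "application/json"))

# An HTML error page is rejected with a message that names the real problem.
try:
    parse_json_body("<!DOCTYPE html><html>...</html>", "text/html")
except ValueError as err:
    print(err)
```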
I can run it without the Gradio demo. Here is the output of python main.py --image_src examples/3.jpg --out_image_name 3_out.jpg:
Generated Text: This image has a resolution of 369X276 and features a black and white dog sitting on the floor of a porch next to a parked red bike. The dense caption describes additional elements of the image, including a white car parked on the street and a second red bike parked on the side of the road. The image also includes several different regions with semantic labels, such as a porch, floor, wall, and trees. The dog and bike are both identified as objects within the image, with the bike being red in color. Other relevant information about the objects is provided as well, such as the fact that the bike is parked and the dog is simply sitting on the floor. Overall, this image captures a peaceful scene of a pup lounging on a porch with a bike nearby.
It shows "Unexpected token '<', " <h"... is not valid JSON" when I change the code to this:
import gradio as gr
import cv2
import numpy as np
from PIL import Image
import base64
from io import BytesIO
from models.image_text_transformation import ImageTextTransformation

def pil_image_to_base64(image):
    buffered = BytesIO()
    image.save(buffered, format="JPEG")
    img_str = base64.b64encode(buffered.getvalue()).decode()
    return img_str

def add_logo():
    with open("examples/logo.png", "rb") as f:
        logo_base64 = base64.b64encode(f.read()).decode()
    return logo_base64

def process_image(image_src, processor):
    gen_text = processor.image_to_text(image_src)
    gen_image = processor.text_to_image(gen_text)
    gen_image_str = pil_image_to_base64(gen_image)
    print('gen text', gen_text)
    print('gen_image', gen_image)
    print('gen image str', gen_image_str)
    # Combine the outputs into a single HTML output
    return gen_text

processor = ImageTextTransformation()
# Create Gradio input and output components
image_input = gr.inputs.Image(type='filepath', label="Input Image")
logo_base64 = add_logo()
# Create the title with the logo
title_with_logo = f'<img src="data:image/jpeg;base64,{logo_base64}" width="400" style="vertical-align: middle;"> Understanding Image with Text'
# Create Gradio interface
interface = gr.Interface(
    fn=lambda image: process_image(image, processor),  # Pass the processor object using a lambda function
    inputs=image_input,
    outputs=gr.Textbox(),
    title=title_with_logo,
    description="""
    This code support image to text transformation. Then the generated text can do retrieval, question answering et al to conduct zero-shot.
    """
)
# Launch the interface
interface.launch(share=True)
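A side note on the script above: the title embeds examples/logo.png under an image/jpeg MIME type. Here is a minimal sketch of deriving the type from the file name instead, using only the standard library. The helper name file_to_data_uri is hypothetical, and nothing in this thread suggests the mismatch is related to the JSON error.

```python
import base64
import mimetypes


def file_to_data_uri(path):
    """Return a data: URI whose MIME type is guessed from the file extension."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        # Unknown extension: fall back to a generic binary type.
        mime_type = "application/octet-stream"
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

With this helper, the src attribute of the logo tag could be built as file_to_data_uri("examples/logo.png"), which yields a URI beginning with data:image/png;base64.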
Which Gradio version are you using, @FingerRec?
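For anyone answering the version question, here is a minimal sketch that reports the installed version without importing the package itself, assuming Python 3.8 or later. The helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("gradio:", installed_version("gradio"))
```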
It was an issue with Gradio; using ngrok fixed the problem.