gradio-app / gradio

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
http://www.gradio.app
Apache License 2.0
32.41k stars 2.42k forks

Gradio Endpoint Time Out Issue #8448

Closed SyedMuqtasidAli closed 3 months ago

SyedMuqtasidAli commented 3 months ago

Describe the bug

I created a Gradio endpoint using the model "CodeQwen1.5-7B", but when I run inference through the Gradio endpoint from my CrewAI agent, I get this error: (image)

Can anyone please help me with this? I am using Google Colab (the model is running on a GPU), and I am using CrewAI agents to run inference against a Hugging Face model.

Have you searched existing issues? 🔎

Reproduction

import gradio as gr

Screenshot

image

Logs

No response

System Info

4.32.2

Severity

I can work around it

abidlabs commented 3 months ago

Hi @SyedMuqtasidAli this issue is not very clear and does not provide a minimal code example that we can use to reproduce it. See: https://stackoverflow.com/help/minimal-reproducible-example

SyedMuqtasidAli commented 3 months ago

@abidlabs here is the complete code, with the full scenario of what I am doing and what I am facing. I created a Gradio endpoint using the CodeQwen1.5-7B model. Here is the code for it:

from transformers import AutoModelForCausalLM, AutoTokenizer
from gradio import Interface
import torch

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/CodeQwen1.5-7B-Chat-AWQ",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/CodeQwen1.5-7B-Chat-AWQ")

def generate_code(prompt):
    """
    Generates Python code using the CodeQwen model based on a prompt.

    Args:
        prompt: A string containing the prompt describing the desired code.

    Returns:
        A string containing the generated Python code.
    """
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(device)

    generated_ids = model.generate(
        model_inputs.input_ids,
        max_new_tokens=512
    )
    generated_ids = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response

# Define the Gradio interface
interface = Interface(
    fn=generate_code,
    inputs="text",
    outputs="text",
    title="Generate Python Code with CodeQwen",
    description="Describe the Python code you need and let CodeQwen assist you with generating the code!"
)

# Launch the Gradio interface
interface.launch()


Next, I want to integrate this Gradio endpoint with my CrewAI agent through a custom class. Here is the code for it:

import logging
import time

import requests

class GradioModelProxy:
    def __init__(self, url):
        self.url = url
        self.default_params = {}  # Stores default parameters for the model
        logging.basicConfig(level=logging.DEBUG)

    def __call__(self, input_text, **kwargs):
        # This makes the instance callable
        return self.generate(input_text, **kwargs)

    def generate(self, input_text, **kwargs):
        # Ensure input_text is a string
        input_text = str(input_text)

        # Prepare the request payload with default parameters included
        data_to_send = {"data": [input_text], **self.default_params}

        # Log the data being sent
        logging.debug(f"Sending data to Gradio: {data_to_send}")

        retries = 3
        for attempt in range(retries):
            try:
                # Send the request to the Gradio endpoint with a 15-minute timeout
                logging.debug(f"Attempt {attempt + 1} of {retries}")
                start_time = time.time()  # Record start time
                response = requests.post(self.url, json=data_to_send, timeout=900)  # 900 seconds = 15 minutes
                end_time = time.time()  # Record end time

                # Calculate time taken for the request
                duration = end_time - start_time
                logging.debug(f"Request took {duration} seconds.")

                # Check the response status code
                if response.status_code == 200:
                    logging.debug("Received successful response from Gradio")
                    return response.json()['data'][0]
                else:
                    logging.error(f"Failed to get valid response. Status Code: {response.status_code}")
                    logging.debug("Retrying...")
                    time.sleep(5)  # Wait for 5 seconds before retrying
            except requests.exceptions.RequestException as e:
                logging.error(f"Request failed: {e}")
                logging.debug("Retrying...")
                time.sleep(5)  # Wait for 5 seconds before retrying

        raise Exception(f"Gradio model query failed after {retries} attempts")

    def bind(self, **kwargs):
        # Optionally implement this method if your system uses it to set up or configure models
        self.default_params.update(kwargs)
        logging.debug(f"Updated default parameters: {self.default_params}")
        return self
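
The retry loop in `GradioModelProxy.generate` waits a fixed 5 seconds and retries on every non-200 status. As a rough sketch only, the same idea could be written with a doubling delay and a retry limited to gateway-style errors. The names `post_with_backoff`, `make_sender`, and `RETRYABLE_STATUS` are hypothetical, the choice of status codes is an assumption, and unlike the class above this version lets `requests` exceptions propagate instead of catching them.

```python
import time
from typing import Callable, Optional, Tuple

# Status codes treated as transient; this set is an assumption for illustration.
RETRYABLE_STATUS = {502, 503, 504}


def post_with_backoff(
    send: Callable[[], Tuple[int, Optional[str]]],
    retries: int = 3,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call send() up to `retries` times, doubling the wait after each transient failure."""
    last_status = None
    for attempt in range(retries):
        status, body = send()
        if status == 200 and body is not None:
            return body
        last_status = status
        if status not in RETRYABLE_STATUS:
            break  # other errors are not retried in this sketch
        if attempt < retries - 1:
            # With the default base_delay: 5 s, then 10 s, then 20 s, ...
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"Gradio query failed; last status code: {last_status}")


def make_sender(url: str, text: str, timeout: float = 900.0):
    """Build a send() callable that POSTs the same payload shape as GradioModelProxy."""
    import requests  # imported lazily so post_with_backoff has no hard dependency

    def send() -> Tuple[int, Optional[str]]:
        response = requests.post(url, json={"data": [text]}, timeout=timeout)
        if response.status_code == 200:
            return 200, response.json()["data"][0]
        return response.status_code, None

    return send
```

It would be called as `post_with_backoff(make_sender(gradio_url, "write BFS in python"))`. Retrying does not shorten the request itself, so a 504 caused by a long-running generation may simply repeat.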

# Example usage in your crewai setup:
from crewai import Agent, Task, Crew

# Initialize the Gradio model proxy with the URL of your Gradio endpoint
gradio_url = "https://f74e223631febfd885.gradio.live/api/predict"
llm = GradioModelProxy(gradio_url)

# You might bind some initial settings if necessary (example usage)
llm.bind(max_length=100)  # Set a default max_length for generation, if applicable

from crewai import Agent, Task, Crew, Process
from langchain_community.llms import HuggingFaceHub
import os
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "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"

# Define your agents with roles, goals, and additional attributes
researcher = Agent(
    role='Senior Developer',
    goal='complete code generation',
    backstory=(
        "You are a Senior developer"
        "Your expertise lies in generating code."
    ),
    verbose=True,
    allow_delegation=False,
    llm=llm
)

# Create tasks for your agents
task1 = Task(
    description=(
        "write a code of BFS in python "
    ),
    expected_output='write code',
    agent=researcher,
    human_input=False,
)

# Instantiate your crew with a sequential process
crew = Crew(
    agents=[researcher],
    tasks=[task1],
    verbose=2
)

# Get your crew to work!
result = crew.kickoff()

print("######################")
print(result)

Now, when I run inference with this setup, I get a 504 timeout error: (image)
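
For comparison with the hand-written POST to `/api/predict`, the official `gradio_client` package is the documented way to call a Gradio app from Python. The sketch below is an illustration only: `query_gradio` and its `client_factory` parameter are hypothetical names, the `/predict` endpoint name assumes the single `Interface` shown earlier, and whether this avoids the 504 in this setup was not established in the thread.

```python
from typing import Any, Callable, Optional


def query_gradio(
    prompt: str,
    src: str,
    api_name: str = "/predict",
    client_factory: Optional[Callable[[str], Any]] = None,
) -> Any:
    """Send `prompt` to a Gradio app through gradio_client and return its result."""
    if client_factory is None:
        # Lazy import: requires `pip install gradio_client`.
        from gradio_client import Client

        client_factory = Client
    # src is the app's base URL (for example the share link),
    # not the /api/predict path used with requests.post.
    client = client_factory(src)
    return client.predict(prompt, api_name=api_name)
```

With the share link from the code above, the call would be `query_gradio("write BFS in python", "https://f74e223631febfd885.gradio.live")`. The `client_factory` argument exists only so the function can be exercised without a running app.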

abidlabs commented 3 months ago

Hi @SyedMuqtasidAli this code is not properly formatted nor is it minimal as it includes many other dependencies. Please provide a minimal example that our team can investigate, thanks
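
A minimal reproduction along these lines could drop the model and CrewAI entirely and replace them with a function that just sleeps, which would show whether the 504 depends on the model or only on how long the request takes. This is a sketch under assumptions: `make_slow_fn` and `build_demo` are hypothetical names, and the 120-second default is an arbitrary stand-in for the generation time.

```python
import time
from typing import Callable


def make_slow_fn(delay_seconds: float) -> Callable[[str], str]:
    """Return a one-argument function that waits, then echoes its input."""

    def slow_echo(prompt: str) -> str:
        time.sleep(delay_seconds)  # stand-in for model.generate
        return f"echo: {prompt}"

    return slow_echo


def build_demo(delay_seconds: float = 120.0):
    """Wrap the slow function in the same kind of Interface used in the report."""
    import gradio as gr  # lazy import so make_slow_fn works without Gradio installed

    return gr.Interface(fn=make_slow_fn(delay_seconds), inputs="text", outputs="text")
```

Running `build_demo().launch(share=True)` and sending one request to the share link would then reproduce the scenario with a single dependency. Varying `delay_seconds` would indicate roughly where the timeout starts.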

abidlabs commented 3 months ago

Closing for lack of a suitable repro