
Gradio's server memory resources cannot be automatically released #8549

Closed. zhangyukun230 closed this issue 23 hours ago

zhangyukun230 commented 1 week ago

Describe the bug

I started a service using Gradio, and as more web pages were opened, the system's memory usage increased. Even after I close those pages, the memory is not released. On a 32-core machine, after more than a dozen requests the process runs out of memory and is forcibly killed.

Have you searched existing issues? 🔎

Reproduction


import os
import random
import shutil

import gradio as gr

# Not shown here: `mgr` (modelscope_studio), `qwen`, `customTheme`, `prologue`,
# `avatar_pairs`, `logger`, `init_user`, `check_uuid`, `uuid_str` and `Message`
# come from the reporter's project.

demo = gr.Blocks(title='Data Analysis', css='assets/appBot.css', theme=customTheme)

with demo:
    draw_seed = random.randint(0, 1000000000)
    state = gr.State({'session_seed': draw_seed})
    gr.Markdown(
        '''<h2><center>Data Analysis Agent</center></h2>
        <center>LLM + Code Interpreter + Agent tools, with file upload, download, processing and analysis</center>
        ''')
    with gr.Row(elem_classes='container'):
        with gr.Column(scale=4):
            with gr.Column():
                # Preview
                user_chatbot = mgr.Chatbot(
                    value=[[None, prologue]],
                    elem_id='user_chatbot',
                    elem_classes=['markdown-body'],
                    avatar_images=avatar_pairs,
                    height=800,
                    show_label=False,
                    show_copy_button=True,
                    llm_thinking_presets=[
                        qwen(
                            action_input_title='Calling <Action>',
                            action_output_title='Call finished')
                    ])
            with gr.Row():
                user_chatbot_input = mgr.MultimodalInput(
                    interactive=True,
                    placeholder='Chat with me~',
                    upload_button_props=dict(
                        file_count='multiple'))
            gr.Markdown(
                    '''<center>@zhangyukun33</center>'''  # noqa E501
                )
    def send_message(chatbot, input, _state):
        logger.warning('send message!!')
        # Create the per-session agent on first use
        if 'user_agent' not in _state:
            logger.warning('creating agent!!')
            init_user(_state)
            logger.warning('agent created successfully!!')
        # Resolve the session uuid and collect the uploaded files
        _uuid_str = check_uuid(uuid_str)
        user_agent = _state['user_agent']
        user_memory = _state['user_memory']
        request_word_dir = _state['request_word_dir']
        append_files = []
        for file in input.files:
            file_name = os.path.basename(file.path)
            # convert xxx.json to xxx_<uuid_str>.json
            file_name = file_name.replace('.', f'_{_uuid_str}.')
            file_path = os.path.join(request_word_dir, file_name)
            print('Absolute path of the file: ' + file_path)
            if not os.path.exists(file_path):
                # make sure file path's directory exists
                os.makedirs(os.path.dirname(file_path), exist_ok=True)
                shutil.copy(file.path, file_path)
            append_files.append(file_path)
        chatbot.append([{'text': input.text, 'files': input.files}, None])
        yield {
            user_chatbot: chatbot,
            user_chatbot_input: None,
        }
        # get short term memory history
        history = user_memory.get_history()

        # get long term memory knowledge, currently get one file
        uploaded_file = None
        if len(append_files) > 0:
            uploaded_file = append_files[0]
        ref_doc = user_memory.run(
            query=input.text, url=uploaded_file, checked=True)

        response = ''
        try:
            for frame in user_agent.run(
                    input.text,
                    history=history,
                    ref_doc=ref_doc,
                    append_files=append_files,
                    work_dir=request_word_dir
            ):
                # important! do not change this
                response += frame
                chatbot[-1][1] = response
                yield {
                    user_chatbot: chatbot,
                }
            if len(history) == 0:
                user_memory.update_history(
                    Message(role='system', content=user_agent.system_prompt))

            file_names = ','.join(
                [os.path.basename(path) for path in append_files])

            user_memory.update_history([
                Message(
                    role='user',
                    content=(f'Uploaded files: {file_names}\n' + input.text
                             if len(append_files) > 0 else input.text)),
                Message(role='assistant', content=response),
            ])
        except Exception as e:
            if 'dashscope.common.error.AuthenticationError' in str(e):
                msg = 'DASHSCOPE_API_KEY should be set as an environment variable. You can acquire one at ' \
                      'https://help.aliyun.com/zh/dashscope/developer-reference/activate-dashscope-and-create-an-api-key'
            elif 'rate limit' in str(e):
                msg = 'Too many people are calling, please try again later.'
            else:
                msg = str(e)
            chatbot[-1][1] = msg
            yield {user_chatbot: chatbot}

    gr.on([user_chatbot_input.submit],
          fn=send_message,
          inputs=[user_chatbot, user_chatbot_input, state],
          outputs=[user_chatbot, user_chatbot_input])

    # Listen for events using Gradio's API

    # demo.load(init_user, inputs=[state], outputs=[state])

demo.queue()
demo.launch(show_error=True, server_name="0.0.0.0", server_port=8081)
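
A minimal sketch of the pattern that appears to drive the growth (this is not the project's actual code; FakeAgent and its buffer are invented purely to make the per-session memory visible): each browser session stores large objects such as the agent and its memory inside gr.State, and in Gradio 4.8 that per-session state is never released, so memory grows with every page that is opened.

import gradio as gr

class FakeAgent:
    def __init__(self):
        # stand-in for the real agent/memory objects: roughly 100 MB per session
        self.buffer = bytearray(100 * 1024 * 1024)

def respond(message, history, session):
    # the agent is created once per session and kept alive by gr.State
    if 'agent' not in session:
        session['agent'] = FakeAgent()
    history = history + [[message, 'ok']]
    return history, session

with gr.Blocks() as demo:
    session = gr.State({})
    chatbot = gr.Chatbot()
    box = gr.Textbox()
    box.submit(respond, inputs=[box, chatbot, session], outputs=[chatbot, session])

demo.queue()
demo.launch()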

Screenshot

(screenshot of server memory usage) As more and more users open the web page, memory usage keeps growing. Even after the pages are closed, the memory is not released.

Logs

No response

System Info

gradio    4.8.0
python    3.10

Severity

I can work around it

abidlabs commented 1 week ago

Hi @zhangyukun230, it's not clear to me which resources are using that memory. However, in more recent versions of Gradio we automatically release memory held by gr.State after 1 hour, and you can also unload specific objects using the gr.Blocks.unload() event listener. Can you try upgrading to the latest version of Gradio?
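
For reference, a minimal sketch of both mechanisms, assuming a recent Gradio release in which gr.State accepts time_to_live and delete_callback (check the docs of your installed version); release_session is a hypothetical cleanup function:

import gradio as gr

def release_session(session):
    # assumed to be called by Gradio when the per-session state expires;
    # dropping the references lets the agent/memory objects be garbage-collected
    session.clear()

with gr.Blocks() as demo:
    state = gr.State(
        value={},                        # per-session dict (agent, memory, ...)
        time_to_live=3600,               # release roughly one hour after the last update
        delete_callback=release_session,
    )
    # gr.Blocks.unload() fires when the user closes or refreshes the tab
    demo.unload(lambda: print('session ended'))

demo.queue()
demo.launch()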

I think it would be worth adding a section to the docs about this @freddyaboulton, as we get asked about this a lot.

freddyaboulton commented 3 days ago

Yes, will work on some docs!