langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Workflow as Tool: if the parameters include images, they cannot be referenced correctly #5113

Closed: svcvit closed this issue 3 months ago

svcvit commented 3 months ago

Self Checks

Dify version

0.6.10

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

I converted a workflow into a tool. If the workflow has image upload enabled, an __image parameter is generated automatically. When this parameter is referenced in the chat workflow, sys.files can be selected for it, but the tool does not receive the image correctly.

Screenshots for reference:

[screenshot] [screenshot]

I carefully compared the inputs and found that the only difference is in the parameters shown here:

[screenshot]

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

dosubot[bot] commented 3 months ago

Hello @svcvit! I'm a bot that can help you solve bugs, answer questions, and become a contributor. I'm here to help you while waiting for a human maintainer.

To correctly reference the __image parameter in the "Workflow as Tool" feature in Dify version 0.6.10, ensure that the parameter is handled as a file type. The _generate_parameters method in the ToolNode class is responsible for generating the parameters, and it includes logic to handle file parameters.

Here is the relevant part of the code that deals with file parameters:

def _generate_parameters(self, variable_pool: VariablePool, node_data: ToolNodeData, tool_runtime: Tool) -> dict:
    """
    Generate parameters
    """
    tool_parameters = tool_runtime.get_all_runtime_parameters()

    def fetch_parameter(name: str) -> Optional[ToolParameter]:
        return next((parameter for parameter in tool_parameters if parameter.name == name), None)

    result = {}
    for parameter_name in node_data.tool_parameters:
        parameter = fetch_parameter(parameter_name)
        if not parameter:
            continue
        if parameter.type == ToolParameter.ToolParameterType.FILE:
            # Files come from _fetch_files(variable_pool); the input configured
            # for this parameter is not read in this branch.
            result[parameter_name] = [
                v.to_dict() for v in self._fetch_files(variable_pool)
            ]
        else:
            # Renamed from `input` to avoid shadowing the built-in.
            tool_input = node_data.tool_parameters[parameter_name]
            if tool_input.type == 'mixed':
                result[parameter_name] = self._format_variable_template(tool_input.value, variable_pool)
            elif tool_input.type == 'variable':
                result[parameter_name] = variable_pool.get_variable_value(tool_input.value)
            elif tool_input.type == 'constant':
                result[parameter_name] = tool_input.value
    return result

To reference the __image parameter correctly, ensure that it is a key in node_data.tool_parameters and that the tool's runtime parameters (tool_runtime.get_all_runtime_parameters()) declare it with type ToolParameter.ToolParameterType.FILE. The _fetch_files method will then retrieve the files from the variable_pool.

Here is an example of how you might set up the node_data to include the __image parameter:

node_data.tool_parameters = {
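    # In _generate_parameters above, only this key's presence matters for a FILE parameter:
    # the type is read from tool_runtime.get_all_runtime_parameters(), and this value is not used.
    '__image': ToolParameter(name='__image', type=ToolParameter.ToolParameterType.FILE)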
}
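
To make the dispatch in _generate_parameters concrete, here is a minimal standalone sketch that uses plain dicts instead of Dify's classes. ParamType, ToolInput, generate_parameters and FILES_KEY are hypothetical stand-ins, not Dify's API. In particular, the location where _fetch_files reads files from is an assumption, and 'mixed' template inputs are omitted. The sketch mirrors one property of the excerpt: for a FILE parameter, the configured input is never consulted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class ParamType(Enum):
    # Hypothetical stand-in for ToolParameter.ToolParameterType
    STRING = "string"
    FILE = "file"


@dataclass
class ToolInput:
    # Hypothetical stand-in for one entry of node_data.tool_parameters
    type: str  # 'variable' or 'constant' here; 'mixed' is omitted
    value: Any


# Assumption: the pool location that the file fetch reads from
FILES_KEY = ("sys", "files")


def generate_parameters(
    pool: dict[tuple[str, ...], Any],
    node_inputs: dict[str, ToolInput],
    runtime_types: dict[str, ParamType],
) -> dict[str, Any]:
    result: dict[str, Any] = {}
    for name, tool_input in node_inputs.items():
        param_type: Optional[ParamType] = runtime_types.get(name)
        if param_type is None:
            # Not declared by the tool runtime: skipped, like `if not parameter: continue`
            continue
        if param_type is ParamType.FILE:
            # Like the excerpt: files come from the pool as a whole,
            # and tool_input (the selected variable) is ignored
            result[name] = list(pool.get(FILES_KEY, []))
        elif tool_input.type == "variable":
            # The selector is a list of path segments, used here as a tuple key
            result[name] = pool.get(tuple(tool_input.value))
        elif tool_input.type == "constant":
            result[name] = tool_input.value
    return result


pool = {
    ("sys", "files"): [{"type": "image", "id": "f1"}],
    ("start", "query"): "describe it",
}
inputs = {
    "__image": ToolInput(type="variable", value=["start", "other_image"]),
    "query": ToolInput(type="variable", value=["start", "query"]),
}
types = {"__image": ParamType.FILE, "query": ParamType.STRING}
params = generate_parameters(pool, inputs, types)
print(params)
```

In this sketch, __image resolves to whatever sits under FILES_KEY even though its input points at a different variable. This is what the FILE branch of the excerpt implies, under the stated assumptions.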

Additionally, the WorkflowTool class in api/core/tools/tool/workflow_tool.py handles invoking the tool and processing its response, including binary data such as images. The _transform_args method splits the tool parameters into ordinary values and a list of file descriptors, each built according to the file's transfer method (tool file, local upload, or remote URL). The _extract_files method separates FileVar entries from the workflow outputs and returns them alongside the remaining output values.

Here is the relevant part of the WorkflowTool class:

class WorkflowTool(Tool):
    # ... other methods ...

    def _transform_args(self, tool_parameters: dict) -> tuple[dict, list[dict]]:
        """
        Transform the tool parameters
        """
        parameter_rules = self.get_all_runtime_parameters()
        parameters_result = {}
        files = []
        for parameter in parameter_rules:
            if parameter.type == ToolParameter.ToolParameterType.FILE:
                file = tool_parameters.get(parameter.name)
                if file:
                    try:
                        file_var_list = [FileVar(**f) for f in file]
                        for file_var in file_var_list:
                            file_dict = {
                                'transfer_method': file_var.transfer_method.value,
                                'type': file_var.type.value,
                            }
                            if file_var.transfer_method == FileTransferMethod.TOOL_FILE:
                                file_dict['tool_file_id'] = file_var.related_id
                            elif file_var.transfer_method == FileTransferMethod.LOCAL_FILE:
                                file_dict['upload_file_id'] = file_var.related_id
                            elif file_var.transfer_method == FileTransferMethod.REMOTE_URL:
                                file_dict['url'] = file_var.preview_url

                            files.append(file_dict)
                    except Exception as e:
                        logger.exception(e)
            else:
                parameters_result[parameter.name] = tool_parameters.get(parameter.name)

        return parameters_result, files

    def _extract_files(self, outputs: dict) -> tuple[dict, list[FileVar]]:
        """
        Extract files from the result
        """
        files = []
        result = {}
        for key, value in outputs.items():
            if isinstance(value, list):
                has_file = False
                for item in value:
                    if isinstance(item, dict) and item.get('__variant') == 'FileVar':
                        try:
                            files.append(FileVar(**item))
                            has_file = True
                        except Exception:
                            # Skip items that cannot be parsed as a FileVar
                            pass
                if has_file:
                    continue

            result[key] = value

        return result, files
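
The file-handling branch of _transform_args above can be exercised without Dify using a sketch like the following. transform_args, TransferMethod and the dict keys related_id and preview_url are hypothetical stand-ins for FileVar and FileTransferMethod, and the enum string values are assumptions. There is one deliberate difference: the excerpt wraps a parameter's whole file list in a single try, whereas this sketch skips only the malformed entry.

```python
import logging
from enum import Enum
from typing import Any

logger = logging.getLogger(__name__)


class TransferMethod(Enum):
    # Stand-in for FileTransferMethod; the string values are assumptions
    TOOL_FILE = "tool_file"
    LOCAL_FILE = "local_file"
    REMOTE_URL = "remote_url"


def transform_args(
    tool_parameters: dict[str, Any], declared: dict[str, bool]
) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Split parameters into plain values and file payloads.

    `declared` maps each declared parameter name to True if it is a file parameter.
    """
    plain: dict[str, Any] = {}
    files: list[dict[str, Any]] = []
    for name, is_file in declared.items():
        if not is_file:
            plain[name] = tool_parameters.get(name)
            continue
        for f in tool_parameters.get(name) or []:
            try:
                method = TransferMethod(f["transfer_method"])
                payload = {"transfer_method": method.value, "type": f["type"]}
                # The id/url field depends on how the file was transferred
                if method is TransferMethod.TOOL_FILE:
                    payload["tool_file_id"] = f["related_id"]
                elif method is TransferMethod.LOCAL_FILE:
                    payload["upload_file_id"] = f["related_id"]
                else:
                    payload["url"] = f["preview_url"]
                files.append(payload)
            except (KeyError, ValueError):
                # Malformed entries are logged and skipped rather than aborting
                logger.exception("skipping malformed file entry: %r", f)
    return plain, files


plain, files = transform_args(
    {
        "query": "describe it",
        "__image": [
            {"transfer_method": "local_file", "type": "image", "related_id": "u1"},
            {"transfer_method": "remote_url", "type": "image",
             "preview_url": "https://example.com/a.png"},
        ],
    },
    {"query": False, "__image": True},
)
print(plain, files)
```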

This integration ensures that the __image parameter is correctly referenced and processed within the workflow [1][2][3].
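
The output side, _extract_files, can be sketched the same way. This hypothetical extract_files works on plain dicts and keeps the tagged dicts as-is instead of constructing FileVar objects. Unlike the excerpt, it therefore does not skip items that fail to parse. As in the excerpt, any list that contains a tagged file is removed from the plain result in its entirety.

```python
from typing import Any


def extract_files(outputs: dict[str, Any]) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Separate FileVar-tagged dicts from the remaining workflow outputs."""
    files: list[dict[str, Any]] = []
    result: dict[str, Any] = {}
    for key, value in outputs.items():
        if isinstance(value, list):
            tagged = [
                item for item in value
                if isinstance(item, dict) and item.get("__variant") == "FileVar"
            ]
            if tagged:
                # A list containing any file is dropped from the plain result entirely
                files.extend(tagged)
                continue
        result[key] = value
    return result, files


result, files = extract_files({
    "text": "a cat on a sofa",
    "images": [{"__variant": "FileVar", "url": "https://example.com/a.png"}],
    "tags": ["cat", "sofa"],
})
print(result, files)
```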

laipz8200 commented 3 months ago

Did you get an error? If so, could you show us the error message?

svcvit commented 3 months ago

No error message is reported. I published an image recognition workflow as a tool, named, for example, "Image Recognition".

I enabled image upload in the main chat flow and then called this tool from it. The tool uses a vision model (qwen-VL) with an LLM prompt to describe the image. However, the tool's output shows no recognition of the image at all, as if the image had never been uploaded.

没有报错信息哦。我将一个图片识别的的工作流发布为工具,例如名字叫:图片识别

我在聊天主流程里添加了上传图片功能,然后引用这个工具,工具使用qwen-VL这种视觉模型,LLM提示词获取图片内容,工具里面的输出的内容完全没有识别图片,就像没有传这张图片一样。

svcvit commented 3 months ago

https://github.com/langgenius/dify/assets/3839507/d44aabfb-2f1e-4ff4-8340-1b08034ce766

I recorded a video demonstrating this issue; it may be easier to understand.
