langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Downstream nodes cannot obtain the files parameter of the DALL-E 3 tool #7618

Closed StrangeXin closed 2 weeks ago

StrangeXin commented 2 weeks ago


Dify version

0.7.1

Cloud or Self Hosted

Cloud

Steps to reproduce

The files variable output by the DALL-E 3 tool node cannot be used in downstream nodes. For example, a code execution node cannot read the image links from files and process them.

[Screenshots: iShot_2024-08-25_20 18 00, iShot_2024-08-25_20 18 16]

✔️ Expected Behavior

The code execution node can obtain the files variable of the DALL-E 3 node.
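As a rough illustration of the intended use, the code execution node might look something like the sketch below. It assumes the files variable arrives as a list of dictionaries and that each entry carries a `url` field. Both the shape and the field name are assumptions made for illustration, not confirmed Dify behavior.

```python
from typing import Any, Dict, List


def main(files: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Collect image links from the upstream tool's files output."""
    # "url" is an assumed field name; entries without it are skipped.
    urls = [f["url"] for f in files if isinstance(f, dict) and f.get("url")]
    return {"image_urls": urls}
```

With the files variable mapped to the `files` input, the node would return the links under `image_urls` for later nodes to use.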

❌ Actual Behavior

No response

dosubot[bot] commented 2 weeks ago

@StrangeXin Hello there! I'm here to help you with any bugs, questions, or contributions you have for the DIFY tool. Let me know how I can assist you!

In version 0.7.1 of the Dify cloud platform, the DALL-E 3 tool node saves the generated image to the variable pool under a specific key (self.VARIABLE_KEY.IMAGE.value). The code execution node cannot obtain the files variable unless it retrieves the variable with that same key.

Here is the relevant code snippet from the DALL-E 3 tool node:

from base64 import b64decode
from typing import Any, Dict, List, Union

from openai import OpenAI

from core.tools.entities.tool_entities import ToolInvokeMessage
from core.tools.tool.builtin_tool import BuiltinTool


class DallE3Tool(BuiltinTool):
    def _invoke(
        self,
        user_id: str,
        tool_parameters: Dict[str, Any],
    ) -> Union[ToolInvokeMessage, List[ToolInvokeMessage]]:
        """Invoke the DALL-E 3 tool and return the generated images."""
        client = OpenAI(
            api_key=self.runtime.credentials['openai_api_key'],
        )

        # Require a non-empty prompt
        prompt = tool_parameters.get('prompt', '')
        if not prompt:
            return self.create_text_message('Please input prompt')

        # Call the OpenAI DALL-E 3 image generation API
        response = client.images.generate(
            prompt=prompt, model='dall-e-3',
            size='1024x1024', n=1, style='vivid', quality='standard',
            response_format='b64_json'
        )

        result = []
        for image in response.data:
            # Save all images to the variable pool through the save_as parameter.
            # The variable name is self.VARIABLE_KEY.IMAGE.value. If new images
            # are generated later, they will overwrite the previous images.
            result.append(self.create_blob_message(
                blob=b64decode(image.b64_json),
                meta={'mime_type': 'image/png'},
                save_as=self.VARIABLE_KEY.IMAGE.value,
            ))

        return result

To ensure the code execution node can access the files, it must use the same key (self.VARIABLE_KEY.IMAGE.value) to retrieve the variable from the variable pool [1][2][3].
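The key-matching behavior described above can be sketched with a toy variable pool. This is an illustration only: `VariablePool`, `VariableKey`, and the `"image"` and `"files"` key strings are hypothetical stand-ins, not Dify's actual classes or values.

```python
from enum import Enum
from typing import Dict, Optional


class VariableKey(Enum):
    # Hypothetical stand-in for the tool's VARIABLE_KEY enum.
    IMAGE = "image"


class VariablePool:
    """Toy key-value pool; not Dify's actual implementation."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def save(self, key: str, blob: bytes) -> None:
        # A later save under the same key overwrites the earlier blob.
        self._store[key] = blob

    def get(self, key: str) -> Optional[bytes]:
        return self._store.get(key)


pool = VariablePool()
pool.save(VariableKey.IMAGE.value, b"first image")
pool.save(VariableKey.IMAGE.value, b"second image")

found = pool.get(VariableKey.IMAGE.value)  # same key: the latest blob
missing = pool.get("files")                # different key: nothing stored
```

In this sketch, a lookup under any name other than the one used at save time finds nothing, which is the failure mode the explanation above attributes to the code execution node.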


crazywoola commented 2 weeks ago

Duplicate of https://github.com/langgenius/dify/issues/5346