langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Stable Diffusion WebUI in workflow outputs NULL #5028

Closed: czslxk closed this issue 5 months ago

czslxk commented 5 months ago

Dify version

0.6.10

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

I added SD to a basic workflow and gave it the input "a dog", but it does not return any result: the "text" field in SD's output is empty. Here is the screenshot:

✔️ Expected Behavior

Output a picture or a base64 string of the picture.

❌ Actual Behavior

Output NULL.

dosubot[bot] commented 5 months ago

Hello @czslxk! I'm here to help you with bugs, questions, and becoming a contributor. Let's tackle this issue together!

To resolve the issue where the Stable Diffusion WebUI node in your workflow outputs NULL instead of a picture or a base64 string of the picture, ensure that you have correctly configured the required credentials for the Stable Diffusion provider. Specifically, you need to provide the base_url and model parameters correctly.

Here is the relevant configuration snippet:

credentials_for_provider:
  base_url:
    type: secret-input
    required: true
    label:
      en_US: Base URL
    placeholder:
      en_US: Please input your StableDiffusion server's Base URL
  model:
    type: text-input
    required: true
    label:
      en_US: Model
    placeholder:
      en_US: Please input your model
    help:
      en_US: The model name of the StableDiffusion server

Make sure you have:

  1. Entered the correct Base URL of your Stable Diffusion server.
  2. Specified the correct model name used by your Stable Diffusion server.

Additionally, follow these steps to ensure your Stable Diffusion WebUI is properly set up:

  1. Use a machine with a GPU if possible: while not strictly necessary, a GPU will significantly speed up image generation.
  2. Launch Stable Diffusion WebUI:
    • Clone the Stable Diffusion WebUI repository from the official repository.
    • Launch it locally using the appropriate command for your operating system:
      • Windows:
        cd stable-diffusion-webui
        webui.bat --api --listen
      • Linux:
        cd stable-diffusion-webui
        ./webui.sh --api --listen
  3. Prepare Models: Download the models from HuggingFace or other sources and place them in the models directory of the Stable Diffusion WebUI.
  4. Get Model Name: Access the model list at http://your_ip:port/sdapi/v1/sd-models to retrieve the model name (see the sketch after this list).
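A minimal Python sketch for checking the Base URL and listing the available checkpoints (the requests library and the 127.0.0.1 address are assumptions; substitute your own server address):

import requests

BASE_URL = "http://127.0.0.1:7860"  # placeholder: your SD WebUI address

resp = requests.get(f"{BASE_URL}/sdapi/v1/sd-models", timeout=10)
resp.raise_for_status()

# Each entry describes one checkpoint; its "model_name" field is the
# value the Dify tool configuration expects.
for model in resp.json():
    print(model["model_name"])

If this request fails or returns an empty list, the problem is on the server side rather than in Dify.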

Ensure that the base_url and model parameters in your configuration match the setup of your Stable Diffusion WebUI. If these credentials are not set correctly, the node may not be able to communicate with the server properly, resulting in a NULL output.

Additionally, ensure that the send_images parameter is set to True in your tool parameters. This parameter is crucial for the output to include the generated images:

DRAW_TEXT_OPTIONS = {
    ...
    "send_images": True,
    ...
}

If the send_images parameter is not set correctly, the output might not include the generated images, leading to a NULL result.
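To rule out the server side entirely, you can call the txt2img endpoint directly with send_images enabled; a minimal sketch, assuming the WebUI was launched with --api and using a placeholder address:

import base64
import requests

BASE_URL = "http://127.0.0.1:7860"  # placeholder: your SD WebUI address

payload = {
    "prompt": "a dog",
    "steps": 10,
    "width": 512,
    "height": 512,
    "send_images": True,  # without this the response can omit the images
}

resp = requests.post(f"{BASE_URL}/sdapi/v1/txt2img", json=payload, timeout=300)
resp.raise_for_status()

# The API returns base64-encoded image data in the "images" list.
images = resp.json().get("images", [])
print(f"received {len(images)} image(s)")
if images:
    # strip a possible "data:image/png;base64," prefix before decoding
    with open("out.png", "wb") as f:
        f.write(base64.b64decode(images[0].split(",", 1)[-1]))

If this script saves a valid image but the Dify node still outputs NULL, the fault lies in the workflow configuration rather than in the server.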

Moreover, ensure that other parameters such as prompt, steps, width, height, and negative_prompt are correctly set as per your requirements. Here is a snippet of the parameters configuration:

parameters:
  - name: prompt
    type: string
    required: true
    label:
      en_US: Prompt
    human_description:
      en_US: Image prompt, you can check the official documentation of Stable Diffusion
    llm_description: Image prompt of Stable Diffusion; describe the image you want to generate as a detailed list of words. The prompt must be written in English.
    form: llm
  - name: model
    type: string
    required: false
    label:
      en_US: Model Name
    human_description:
      en_US: Model Name
    form: form
  - name: lora
    type: string
    required: false
    label:
      en_US: Lora
    human_description:
      en_US: Lora
    form: form
    default: ""
  - name: steps
    type: number
    required: false
    label:
      en_US: Steps
    human_description:
      en_US: Steps
    form: form
    default: 10
  - name: width
    type: number
    required: false
    label:
      en_US: Width
    human_description:
      en_US: Width
    form: form
    default: 1024
  - name: height
    type: number
    required: false
    label:
      en_US: Height
    human_description:
      en_US: Height
    form: form
    default: 1024
  - name: negative_prompt
    type: string
    required: false
    label:
      en_US: Negative prompt
    human_description:
      en_US: Negative prompt
    form: form
    default: bad art, ugly, deformed, watermark, duplicated, discontinuous lines

By ensuring all these configurations and parameters are correctly set, you should be able to resolve the issue of the Stable Diffusion WebUI node outputting NULL [1][2][3][4].

To continue talking to Dosu, mention @dosu.

czslxk commented 5 months ago

I think I set everything you mentioned correctly, because I can use SD as a tool in the Agent function, but only once (that may be due to the Agent settings). So I think the base URL, model name, and send_images parameter are all right. Here is a screenshot from when I use the Agent:

dosubot[bot] commented 5 months ago

If you have confirmed that the base_url, model, and send_images parameters are correctly set, and you can use Stable Diffusion as a tool in the Agent function, the issue might be related to the specific configuration or parameters in your workflow.

Here are a few additional checks and steps you can take:

  1. Prompt Parameter: Ensure that the prompt parameter is correctly set and not empty. The prompt is mandatory and should be a detailed description of the image you want to generate.

  2. Check for Errors: Look for any error messages or logs that might provide more insight into why the output is NULL. This can help identify if there are any issues with the request or response.

  3. Validate Credentials: Even though the credentials work in the Agent function, double-check that they are correctly configured in the workflow context. The StableDiffusionProvider class includes a method _validate_credentials that validates the provided credentials. Ensure that your credentials pass this validation step.

  4. Workflow Configuration: Verify that all other parameters such as steps, width, height, and negative_prompt are correctly set as per your requirements. Incorrect or missing parameters might cause the node to fail silently.

  5. Network Issues: Ensure there are no network issues or firewalls blocking the communication between your workflow and the Stable Diffusion server.

If all these checks are in place and the issue persists, it might be helpful to review the specific configuration and parameters used in your workflow compared to the Agent function where it works. This can help identify any discrepancies or additional settings that might be required in the workflow context.
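As a quick check for the network point, a short Python sketch (the address is a placeholder) that probes the server from wherever Dify actually runs:

import requests

BASE_URL = "http://127.0.0.1:7860"  # placeholder: your SD WebUI address

try:
    resp = requests.get(f"{BASE_URL}/sdapi/v1/sd-models", timeout=5)
    print("reachable, status:", resp.status_code)
except requests.RequestException as exc:
    print("cannot reach the SD WebUI server:", exc)

Note that when Dify is self-hosted in Docker, 127.0.0.1 inside a container refers to the container itself, not the host machine; a Base URL that works from your browser can therefore be unreachable from the Dify api container (on Docker Desktop, http://host.docker.internal:7860 is a common workaround).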

To continue talking to Dosu, mention @dosu.

czslxk commented 5 months ago

After finishing the checks and steps you mentioned, I found the key point is the "Agent Mode" in "Agent Settings". When I use llama3, the "Agent Mode" is "ReAct", but when I use other models, such as Moonshot AI and GPT-3.5, it is "Function Calling".

"ReAct" can lead to an infinite loop, where the agent keeps talking to itself and never stops. "Function Calling" is closer to what I want: it only calls a function when needed.

So I have some questions:

  1. What is the difference between these two modes, and what kinds of tasks are they designed for?

  2. Can we change the mode for each model? For example, if I want to change the "Agent Mode" from "ReAct" to "Function Calling", how can I do that?

dosubot[bot] commented 5 months ago

The difference between the "ReAct" and "Function Calling" modes lies in their approach to selecting and querying knowledge bases:

  • ReAct Mode: This mode involves reasoning and acting based on the user query. The system model uses its reasoning capabilities to determine the most relevant knowledge base to query. It is designed for tasks that require more complex reasoning and decision-making processes.
  • Function Calling Mode: In this mode, each knowledge base acts as a tool function. The system model autonomously selects the most relevant knowledge base to query based on the semantic match between the user question and the knowledge base description. This mode is more straightforward and is designed for tasks that involve direct function calls without extensive reasoning.

Yes, it is possible to change the mode for each model. You can set the type of inference mode for an agent, choosing between ReAct and Function Calling modes. This flexibility allows you to tailor the agent's behavior to better suit your application's needs. The mode can be set in the model configuration, and the system will adapt its behavior accordingly [1][2][3].
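To make the contrast concrete, here is a schematic, runnable Python sketch of the two loops. It illustrates the general patterns only, not Dify's actual implementation; the scripted model outputs stand in for real LLM calls:

# A toy tool registry shared by both loops.
TOOLS = {"draw": lambda prompt: f"<base64 image for '{prompt}'>"}

def react_agent(scripted_llm_steps, question):
    """ReAct: the model emits free-text Thought/Action steps that the
    runtime must parse; it stops only when the model produces a final
    answer, so a model that never does can loop forever without a cap."""
    transcript = [question]
    for step in scripted_llm_steps:  # stand-in for repeated LLM calls
        if step.startswith("Final Answer:"):
            return step
        # parse "Action: tool[input]" out of free text -- the fragile part
        name, _, arg = step.partition("[")
        observation = TOOLS[name.removeprefix("Action: ")](arg.rstrip("]"))
        transcript.append(f"{step}\nObservation: {observation}")
    return "stopped: step limit reached"

def function_calling_agent(scripted_calls, question):
    """Function Calling: the model returns structured tool calls via the
    provider's native API, so no free-text parsing is needed and the
    loop ends as soon as the model stops requesting tools."""
    result = None
    for name, args in scripted_calls:  # stand-in for repeated LLM calls
        result = TOOLS[name](args)
    return f"final answer using tool result: {result}"

print(react_agent(["Action: draw[a dog]", "Final Answer: here is your dog"],
                  "draw a dog"))
print(function_calling_agent([("draw", "a dog")], "draw a dog"))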

To continue talking to Dosu, mention @dosu.

czslxk commented 5 months ago

But I cannot change the mode on the web page.

dosubot[bot] commented 5 months ago

To change the "Agent Mode" from "ReAct" to "Function Calling" in the web interface, you need to toggle the isFunctionCall property. This can be done by updating the tempPayload state within the AgentSetting component.

Here is the relevant part of the code where the agent mode is displayed:

<ItemPanel
  className='mb-4'
  icon={
    <CuteRobote className='w-4 h-4 text-indigo-600' />
  }
  name={t('appDebug.agent.agentMode')}
  description={t('appDebug.agent.agentModeDes')}
>
  <div className='leading-[18px] text-[13px] font-medium text-gray-900'>
    {isFunctionCall ? t('appDebug.agent.agentModeType.functionCall') : t('appDebug.agent.agentModeType.ReACT')}
  </div>
</ItemPanel>

To change the mode, you would need to update the isFunctionCall state. This can be done by adding a toggle mechanism, such as a button or a switch, that updates the tempPayload state accordingly.

Here is an example of how you might add a button to toggle the agent mode:

<ItemPanel
  className='mb-4'
  icon={
    <CuteRobote className='w-4 h-4 text-indigo-600' />
  }
  name={t('appDebug.agent.agentMode')}
  description={t('appDebug.agent.agentModeDes')}
>
  <div className='leading-[18px] text-[13px] font-medium text-gray-900'>
    {isFunctionCall ? t('appDebug.agent.agentModeType.functionCall') : t('appDebug.agent.agentModeType.ReACT')}
  </div>
  <Button
    onClick={() => setTempPayload({ ...tempPayload, isFunctionCall: !isFunctionCall })}
  >
    {isFunctionCall ? t('Switch to ReAct') : t('Switch to Function Calling')}
  </Button>
</ItemPanel>

This button will toggle the isFunctionCall state between true and false, effectively changing the agent mode.

The descriptions for the agent modes are as follows:

  • ReAct: "ReAct"
  • Function Calling: "Function Calling"

These descriptions can be found in the web/i18n/zh-Hans/app-debug.ts file under the agentModeType object [1][2][3][4].

To continue talking to Dosu, mention @dosu.

czslxk commented 5 months ago

Why does llama3 default to ReAct instead of Function Calling in the Agent?

dosubot[bot] commented 5 months ago

Llama3 defaults to "ReAct" mode instead of "Function Calling" in the Agent settings because the isFunctionCall property is set to false by default. This is indicated by the following line in the code:

<div className='leading-[18px] text-[13px] font-medium text-gray-900'>{isFunctionCall ? t('appDebug.agent.agentModeType.functionCall') : t('appDebug.agent.agentModeType.ReACT')}</div>

To change the default mode for llama3 to "Function Calling", you need to set the isFunctionCall property to true when initializing the AgentSetting component. Here is an example:

<AgentSetting
  isChatModel={true} // or false, depending on your use case
  payload={yourPayload}
  isFunctionCall={true} // Set this to true to default to "Function Calling"
  onCancel={yourOnCancelFunction}
  onSave={yourOnSaveFunction}
/>

By setting isFunctionCall to true, the default mode will be "Function Calling" [1][2][3].

To continue talking to Dosu, mention @dosu.

crazywoola commented 5 months ago

Why does llama3 default to ReAct instead of Function Calling in the Agent?

It could be changed, but we might need the community's help to do it, because we are at full capacity right now.

marco-qu commented 2 months ago

Which Docker image includes this file, and how can I find and modify it?

marco-qu commented 2 months ago

If I use docker run to start Dify, which image should I modify, and where can I find this file?