langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Using Vision model "minicpm-v:8b-2.6-q4_K_M" with ollama, chat message raised error message #8777

Open HorseLuke opened 2 hours ago

HorseLuke commented 2 hours ago

Self Checks

Dify version

0.8.3

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

  1. Deploy ollama 0.3.12
  2. Pull Vision model "minicpm-v:8b-2.6-q4_K_M"
    ollama run minicpm-v:8b-2.6-q4_K_M
  3. In Dify 0.8.3, add a new provider and register this vision model using the Ollama definition.
  4. Create a blank app using this vision model.
  5. On the "Studio" -> "Orchestrate" page, set Vision -> Resolution to Low.
  6. Upload an image, then chat in "Debug and Preview".

✔️ Expected Behavior

The chat output streams normally from start to finish.

❌ Actual Behavior

The chat output is normal at first, but then it stops and the output text is cleared, and an error is raised at the top right of the page, saying:

Error: Input [TextPromptMessageContent(type=<PromptMessageContentType.TEXT: 'text'>, data='图片表述什么内容?'), ImagePromptMessageContent(type=<PromptMessageContentType.IMAGE: 'image'>, data='data:image/jpeg;base64,/9j/4AAQSkZJRgA...[ignore]...', detail=<DETAIL.LOW: 'low'>)] is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.

Attached are the docker logs captured after setting DEBUG=true:

dify-ollama-error-minicpm-v.zip
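For triage context, my read (unconfirmed) is that this error wording looks like a generic input-type check on a text-only code path, i.e. the mixed text/image content list is reaching code that only accepts plain strings or token lists. A minimal sketch of that kind of check, with hypothetical names (not Dify code):

```python
# Purely illustrative: the kind of input-type validation that produces
# the reported error message. Hypothetical names, not actual Dify code.
def _is_valid_text_input(t):
    if isinstance(t, str):
        return True
    if isinstance(t, (list, tuple)):
        # a list/tuple must be all strings or all integers (token ids)
        return all(isinstance(x, str) for x in t) or all(isinstance(x, int) for x in t)
    return False

def count_tokens(t):
    if not _is_valid_text_input(t):
        raise ValueError(
            f"Input {t!r} is not valid. Should be a string, "
            "a list/tuple of strings or a list/tuple of integers."
        )
    return len(t.split()) if isinstance(t, str) else len(t)
```

Passing a list of message-content objects (rather than a flattened string) to a check like this would raise exactly the error shown above.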

dosubot[bot] commented 2 hours ago

Hey @HorseLuke! 👋 I'm here to help you with bug fixes, answer your questions, and guide you to become a contributor. Let's squash those bugs together!

I found two similar discussions that might be relevant to your issue:

  1. Adding a local ollama model error which was last updated on 2024-09-12.
  2. Adding a local ollama model error which was last updated on 2024-09-12.

To resolve the error "Input [TextPromptMessageContent(type=<PromptMessageContentType.TEXT: 'text'>, data='图片表述什么内容?'), ImagePromptMessageContent(type=<PromptMessageContentType.IMAGE: 'image'>, data='data:image/jpeg;base64,/9j/4AAQSkZJRgA...[ignore]...', detail=<DETAIL.LOW: 'low'>)] is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers" when using the Vision model "minicpm-v:8b-2.6-q4_K_M" with ollama in dify 0.8.3, you need to ensure that the prompt_messages are correctly formatted.

The _generate method in the OllamaLargeLanguageModel class expects the prompt_messages to be converted into a dictionary format that the API can understand. Specifically, the content of UserPromptMessage should be a string or a list/tuple of strings or integers.

Here is the relevant part of the code that handles this conversion:

# Excerpted from Dify's Ollama model runtime; entity-class imports omitted.
import re
from typing import cast

def _convert_prompt_message_to_dict(self, message: PromptMessage) -> dict:
    """
    Convert PromptMessage to dict for Ollama API
    """
    if isinstance(message, UserPromptMessage):
        message = cast(UserPromptMessage, message)
        if isinstance(message.content, str):
            message_dict = {"role": "user", "content": message.content}
        else:
            text = ""
            images = []
            for message_content in message.content:
                if message_content.type == PromptMessageContentType.TEXT:
                    message_content = cast(TextPromptMessageContent, message_content)
                    text = message_content.data
                elif message_content.type == PromptMessageContentType.IMAGE:
                    message_content = cast(ImagePromptMessageContent, message_content)
                    image_data = re.sub(r"^data:image\/[a-zA-Z]+;base64,", "", message_content.data)
                    images.append(image_data)
            message_dict = {"role": "user", "content": text, "images": images}
    elif isinstance(message, AssistantPromptMessage):
        message = cast(AssistantPromptMessage, message)
        message_dict = {"role": "assistant", "content": message.content}
    elif isinstance(message, SystemPromptMessage):
        message = cast(SystemPromptMessage, message)
        message_dict = {"role": "system", "content": message.content}
    else:
        raise ValueError(f"Got unknown type {message}")

    return message_dict

Ensure that your prompt_messages are structured correctly before invoking the model. For example:

prompt_messages = [
    UserPromptMessage(content=[
        TextPromptMessageContent(type=PromptMessageContentType.TEXT, data='图片表述什么内容?'),
        ImagePromptMessageContent(type=PromptMessageContentType.IMAGE, data='data:image/jpeg;base64,/9j/4AAQSkZJRgA...')
    ])
]

This will be converted to:

{
    "role": "user",
    "content": "图片表述什么内容?",
    "images": ["base64_encoded_image_data"]
}

Make sure the content field is a string or a list/tuple of strings or integers, and that the images field contains the base64-encoded image data [1][2][3].
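If you are building prompt_messages yourself, a defensive flattening step like the following (a sketch with hypothetical names, not part of Dify) keeps the raw content list from leaking downstream in the wrong shape:

```python
import re

def flatten_user_content(content):
    """Flatten mixed text/image content into the (text, images) shape
    the Ollama chat API expects. Hypothetical helper, not Dify code."""
    if isinstance(content, str):
        return content, []
    text_parts, images = [], []
    for part in content:
        kind = getattr(part, "type", None)
        # `type` may be an enum (with .value) or a plain string
        if getattr(kind, "value", kind) == "image":
            # strip the data-URI prefix, keeping only the base64 payload
            images.append(re.sub(r"^data:image\/[a-zA-Z]+;base64,", "", part.data))
        else:
            text_parts.append(part.data)
    return " ".join(text_parts), images
```

The returned text string goes into "content" and the list of base64 payloads into "images", matching the dict shown above.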

To continue talking to Dosu, mention @dosu.

HorseLuke commented 2 hours ago

@dosu Based on your context, I cannot find anything on the "Orchestrate" page that changes the content field.

However, if I add a sentence in the "Instructions" textarea, the error disappears. The "Instructions" textarea is described with this text:

Prompts guide AI responses with instructions and constraints. Insert variables like {{input}}. This prompt won't be visible to users.

How is the "Instructions" textarea used in the content field? Is this a bug that occurs when the "Instructions" textarea is empty?
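To illustrate my guess (purely hypothetical Python, not Dify's actual logic): if flattening to a plain string only happens on the branch that prepends a non-empty instruction, an empty instruction would pass the raw content list through unchanged:

```python
# Hypothetical illustration of the suspected failure mode -- NOT Dify code.
def build_prompt(instructions, user_parts):
    if instructions:
        # non-empty instruction: everything is joined into one string
        return instructions + "\n" + " ".join(user_parts)
    # empty instruction: content passed through unchanged (still a list),
    # which would later trip a string-only type check downstream
    return user_parts
```

That would explain why typing any sentence into "Instructions" makes the error disappear.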

dosubot[bot] commented 2 hours ago

The "Instructions" textarea in the content field is used to input and display the opening statement. It is not a bug when the "Instructions" textarea is empty. The code handles both cases where the textarea has content and where it is empty, providing appropriate placeholders and UI elements for each scenario.

Here is the relevant code snippet that demonstrates this:

const value = openingStatement?.opening_statement || '';
const hasValue = !!(value || '').trim();
...
<div className='text-gray-700 text-sm'>
  {(hasValue || (!hasValue && isFocus)) ? (
    <>
      {isFocus
        ? (
          <div>
            <textarea
              ref={inputRef}
              value={tempValue}
              rows={3}
              onChange={e => setTempValue(e.target.value)}
              className="w-full px-0 text-sm  border-0 bg-transparent focus:outline-none "
              placeholder={t('appDebug.openingStatement.placeholder') as string}
            >
            </textarea>
          </div>
        )
        : (
          <div dangerouslySetInnerHTML={{
            __html: coloredContent,
          }}></div>
        )}
      {renderQuestions()}
    </>) : (
    <div className='pt-2 pb-1 text-xs text-gray-500'>{t('appDebug.openingStatement.noDataPlaceHolder')}</div>
  )}
</div>

The code checks if there is a value in the opening_statement and handles the UI accordingly, showing a placeholder when the textarea is empty [1].

Additionally, in the SegmentDetailComponent from the Dify platform, the AutoHeightTextarea component is used to handle text input fields, including scenarios where the content might be empty. This further confirms that the application is designed to handle empty textareas gracefully [2].

To continue talking to Dosu, mention @dosu.