@KwanFaiLau Hello there! I'm here to help you with bugs, questions, and becoming a contributor. Let me know how I can assist you while you wait for a human maintainer.
To enable the LLM to recognize images embedded in PDF documents using the vision model in Dify version 0.6.12fix1 (self-hosted with Docker), you need to ensure that the `vision` property within the `LLMNodeData` class has `enabled: true`. Here is the relevant code snippet from the `api/core/workflow/nodes/llm/entities.py` file:
```python
from typing import Any, Literal, Optional, Union

from pydantic import BaseModel

from core.prompt.entities.advanced_prompt_entities import ChatModelMessage, CompletionModelPromptTemplate, MemoryConfig
from core.workflow.entities.base_node_data_entities import BaseNodeData
from core.workflow.entities.variable_entities import VariableSelector


class ModelConfig(BaseModel):
    """
    Model Config.
    """
    provider: str
    name: str
    mode: str
    completion_params: dict[str, Any] = {}


class ContextConfig(BaseModel):
    """
    Context Config.
    """
    enabled: bool
    variable_selector: Optional[list[str]] = None


class VisionConfig(BaseModel):
    """
    Vision Config.
    """
    class Configs(BaseModel):
        """
        Configs.
        """
        detail: Literal['low', 'high']

    enabled: bool
    configs: Optional[Configs] = None


class PromptConfig(BaseModel):
    """
    Prompt Config.
    """
    jinja2_variables: Optional[list[VariableSelector]] = None


class LLMNodeChatModelMessage(ChatModelMessage):
    """
    LLM Node Chat Model Message.
    """
    jinja2_text: Optional[str] = None


class LLMNodeCompletionModelPromptTemplate(CompletionModelPromptTemplate):
    """
    LLM Node Chat Model Prompt Template.
    """
    jinja2_text: Optional[str] = None


class LLMNodeData(BaseNodeData):
    """
    LLM Node Data.
    """
    model: ModelConfig
    prompt_template: Union[list[LLMNodeChatModelMessage], LLMNodeCompletionModelPromptTemplate]
    prompt_config: Optional[PromptConfig] = None
    memory: Optional[MemoryConfig] = None
    context: ContextConfig
    vision: VisionConfig
```
Make sure that the `vision` object has `enabled` set to `true`:
```
vision: {
  enabled: true,
  configs: {
    detail: 'high' # or 'low' based on your requirement
  }
}
```
This configuration will enable the LLM to recognize images generated within the workflow [1].
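As a quick sanity check, here is a minimal sketch of how that dict maps onto the `VisionConfig` model above. The module path is an assumption based on the file's own imports (i.e., that you run it from the `api/` directory):

```python
# Sketch: validate a vision config dict against the pydantic model shown above.
# Assumes the working directory is api/, so the module path mirrors the file path.
from core.workflow.nodes.llm.entities import VisionConfig

vision = VisionConfig(enabled=True, configs={'detail': 'high'})
print(vision.enabled)         # True
print(vision.configs.detail)  # 'high'
```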
To continue talking to Dosu, mention @dosu.
Where is the `vision` object? Can you tell me the file path?
The `vision` object is part of the `LLMNodeData` class, defined in this file:

`api/core/workflow/nodes/llm/entities.py`

Look for the `LLMNodeData` class in that file, and you will see the `vision` property declared there.
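If you want to confirm the location programmatically, this small sketch (again assuming it is run from the `api/` directory) prints the source file where `VisionConfig` is defined on your installation:

```python
# Sketch: print where VisionConfig is actually defined.
import inspect

from core.workflow.nodes.llm.entities import VisionConfig

print(inspect.getsourcefile(VisionConfig))
# expected: .../api/core/workflow/nodes/llm/entities.py
```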
To continue talking to Dosu, mention @dosu.
When I modify
```python
class VisionConfig(BaseModel):
    """
    Vision Config.
    """
    class Configs(BaseModel):
        """
        Configs.
        """
        detail: Literal['low', 'high']

    enabled: bool
    configs: Optional[Configs] = None
```
to
```python
class VisionConfig(BaseModel):
    """
    Vision Config.
    """
    class Configs(BaseModel):
        """
        Configs.
        """
        detail: Literal['low', 'high']

    enabled: true
    configs: Optional[Configs] = None
```
the Docker container `docker-api-1` cannot start.
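For context, this failure is standard Python behavior rather than anything Dify-specific: `true` (lowercase) is not a Python literal, so the annotation raises a `NameError` when the module is imported, and the API container exits on startup. If the intent is to default `enabled` to `True`, a minimal sketch of the change is to keep the `bool` annotation and assign a default value:

```python
class VisionConfig(BaseModel):
    """
    Vision Config.
    """
    class Configs(BaseModel):
        """
        Configs.
        """
        detail: Literal['low', 'high']

    # Keep the type annotation and set a default value, instead of
    # replacing the annotation with the undefined name `true`.
    enabled: bool = True
    configs: Optional[Configs] = None
```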
### Dify version

0.6.12fix1

### Cloud or Self Hosted

Self Hosted (Docker)
### Steps to reproduce

PDF documents may have images embedded inside. Dify cannot parse those images via the vision model to extract them to text format and index them into the vector database.
### ✔️ Expected Behavior

No response
### ❌ Actual Behavior

No response