langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Output files from iteration node give TypeError. #4873

Closed: AndyMik90 closed this issue 4 months ago

AndyMik90 commented 5 months ago

Self Checks

Dify version

0.6.9

Cloud or Self Hosted

Cloud, Self Hosted (Docker), Self Hosted (Source)

Steps to reproduce

  1. When a tool that outputs files is used inside an iteration node, the run fails with TypeError: Object of type FileVar is not JSON serializable.
  2. The output of the iteration node is also not recognized as an array of files. (Screenshots: CleanShot 2024-06-02 at 21 30 53, CleanShot 2024-06-02 at 21 36 52)

DSL to reproduce:

app:
  description: ''
  icon: "\U0001F916"
  icon_background: '#FFEAD5'
  mode: workflow
  name: Iteration File test
workflow:
  features:
    file_upload:
      image:
        enabled: false
        number_limits: 3
        transfer_methods:
        - local_file
        - remote_url
    opening_statement: ''
    retriever_resource:
      enabled: false
    sensitive_word_avoidance:
      enabled: false
    speech_to_text:
      enabled: false
    suggested_questions: []
    suggested_questions_after_answer:
      enabled: false
    text_to_speech:
      enabled: false
      language: ''
      voice: ''
  graph:
    edges:
    - data:
        isInIteration: false
        sourceType: start
        targetType: code
      id: 1717356703710-source-1717356707855-target
      source: '1717356703710'
      sourceHandle: source
      target: '1717356707855'
      targetHandle: target
      type: custom
      zIndex: 0
    - data:
        isInIteration: false
        sourceType: code
        targetType: iteration
      id: 1717356707855-source-1717356937099-target
      source: '1717356707855'
      sourceHandle: source
      target: '1717356937099'
      targetHandle: target
      type: custom
      zIndex: 0
    - data:
        isInIteration: false
        sourceType: iteration
        targetType: end
      id: 1717356937099-source-1717356952635-target
      source: '1717356937099'
      sourceHandle: source
      target: '1717356952635'
      targetHandle: target
      type: custom
      zIndex: 0
    nodes:
    - data:
        desc: ''
        selected: false
        title: Start
        type: start
        variables:
        - label: image1
          max_length: 48
          options: []
          required: true
          type: text-input
          variable: image1
        - label: image2
          max_length: 48
          options: []
          required: true
          type: text-input
          variable: image2
      height: 116
      id: '1717356703710'
      position:
        x: 80
        y: 282
      positionAbsolute:
        x: 80
        y: 282
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        code: "def main(arg1: str, arg2: str) -> dict:\n    return {\n        \"result\"\
          : [arg1, arg2],\n    }"
        code_language: python3
        desc: ''
        outputs:
          result:
            children: null
            type: array[string]
        selected: false
        title: Code
        type: code
        variables:
        - value_selector:
          - '1717356703710'
          - image1
          variable: arg1
        - value_selector:
          - '1717356703710'
          - image2
          variable: arg2
      height: 54
      id: '1717356707855'
      position:
        x: 384
        y: 282
      positionAbsolute:
        x: 384
        y: 282
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        desc: ''
        height: 324
        iterator_selector:
        - '1717356707855'
        - result
        output_selector:
        - '1717356940480'
        - files
        output_type: array[string]
        selected: false
        startNodeType: tool
        start_node_id: '1717356940480'
        title: Iteration
        type: iteration
        width: 485
      height: 324
      id: '1717356937099'
      position:
        x: 688
        y: 282
      positionAbsolute:
        x: 688
        y: 282
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 485
      zIndex: 1
    - data:
        desc: ''
        isInIteration: true
        isIterationStart: true
        iteration_id: '1717356937099'
        provider_id: dalle
        provider_name: dalle
        provider_type: builtin
        selected: false
        title: DALL-E 3
        tool_configurations:
          n: 1
          quality: standard
          size: square
          style: vivid
        tool_label: DALL-E 3
        tool_name: dalle3
        tool_parameters:
          prompt:
            type: mixed
            value: '{{#1717356937099.item#}}'
        type: tool
      extent: parent
      height: 168
      id: '1717356940480'
      parentId: '1717356937099'
      position:
        x: 117
        y: 85
      positionAbsolute:
        x: 805
        y: 367
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
      zIndex: 1001
    - data:
        desc: ''
        outputs:
        - value_selector:
          - '1717356937099'
          - output
          variable: output
        selected: false
        title: End
        type: end
      height: 90
      id: '1717356952635'
      position:
        x: 1233
        y: 282
      positionAbsolute:
        x: 1233
        y: 282
      selected: true
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    viewport:
      x: 2.365291011755289
      y: 200.59946916062177
      zoom: 0.9125654885434517

✔️ Expected Behavior

The iteration node should output the files as an array, so that multiple files can be returned from the End node.
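In the DSL above, the iteration's output_selector points at the DALL-E 3 tool's `files` variable, so each pass contributes its own list of files. Assuming the iteration collects those lists as-is, the End node would receive a list of lists rather than one flat array of files. A minimal sketch of the flattening the expected behavior describes, with plain strings standing in for file objects (`flatten_iteration_files` is a hypothetical helper, not a Dify API):

```python
from itertools import chain
from typing import Any


def flatten_iteration_files(per_item_files: list[list[Any]]) -> list[Any]:
    """Join the file lists produced by each iteration pass into one flat array.

    Hypothetical helper: assumes every pass yields a list (possibly empty).
    """
    return list(chain.from_iterable(per_item_files))


# Two prompts, the second producing two images.
print(flatten_iteration_files([["a.png"], ["b.png", "c.png"]]))
# ['a.png', 'b.png', 'c.png']
```

Order is preserved, so the i-th group of files in the result still corresponds to the i-th item of the iterator.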

❌ Actual Behavior

The run fails with TypeError: Object of type FileVar is not JSON serializable, and the iteration output is not recognized as an array of files.
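The TypeError is the standard library's `json` encoder refusing an object it has no rule for. A minimal reproduction, assuming only that the iteration output contains FileVar instances; the `FileVar` dataclass here is a hypothetical stand-in with illustrative fields, not Dify's class:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FileVar:
    # Hypothetical stand-in for Dify's FileVar; the fields are illustrative only.
    mime_type: str
    url: str


def dump_or_error(obj: object) -> str:
    """Serialize with the stdlib encoder, returning the TypeError text on failure."""
    try:
        return json.dumps(obj)
    except TypeError as exc:
        return f"TypeError: {exc}"


outputs = {"output": [FileVar("image/png", "https://example.com/a.png")]}

print(dump_or_error(outputs))
# TypeError: Object of type FileVar is not JSON serializable

# A default hook tells json.dumps how to convert objects it does not know.
print(json.dumps(outputs, default=asdict))
# {"output": [{"mime_type": "image/png", "url": "https://example.com/a.png"}]}
```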

dosubot[bot] commented 5 months ago

To resolve the TypeError: Object of type FileVar is not JSON serializable and ensure the iteration node correctly recognizes an array of files in your workflow, you need to use the jsonable_encoder function to encode the outputs before serializing them to JSON. This function is already imported and used in the _get_next_iteration method.

Here's how you can ensure the iteration node correctly recognizes an array of files and handles the serialization:

  1. Ensure that the iterator is a list of files.
  2. Convert special values such as FileVar to a serializable format with the handle_special_values method.
  3. Encode the converted outputs with jsonable_encoder before returning them.

The relevant part of the code that handles the serialization is in the _get_next_iteration method:

def _get_next_iteration(self, variable_pool: VariablePool, state: IterationState) -> NodeRunResult | str:
    """
    Get the start node id of the next iteration, or the final result once all items are processed.
    :param variable_pool: variable pool
    :param state: iteration state
    :return: NodeRunResult when the iteration limit is reached, otherwise the next start node id
    """
    # resolve current output
    self._resolve_current_output(variable_pool, state)
    # move to next iteration
    self._next_iteration(variable_pool, state)

    node_data = cast(IterationNodeData, self.node_data)
    if self._reached_iteration_limit(variable_pool, state):
        return NodeRunResult(
            status=WorkflowNodeExecutionStatus.SUCCEEDED,
            outputs={
                # convert FileVar objects to dicts first, then encode, so the outputs are JSON serializable
                'output': jsonable_encoder(self.handle_special_values(state.outputs))
            }
        )

    return node_data.start_node_id

By using jsonable_encoder and handle_special_values, you ensure that the state.outputs are properly encoded and can be serialized to JSON, thus avoiding the TypeError. The handle_special_values method converts FileVar objects to a dictionary format that is JSON serializable.
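The conversion step described above amounts to walking the outputs recursively and replacing every file object with a plain dict before encoding. A sketch of that idea, assuming FileVar can produce a dict of its fields; both the `FileVar` stand-in (with its `to_dict()` method) and `to_serializable` are hypothetical and only approximate what the reply says handle_special_values does:

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class FileVar:
    # Hypothetical stand-in for Dify's FileVar; to_dict() here is an assumption.
    mime_type: str
    url: str

    def to_dict(self) -> dict[str, Any]:
        return {"mime_type": self.mime_type, "url": self.url}


def to_serializable(value: Any) -> Any:
    """Recursively replace FileVar objects with plain dicts (hypothetical helper)."""
    if isinstance(value, FileVar):
        return value.to_dict()
    if isinstance(value, dict):
        return {key: to_serializable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_serializable(item) for item in value]
    return value  # str, int, float, bool and None are already JSON-friendly


# One inner list per iteration pass, e.g. collected from a tool's `files` output.
state_outputs = [
    [FileVar("image/png", "https://example.com/1.png")],
    [FileVar("image/png", "https://example.com/2.png")],
]
payload = json.dumps({"output": to_serializable(state_outputs)})
```

Converting before encoding keeps the serializer generic: the encoder only ever sees dicts, lists and scalars, so nested file lists at any depth are handled the same way.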

Additionally, ensure that the IterationNodeType correctly handles the iterator_selector and output variables. The getNodeOutputVars function should include the output variable for the Iteration block:

case BlockEnum.Iteration: {
  res.push([id, 'output'])
  break
}

This makes the iteration's output variable available to downstream nodes such as the End node.

In the useConfig hook, ensure that the iterator and output variables are correctly set and handled:

const handleRun = useCallback((data: Record<string, any>) => {
  const formattedData: Record<string, any> = {}
  Object.keys(allVarObject).forEach((key) => {
    const [varSectorStr, nodeId] = key.split(DELIMITER)
    formattedData[`${nodeId}.${allVarObject[key].inSingleRunPassedKey}`] = data[varSectorStr]
  })
  formattedData[iteratorInputKey] = data[iteratorInputKey]
  doHandleRun(formattedData)
}, [allVarObject, doHandleRun, iteratorInputKey])

const iterator = runInputData[iteratorInputKey]
const setIterator = useCallback((newIterator: string[]) => {
  setRunInputData({
    ...runInputData,
    [iteratorInputKey]: newIterator,
  })
}, [iteratorInputKey, runInputData, setRunInputData])

This ensures that the iteration node correctly recognizes and processes the array of files [1][2][3][4].
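The handleRun callback in the hook above reshapes the single-run form data: each key of allVarObject packs a variable selector and a node id joined by DELIMITER, and the value is re-keyed as `nodeId.inSingleRunPassedKey`, with the iterator passed through unchanged. A Python sketch of the same reshaping, assuming a placeholder value for DELIMITER (the real constant lives in Dify's web code) and a hypothetical function name:

```python
from typing import Any

DELIMITER = "@@@"  # placeholder; the actual DELIMITER value in Dify's frontend may differ


def format_run_inputs(
    all_var_object: dict[str, dict[str, Any]],
    data: dict[str, Any],
    iterator_input_key: str,
) -> dict[str, Any]:
    """Re-key single-run inputs the way handleRun does (hypothetical port)."""
    formatted: dict[str, Any] = {}
    for key, var_info in all_var_object.items():
        parts = key.split(DELIMITER)
        var_selector_str, node_id = parts[0], parts[1]
        # Missing form values become None, mirroring `undefined` in the TS version.
        formatted[f"{node_id}.{var_info['inSingleRunPassedKey']}"] = data.get(var_selector_str)
    formatted[iterator_input_key] = data.get(iterator_input_key)
    return formatted


inputs = format_run_inputs(
    {"#1717356703710.image1#@@@1717356707855": {"inSingleRunPassedKey": "arg1"}},
    {"#1717356703710.image1#": "a cat", "iterator": ["a cat", "a dog"]},
    "iterator",
)
print(inputs)
# {'1717356707855.arg1': 'a cat', 'iterator': ['a cat', 'a dog']}
```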

To continue talking to Dosu, mention @dosu.

AndyMik90 commented 5 months ago

Update: I also found an error when using the workflow as a tool in an agent; the Array[File] output produces this error:

ERROR:root:Instance <MessageFile at 0x7ff53fcf11b0> is not bound to a Session; attribute refresh operation cannot proceed (Background on this error at: https://sqlalche.me/e/20/bhk3)

dosubot[bot] commented 5 months ago

The error Instance <MessageFile at 0x7ff53fcf11b0> is not bound to a Session typically occurs when an instance of a SQLAlchemy model is accessed outside of an active session. Here are steps to resolve this issue:

  1. Ensure Session Binding: Make sure that the MessageFile instance is bound to an active session when performing operations. You can reattach the instance to a session if needed.

    session.add(message_file_instance)
    session.commit()
  2. Use Scoped Sessions: Use SQLAlchemy's scoped_session to ensure that the session is properly managed and instances remain bound to the session.

    from sqlalchemy.orm import scoped_session, sessionmaker
    
    session_factory = sessionmaker(bind=engine)
    Session = scoped_session(session_factory)
    session = Session()
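  3. Eager Loading: Use eager loading to avoid lazy-loading issues (some_relationship is a placeholder for a real relationship attribute on the model).

    from sqlalchemy.orm import joinedload

    message_files = session.query(MessageFile).options(joinedload(MessageFile.some_relationship)).all()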
  4. Check Session Lifecycle: Ensure that the session is active and not closed when performing operations on the instance.

    if not session.is_active:
        session = Session()
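The message in the log is the text of SQLAlchemy's DetachedInstanceError, raised when code touches an expired attribute of an instance whose session has already closed; this commonly follows a commit, because expire_on_commit defaults to True. A self-contained sketch that reproduces the message and shows two common fixes: reading the needed fields while the session is open, or opting out of expire-on-commit. It assumes SQLAlchemy 1.4 or later and an in-memory SQLite database; the MessageFile model is a hypothetical minimal stand-in, not Dify's real model, and this thread does not confirm that Dify's agent path fails for exactly this reason.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.orm.exc import DetachedInstanceError

Base = declarative_base()


class MessageFile(Base):
    # Hypothetical minimal model; Dify's real MessageFile has many more columns.
    __tablename__ = "message_files"
    id = Column(Integer, primary_key=True)
    url = Column(String(255), nullable=False)


engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)


def make_detached(url: str) -> MessageFile:
    Session = sessionmaker(bind=engine)  # expire_on_commit=True by default
    with Session() as session:
        mf = MessageFile(url=url)
        session.add(mf)
        session.commit()  # commit expires every loaded attribute on mf
    return mf  # the with-block closed the session, so mf is now detached


def make_snapshot(url: str) -> dict:
    Session = sessionmaker(bind=engine)
    with Session() as session:
        mf = MessageFile(url=url)
        session.add(mf)
        session.commit()
        # Fix 1: read what you need while the session is still open.
        return {"id": mf.id, "url": mf.url}


def make_unexpired(url: str) -> MessageFile:
    # Fix 2: keep loaded attributes after commit so they survive the session.
    Session = sessionmaker(bind=engine, expire_on_commit=False)
    with Session() as session:
        mf = MessageFile(url=url)
        session.add(mf)
        session.commit()
    return mf


try:
    make_detached("a.png").url  # triggers a refresh on a detached instance
except DetachedInstanceError as exc:
    print(exc)  # "... is not bound to a Session; attribute refresh operation cannot proceed ..."
```

Passing plain dicts or ids across the session boundary is usually the safer design, since it does not depend on any session still being alive when the agent later reads the file.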

In the context of the MessageFileParser class, ensure that the session is properly managed when querying or manipulating MessageFile objects. For example, when validating and transforming files, the session should be active and properly scoped to avoid detached instances:

upload_file = (db.session.query(UploadFile)
               .filter(
                   UploadFile.id == file_obj.related_id,
                   UploadFile.tenant_id == self.tenant_id,
                   UploadFile.created_by == user.id,
                   UploadFile.created_by_role == ('account' if isinstance(user, Account) else 'end_user'),
                   UploadFile.extension.in_(IMAGE_EXTENSIONS)
               ).first())

if not upload_file:
    raise ValueError('Invalid upload file')

By following these practices, you can avoid the "Instance is not bound to a Session" error in the context of the MessageFile class [1][2].


AndyMik90 commented 5 months ago

@takatost, if you need any additional information, just reach out to me. This is fairly urgent for us, as we have a new agent that needs to produce images through a workflow tool.