Closed: zhengqianhe0 closed this issue 2 months ago
Yes, the current version 1.0.16 of Langflow supports uploading and parsing images for multimodal capabilities. Additionally, there are plans to add more types of Input and Output like Image and Audio, and to integrate more Components to help build more complex projects.
Supporting evidence:
For handling image uploads and parsing, the specific components or modules in Langflow are found in the src/backend/base/langflow/schema/image.py file. These include:
Functions:
is_image_file(file_path): Verifies if a given file is an image.
get_file_paths(files: list[str]): Retrieves the full paths of the files from the storage service.
get_files(file_paths: list[str], convert_to_base64: bool = False): Fetches the files from the storage service and optionally converts them to base64.
Class:
Image: A Pydantic model that includes methods for converting an image to base64 (to_base64), creating a content dictionary (to_content_dict), and generating a URL for the image (get_url). [2]
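For anyone who wants to see roughly what helpers like these do, here is a minimal standard-library sketch. It is not Langflow's code: the names looks_like_image and ImageSketch are hypothetical, the image check is a simple extension-based guess, and the shape of the content dictionary (an "image_url" entry carrying a base64 data URL) is an assumption modeled on the format many multimodal chat APIs accept.

```python
import base64
import mimetypes
from dataclasses import dataclass
from pathlib import Path


def looks_like_image(file_path: str) -> bool:
    """Hypothetical stand-in for an image check: guesses from the file extension only."""
    mime, _ = mimetypes.guess_type(file_path)
    return mime is not None and mime.startswith("image/")


@dataclass
class ImageSketch:
    """Hypothetical, simplified analogue of an image model (not Langflow's Image class)."""

    path: str

    def to_base64(self) -> str:
        # Read the raw bytes from disk and encode them as an ASCII base64 string.
        return base64.b64encode(Path(self.path).read_bytes()).decode("utf-8")

    def to_data_url(self) -> str:
        # Build a data URL so the image can be embedded directly in a request body.
        mime, _ = mimetypes.guess_type(self.path)
        return f"data:{mime or 'application/octet-stream'};base64,{self.to_base64()}"

    def to_content_dict(self) -> dict:
        # Assumed message-part shape; the real structure depends on the model provider.
        return {"type": "image_url", "image_url": {"url": self.to_data_url()}}
```

A call such as ImageSketch("photo.jpg").to_content_dict() would then produce one message part that could sit alongside a text part in a multimodal prompt, assuming the target model accepts that format.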
You can use multimodality with the Chat Input component, which lets you upload images and files in general.
Bug Description
I found that, so far, all inputs and outputs in Langflow are text. Is there currently a function to upload and parse images? I mean multimodality: directly understanding the content of images, audio, or video, rather than extracting text from images and processing it with a large model. Does the current version, 1.0.16, have this feature? Do you have any plans to integrate multimodal large models?
Reproduction
I can't find how to use components with multimodal capabilities. How can I use this?
Expected behavior
Upload a JPG image and parse it.
Who can help?
No response
Operating System
Windows 10
Langflow Version
1.0.16
Python Version
None
Screenshot
No response
Flow File
No response