Closed — mw19930312 closed this issue 8 months ago
🤖
Hey there @mw19930312! 🦙 Long time no chat. Hope you're doing fantastic and still exploring the endless possibilities with images. What's cooking in the tech kitchen today?
Based on the context provided, you can indeed create an `ImageDocument` directly from a base64-encoded string in LlamaIndex. This works through the `ImageNode` class, whose `image` attribute accepts a base64-encoded image string. The `resolve_image` method of `ImageNode` then decodes that base64 string into bytes that PIL (the Python Imaging Library) can read.
Here is the relevant code snippet from the LlamaIndex repository:
```python
from io import BytesIO
from typing import Optional


class ImageNode(TextNode):
    """Node with image."""

    # base64 encoded image str
    image: Optional[str] = None

    def resolve_image(self) -> ImageType:
        """Resolve an image such that PIL can read it."""
        if self.image is not None:
            import base64

            return BytesIO(base64.b64decode(self.image))
```
You can find this in the llama_index/schema.py file.
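To illustrate what `resolve_image` does under the hood, here is a minimal, stdlib-only sketch of the base64-to-bytes round trip. No llama_index import is required; the payload bytes are a placeholder standing in for real PNG/JPEG data:

```python
import base64
from io import BytesIO

# Placeholder bytes; a real case would hold actual image data.
raw = b"fake image payload"

# What ImageNode stores: a base64 *string*, not raw bytes.
b64_string = base64.b64encode(raw).decode("utf-8")

# What resolve_image returns: the decoded bytes wrapped in a BytesIO,
# a file-like object that PIL's Image.open() can consume.
buffer = BytesIO(base64.b64decode(b64_string))
print(buffer.read())  # b'fake image payload'
```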
However, there is currently no direct support for creating an ImageDocument from a binary string. You would need to convert the binary string to a base64 encoded string before creating the ImageDocument.
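The conversion itself is a one-liner. In this sketch, `binary_data` is an illustrative name standing in for the raw image bytes your PDF extraction produced:

```python
import base64

# Raw image bytes from your PDF extraction (illustrative placeholder).
binary_data = b"\x89PNG\r\n\x1a\nexample bytes"

# Encode to bytes, then decode to str so it can be stored in the
# `image` field of an ImageDocument.
b64_string = base64.b64encode(binary_data).decode("utf-8")
```

No disk I/O is involved, so this sidesteps the save-and-reload step entirely.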
As for your question about the `complete` function in `OpenAIMultiModal`, I'm afraid I don't have enough information to provide a definitive answer. Could you please provide more details about the function and how you're planning to use it? This would help me give a more accurate response.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Assuming you get the base64 as a string:

```python
from llama_index.schema import ImageDocument
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

document = ImageDocument(text="optional", image=b64_string)

llm = OpenAIMultiModal(
    model="gpt-4-vision-preview", api_key=OPENAI_API_TOKEN, max_new_tokens=300
)

response = llm.complete("Tell me about this image", image_documents=[document])
```
Question
I'm trying to use the OpenAIMultiModal class to process a few images that I found in a PDF. The image data is held in a variable, either as a binary string or as a base64-encoded string. However, from the official documentation, it seems that we need to read images from a URL or a local file to create an ImageDocument that OpenAIMultiModal can process.
I cannot save the images to disk and read them back due to disk space concerns. Is there any way to create an ImageDocument directly from a binary string? Or is it possible to call the `complete` function in OpenAIMultiModal with an image string? Currently I'm using the following code as a workaround via a vanilla OpenAI API call. How can I convert it to the llama_index wrapper?