[Bug]: ImageDocument can not work with local images

run-llama / llama_index

LlamaIndex is a data framework for your LLM applications

https://docs.llamaindex.ai

MIT License

Issue #16524 (closed by whisper-bye 2 weeks ago)

whisper-bye commented 2 weeks ago

Bug Description

According to the documentation (https://docs.llamaindex.ai/en/stable/examples/multi_modal/dashscope_multi_modal/#use-images-from-local-files), local images can be loaded with load_local_images. However, when an ImageDocument is constructed this way, the image is read incorrectly because of the following line:

https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/multi_modal_llms/llama-index-multi-modal-llms-dashscope/llama_index/multi_modal_llms/dashscope/base.py#L152

That line always uses image_document.image_url, even for local images.
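A fix would need a fallback to the local path when image_url is not set. The sketch below shows one possible shape of that fallback. It is an assumption, not the integration's actual code: `resolve_image_url` is a hypothetical helper, and it inlines the local file as a base64 data URL (the format the workaround later in this thread passes to DashScope) rather than relying on any DashScope-specific local-file convention.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Optional


def resolve_image_url(image_url: Optional[str], image_path: Optional[str]) -> str:
    """Hypothetical fallback: prefer image_url, else inline the local file as a data URL."""
    if image_url:
        # Remote (or already-encoded) images are passed through unchanged
        return image_url
    if image_path:
        path = Path(image_path)
        # Guess the MIME type from the file extension; default to JPEG if unknown
        mimetype = mimetypes.guess_type(path.name)[0] or "image/jpeg"
        encoded = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"data:{mimetype};base64,{encoded}"
    raise ValueError("ImageDocument has neither image_url nor image_path set")
```

In a patched integration this would be called with `image_document.image_url` and `image_document.image_path` instead of reading `image_url` directly.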

Version

latest

Steps to Reproduce

Same as the documentation example.

Relevant Logs/Tracebacks

No response

logan-markewich commented 2 weeks ago

Seems like a PR is needed to support loading local images (unless DashScope doesn't allow that).

dosubot[bot] commented 2 weeks ago

Hello @whisper-bye! I'm here to assist you with any bugs, questions, or contributions. Let's work together to solve the issue you're facing.

To address the issue with ImageDocument not working with local images, ensure that the image_path is correctly set when using load_local_images. The function should create ImageDocument instances with the image_path set to the paths provided. Verify that the paths are correct and accessible.

The image_url property is used to specify the URL of an image. If image_url is None, the image_path is used instead. This is managed in the code by checking if image_url is available; if not, it falls back to using image_path for local images ^[1].

Make sure your local image paths are correctly formatted and accessible as per the documentation. If the issue persists, check the implementation of the ImageDocument class to ensure it handles local paths properly ^[1].


whisper-bye commented 2 weeks ago

It would be better if ImageDocument supported being constructed directly from base64 data or a PIL.Image.
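Until something like that exists, base64 data can be routed through the existing URL path by wrapping it in a data URL. The sketch below is a hypothetical helper (`data_url_from_base64` is not part of LlamaIndex). It assumes only JPEG, PNG, and GIF inputs and picks the MIME type by checking the decoded bytes' leading signature instead of trusting the caller.

```python
import base64
import binascii

# Leading "magic" bytes of the formats this sketch handles (assumption: JPEG, PNG, GIF only)
_SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"GIF8": "image/gif",
}


def data_url_from_base64(image_base64: str) -> str:
    """Hypothetical helper: wrap raw base64 image data in a data URL."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        raw = base64.b64decode(image_base64, validate=True)
    except binascii.Error as exc:
        raise ValueError("input is not valid base64") from exc
    for signature, mimetype in _SIGNATURES.items():
        if raw.startswith(signature):
            return f"data:{mimetype};base64,{image_base64}"
    raise ValueError("unrecognized image format")
```

The resulting string could then be passed in a list to load_image_urls; whether DashScope accepts every one of these MIME types is an assumption worth checking against its documentation.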

whisper-bye commented 2 weeks ago

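After reading the Alibaba Cloud Model Studio vision documentation (https://help.aliyun.com/zh/model-studio/user-guide/vision#647c6397db430), I found that this problem can be worked around by encoding the image as a base64 data URL and passing it to load_image_urls: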

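```python
import base64
from io import BytesIO
from typing import Generator

from PIL import Image
from llama_index.core.base.llms.types import CompletionResponse
from llama_index.core.multi_modal_llms.generic_utils import load_image_urls
from llama_index.multi_modal_llms.dashscope import (
    DashScopeMultiModal,
    DashScopeMultiModalModels,
)


def chat_with_image(query_text: str, image: Image.Image) -> Generator[CompletionResponse, None, None]:
    # Encode the in-memory image as JPEG (JPEG has no alpha channel, so convert to RGB first)
    image_file = BytesIO()
    image.convert("RGB").save(image_file, format="JPEG")
    image_base64 = base64.b64encode(image_file.getvalue()).decode("ascii")

    # Pass the image as a base64 data URL instead of an http(s) URL
    image_urls = [f"data:image/jpeg;base64,{image_base64}"]

    # Wrap the data URL in ImageDocument objects via the URL loader
    image_documents = load_image_urls(image_urls)

    dashscope_multi_modal_llm = DashScopeMultiModal(
        model_name=DashScopeMultiModalModels.QWEN_VL_MAX,
    )

    # Returns a generator that streams the completion incrementally
    return dashscope_multi_modal_llm.stream_complete(
        prompt=query_text,
        image_documents=image_documents,
    )
```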