run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Does anyone have experience with, or ideas for, handling system operation manuals? #14518

Open thorory opened 3 weeks ago

thorory commented 3 weeks ago

Question

These documents typically contain screenshots from apps or web pages.

dosubot[bot] commented 3 weeks ago

Hey @thorory, I'm here to help you with any questions or issues you have while waiting for a human maintainer. Let me know how I can assist you!

LlamaIndex supports extracting and indexing images, such as screenshots, embedded in documents. The ImageReader class from the llama_index.readers.file module parses text from images using either the DONUT model or pytesseract, so the text visible in a screenshot can be indexed along with the rest of your system operation manuals.

Here's a code snippet to set up an ImageReader for handling .jpg, .png, and .jpeg files:

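import streamlit as st
from llama_index.readers.file import ImageReader


@st.cache_resource  # cache the reader across Streamlit reruns
def get_file_extractor():
    # keep_image=True keeps the base64-encoded image on the returned document;
    # parse_text=True also extracts the text it contains.
    image_reader = ImageReader(keep_image=True, parse_text=True)
    # Only the image extensions are listed here: SimpleDirectoryReader falls
    # back to its built-in readers for every other extension.
    # (DEFAULT_FILE_EXTRACTOR comes from older releases and may not be
    # importable from llama_index.readers.file.)
    return {
        ".jpg": image_reader,
        ".png": image_reader,
        ".jpeg": image_reader,
    }


file_extractor = get_file_extractor()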
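
To show where the extractor mapping is used, here is a minimal sketch of loading a folder of manuals with it. The helper names `list_image_files` and `load_manual` and the directory name are hypothetical, chosen only for illustration; `SimpleDirectoryReader` and its `file_extractor` argument come from `llama_index.core`. The import is deferred so the first helper can be tried without LlamaIndex installed.

```python
from pathlib import Path

# Extensions routed to the image reader in the snippet above.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_image_files(input_dir):
    """Return the files under input_dir that have an image extension.

    Hypothetical helper: handy for checking which screenshots will be
    picked up before running the (slower) text extraction.
    """
    return sorted(
        p
        for p in Path(input_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def load_manual(input_dir, file_extractor):
    """Load every file under input_dir, using file_extractor for images.

    Hypothetical helper wrapping SimpleDirectoryReader.
    """
    # Deferred import so this module loads even without LlamaIndex installed.
    from llama_index.core import SimpleDirectoryReader

    reader = SimpleDirectoryReader(
        input_dir=str(input_dir),
        recursive=True,  # manuals often keep screenshots in subfolders
        file_extractor=file_extractor,
    )
    return reader.load_data()
```

Calling `load_manual("manuals/", file_extractor)` would then return a list of documents, with screenshots handled by the ImageReader and other file types by the default readers.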

Additionally, LlamaIndex provides the ImageDocument and ImageNode classes, which carry an image (as base64 data, a file path, or a URL) alongside any extracted text, so images can be indexed and queried together with the rest of a document [1][2].

For integrating and displaying images within indexed documents, you can use the ImageParser and utilities for handling image URLs and encoding images to base64. This can be useful for displaying images in a web application or other interfaces [1][3].
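
As an illustration of the base64 step, here is a minimal sketch that turns an image file into a data URL that a web page can display directly. It uses only the Python standard library rather than any LlamaIndex utility, and the function name `image_to_data_url` is hypothetical.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path):
    """Encode an image file as a base64 data URL (hypothetical helper)."""
    path = Path(path)
    # Guess the MIME type from the file extension, e.g. ".png" -> "image/png".
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {path.name}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be placed in the `src` attribute of an `<img>` tag, which avoids having to serve the screenshot as a separate file.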
