Open martinb-ai opened 6 days ago
Hey @martinb-ai! 👋 I'm here to help you with any bugs, questions, or contributions you have while you wait for a human maintainer. Let's tackle this together!
I found a similar unsolved discussion about parsing multimedia content in PowerPoint files [1]. LlamaIndex also integrates with multimodal language models, including those from Hugging Face, Ollama, and OpenAI, to process images and graphs. These integrations enable capabilities such as image captioning and visual question answering [2].
For handling PowerPoints with images and graphs, you can use the PptxReader class in LlamaIndex, which extracts text and captions images from .pptx files. It uses a pre-trained model to generate captions for images, which involves temporarily saving the image and processing it with a vision encoder-decoder model. However, for other types of multimedia content like graphs, additional handling and models would be required, which are not currently implemented in the provided code [3].
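To make the extraction step concrete, here is a minimal standard-library sketch of what any .pptx reader has to do before captioning: pull the text out of each slide and collect the embedded media. This is not LlamaIndex's implementation. The function name `extract_pptx_content` is hypothetical, and the sketch relies only on the fact that a .pptx file is a ZIP archive with slides under `ppt/slides/` and media under `ppt/media/`.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace used for text runs (<a:t>) inside slide XML.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
# Matches ppt/slides/slide1.xml, ppt/slides/slide2.xml, ... (not the _rels files).
SLIDE_RE = re.compile(r"ppt/slides/slide(\d+)\.xml$")


def extract_pptx_content(path_or_file):
    """Hypothetical helper: return (slides, images) for a .pptx archive.

    slides maps slide number -> concatenated text runs on that slide.
    images maps archive path -> raw bytes of each file under ppt/media/.
    """
    slides = {}
    images = {}
    with zipfile.ZipFile(path_or_file) as zf:
        for name in zf.namelist():
            match = SLIDE_RE.match(name)
            if match:
                root = ET.fromstring(zf.read(name))
                runs = [t.text for t in root.iter(A_NS + "t") if t.text]
                slides[int(match.group(1))] = " ".join(runs)
            elif name.startswith("ppt/media/"):
                images[name] = zf.read(name)
    # Order slides by their number for predictable iteration.
    return dict(sorted(slides.items())), images
```

The image bytes returned here are what you would hand to a captioning or multimodal model. Native charts are usually stored as separate XML parts rather than as pictures, so they would not appear in `images` and would need their own handling, which matches the limitation described above.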
If you are looking for a more efficient way, you might consider creating separate indexes for each type of document or content, as each modality requires a specific embedding model. This approach ensures optimal performance when searching over a knowledge base with documents requiring different kinds of embeddings, including multimodal ones [4].
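As a rough illustration of the "one index per modality" idea, the sketch below keeps a separate in-memory index for text and for images, each with its own embedding function. Everything here is an assumption made for illustration: `ModalityIndex`, `toy_text_embed`, and `toy_image_embed` are hypothetical names, and the toy embedders stand in for real text and image embedding models. It is not how LlamaIndex implements its indexes.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class ModalityIndex:
    """Hypothetical index bound to a single embedding function."""
    embed: Callable[[object], List[float]]
    entries: List[Tuple[str, List[float]]] = field(default_factory=list)

    def add(self, doc_id, content):
        self.entries.append((doc_id, self.embed(content)))

    def query(self, content, top_k=1):
        q = self.embed(content)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[:top_k]]


def toy_text_embed(text):
    # Placeholder for a real text embedding model: vowel counts.
    text = text.lower()
    return [float(text.count(c)) for c in "aeiou"]


def toy_image_embed(data):
    # Placeholder for a real image embedding model: 4-bucket byte histogram.
    buckets = [0.0] * 4
    for b in data:
        buckets[b // 64] += 1.0
    return buckets


# One index per modality, so each uses the embedder suited to its content.
indexes = {
    "text": ModalityIndex(toy_text_embed),
    "image": ModalityIndex(toy_image_embed),
}
indexes["text"].add("slide-1", "Quarterly revenue overview")
indexes["image"].add("slide-1/image1", bytes([10, 20, 200]))
```

At query time you would search each index with a query embedded by the matching model and then merge the results; how to merge them is a design choice this sketch leaves open.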
Have you seen this example? https://github.com/run-llama/llama_parse/blob/main/examples/multimodal/multimodal_rag_slide_deck.ipynb
@logan-markewich Hi there! I read through it but I am curious... Is there a way to not have to use Llama-parse?
Question Validation
Question
Hi there!
I am curious about how to handle PowerPoints that contain images and graphs in addition to text. What is the most efficient way to embed the information?
I know we can use a multimodal model that analyzes the images and describes what it sees, but is there a better, more efficient way to do this in llama-index?
Or is it better to convert the pptx into a pdf and ingest it that way? Do PDF images get handled better?
Thanks!