run-llama / llama_parse

Parse files for optimal RAG
https://www.llamaindex.ai
MIT License
1.75k stars 151 forks

It's not extracting images from a textbook excerpt I uploaded / tried with python SDK #236

Open KING-SID opened 2 weeks ago

KING-SID commented 2 weeks ago

😭 does extracting images not really work that well?

# Only needed when running inside a Jupyter notebook
import nest_asyncio
nest_asyncio.apply()

from llama_parse import LlamaParse  # pip install llama-parse

parser = LlamaParse(
    api_key="blah-blah-blah",  # can also be set in your env as LLAMA_CLOUD_API_KEY
    result_type="markdown",  # "markdown" and "text" are available
    parsing_instruction="This is a text book which has figures, tables, diagrams, and text. Please extract all the images, text and the tables from the document.",
)

# sync
documents = parser.load_data("./ECON_textbook_100_pages.pdf")

# async
# documents = await parser.aload_data("./my_file.pdf")

hexapode commented 1 week ago

Did you look at the result_type="json" results? The images should be there.
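
A minimal sketch of what checking the JSON output could look like. It assumes the SDK's `get_json_result` and `get_images` methods, and it assumes the JSON result is a list of dicts with a `pages` list whose entries carry `page` and `images` (each image having a `name`); verify that layout against your own output. The helpers `list_images` and `parse_and_download`, the `sample` data, and the file names in it are hypothetical and only for illustration.

```python
from __future__ import annotations

from typing import Any


def list_images(json_results: list[dict[str, Any]]) -> list[tuple[Any, Any]]:
    """Collect (page number, image name) pairs from a JSON parse result.

    Assumed layout: [{"pages": [{"page": 1, "images": [{"name": ...}]}]}]
    """
    found = []
    for result in json_results:
        for page in result.get("pages", []):
            for image in page.get("images", []):
                found.append((page.get("page"), image.get("name")))
    return found


def parse_and_download(pdf_path: str, download_dir: str) -> list[dict[str, Any]]:
    """Parse a PDF, print the images the JSON result reports, and save them."""
    # Imported here so list_images can be tried without the SDK installed.
    from llama_parse import LlamaParse  # pip install llama-parse

    # The API key is read from the LLAMA_CLOUD_API_KEY environment variable.
    parser = LlamaParse(result_type="markdown")
    json_results = parser.get_json_result(pdf_path)
    print(list_images(json_results))
    # Assumption: get_images writes the image files into download_dir.
    return parser.get_images(json_results, download_path=download_dir)


# Hand-written sample in the assumed layout, not real parser output.
sample = [
    {
        "pages": [
            {"page": 1, "images": [{"name": "img_p0_1.png"}]},
            {"page": 2, "images": []},
        ]
    }
]
print(list_images(sample))
```

If `list_images` comes back empty for a PDF that clearly contains figures, the images are missing from the parse result itself rather than just from the markdown output.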