run-llama / llama-hub

A library of data loaders for LLMs made by the community -- to be used with LlamaIndex and/or LangChain
https://llamahub.ai/
MIT License

[Bug]: Issue while running UnstructuredReader together with SimpleDirectoryReader #856

Open timtensor opened 7 months ago

timtensor commented 7 months ago

Bug Description

I am using the following code to download and use the data loader in a Google Colab environment:

from pathlib import Path
from llama_index import download_loader
from llama_index import SimpleDirectoryReader

UnstructuredReader = download_loader('UnstructuredReader')

dir_reader = SimpleDirectoryReader('./Data', file_extractor={
  ".pdf": UnstructuredReader(),
  ".html": UnstructuredReader(),
  ".eml": UnstructuredReader(),
})
documents = dir_reader.load_data()

However, I keep running into the following error: ImportError: partition_pdf is not available.

Version

0.9.29

Steps to Reproduce

Run the code from the bug description in a Google Colab environment.

Relevant Logs/Tracebacks

No response

logan-markewich commented 7 months ago

I think you need to pip install "unstructured[pdf]"

The requirements for this loader should probably be updated.
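
For anyone hitting the same error, here is a minimal sketch of a pre-flight check that could be run before constructing the reader. It assumes the PDF partitioner lives at unstructured.partition.pdf and exposes partition_pdf (the name comes from the error message above); the helper name module_has_attr is hypothetical and is not part of llama_index or unstructured.

```python
import importlib


def module_has_attr(module_name: str, attr: str) -> bool:
    """Return True if `module_name` can be imported and exposes `attr`.

    Hypothetical helper for illustration only.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself, or one of its own dependencies, is missing.
        return False
    return hasattr(module, attr)


# Assumption: the PDF partitioner is importable from this module path
# once the PDF extra has been installed.
if not module_has_attr("unstructured.partition.pdf", "partition_pdf"):
    print('PDF support is missing. In Colab, run: !pip install "unstructured[pdf]"')
```

If the message is printed, install the extra in a Colab cell and restart the runtime before re-running the SimpleDirectoryReader code, so that the newly installed packages are picked up.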