run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: ImportError: `llama-index-readers-file` package not found #12045

Closed trishachander closed 8 months ago

trishachander commented 8 months ago

Bug Description

This issue came up only last week after the llama-index version updates. I am trying to do the following:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

filename_fn = lambda filename: {"file_name": filename}
reader = SimpleDirectoryReader(input_dir="./data", file_metadata=filename_fn, filename_as_id=True)
docs = reader.load_data()

I've tried running it in a venv, and I've tried uninstalling and reinstalling. If I instead import SimpleDirectoryReader from llama_index.legacy.readers.file.base, an error occurs when I run the following:

index = VectorStoreIndex.from_documents(documents=docs)

Following are the versions installed:

llama-index 0.10.20
llama-index-agent-openai 0.1.5
llama-index-cli 0.1.9
llama-index-core 0.10.20.post2
llama-index-embeddings-openai 0.1.6
llama-index-indices-managed-llama-cloud 0.1.4
llama-index-legacy 0.9.48
llama-index-llms-openai 0.1.12
llama-index-multi-modal-llms-openai 0.1.4
llama-index-program-openai 0.1.4
llama-index-question-gen-openai 0.1.3
llama-index-readers-file 0.1.11
llama-index-readers-llama-parse 0.1.3
llama-index-vector-stores-chroma 0.1.5
llama-index-vector-stores-qdrant 0.1.4
llama-parse 0.3.9
llamaindex-py-client 0.1.13

Version

0.10.20

Steps to Reproduce

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

filename_fn = lambda filename: {"file_name": filename}
reader = SimpleDirectoryReader(input_dir="./data", file_metadata=filename_fn, filename_as_id=True)
docs = reader.load_data()

Relevant Logs/Tracebacks

ImportError                               Traceback (most recent call last)
File \AppData\Local\Programs\Python\Python311\Lib\site-packages\llama_index\core\readers\file\base.py:23, in _try_loading_included_file_formats()
     22 try:
---> 23     from llama_index.readers.file import (
     24         DocxReader,
     25         EpubReader,
     26         HWPReader,
     27         ImageReader,
     28         IPYNBReader,
     29         MarkdownReader,
     30         MboxReader,
     31         PandasCSVReader,
     32         PDFReader,
     33         PptxReader,
     34         VideoAudioReader,
     35     )  # pants: no-infer-dep
     36 except ImportError:

ImportError: cannot import name 'DocxReader' from 'llama_index.readers.file' (unknown location)

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
Cell In[28], line 1
...
     55     ".ipynb": IPYNBReader,
     56 }
     57 return default_file_reader_cls

ImportError: `llama-index-readers-file` package not found
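
The traceback above shows two chained errors: the import of DocxReader fails first, and the `except ImportError` branch then raises the generic "package not found" message. A minimal sketch of that pattern follows, assuming a made-up module name (`hypothetical_readers_pkg`) and helper (`load_optional_readers`); it is not the library's actual code. It only illustrates why the final message can appear even when the package is installed but fails to import for another reason.

```python
import importlib


def load_optional_readers(module_name="hypothetical_readers_pkg"):
    """Import an optional dependency, replacing any ImportError with a generic message."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # The original error (a missing name or a broken install, not only an
        # absent package) survives only as __context__ on the new exception.
        raise ImportError(f"`{module_name}` package not found")


try:
    load_optional_readers()
except ImportError as exc:
    print("raised:", exc)
    print("original error:", repr(exc.__context__))
```

In a case like this one, the first error in the chain is usually the more informative of the two.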
dosubot[bot] commented 8 months ago

Hey there, @trishachander! Great to see you back. Running into a bit of a hiccup, I see? Let's dive into it.

Based on the error logs and traceback you've provided, it seems like the llama-index-readers-file package is not being recognized by your Python environment. This could be due to a variety of reasons, including issues with your Python environment, incorrect installation, or changes in the LlamaIndex package structure.

Here are a few steps you can take to troubleshoot this issue:

  1. Ensure the llama-index-readers-file package is correctly installed: You can do this by running pip show llama-index-readers-file in your terminal. If the package is installed, this command will display information about it. If not, you'll need to install it using pip install llama-index-readers-file.

  2. Create a new virtual environment: Sometimes, Python packages can conflict with each other, causing issues like the one you're experiencing. Creating a new virtual environment can help isolate your project and its dependencies, preventing such conflicts. Here's how you can do it:

python -m venv new_env
source new_env/bin/activate  # on Windows: new_env\Scripts\activate
pip install llama-index
  3. Check your import statement: In 0.10.x, SimpleDirectoryReader is imported from llama_index.core, as in your snippet, so that import is already correct. The llama-index-legacy package also provides the class in the llama_index.legacy.readers.file.base module, which you can try as a fallback, although you mentioned that documents loaded this way raise an error in VectorStoreIndex.from_documents:
from llama_index.legacy.readers.file.base import SimpleDirectoryReader
  4. Reinstall LlamaIndex without using cache: If the issue persists, try reinstalling LlamaIndex without the pip cache. This can help if the issue is caused by a corrupted cached package:
pip uninstall llama-index
pip install llama-index --no-cache-dir
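
The checks in the steps above can also be run from inside the same interpreter or notebook kernel that raises the error, which helps rule out pip and the kernel pointing at different environments. This is a minimal sketch using only the standard library; the helper names `package_version` and `try_import` are made up for illustration.

```python
import importlib
import sys
from importlib import metadata


def package_version(dist_name):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def try_import(module_name):
    """Try importing a module; return (True, file path) or (False, error text)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        return False, str(exc)
    return True, getattr(module, "__file__", None)


if __name__ == "__main__":
    # Which Python is actually running this code?
    print("interpreter:", sys.executable)
    for dist in ("llama-index", "llama-index-core", "llama-index-readers-file"):
        print(dist, "->", package_version(dist))
    print("llama_index.readers.file:", try_import("llama_index.readers.file"))
```

If the interpreter path is not the one where pip installed the packages, or the import check returns an error while the version lookup succeeds, that narrows down where to look.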

If none of these solutions work, it would be helpful to have more information about your Python environment. Could you provide the output of pip freeze? This will list all the Python packages installed in your environment, which could help identify any potential conflicts or issues.

For more information, you can refer to these similar issues in the LlamaIndex repository:

I hope this helps! Let me know if you have any other questions or if there's anything else I can do to assist you.

trishachander commented 8 months ago

This is the output of my pip freeze. I tried uninstalling llama-index and installing another version to see if that works, but I'm still running into the same problem.

Below is the pip freeze:

aiohttp==3.9.3 aiosignal==1.3.1 alembic==1.13.1 altair==5.2.0 annotated-types==0.6.0 antlr4-python3-runtime==4.9.3 anyio==4.3.0 argilla==1.25.0 asgiref==3.7.2 asttokens==2.4.1 attrs==23.2.0 backoff==2.2.1 bcrypt==4.1.2 beautifulsoup4==4.12.3 blinker==1.7.0 blis==0.7.11 boto3==1.34.59 botocore==1.34.59 bs4==0.0.2 build==1.0.3 CacheControl==0.14.0 cachetools==5.3.3 catalogue==2.0.10 certifi==2024.2.2 cffi==1.16.0 chardet==5.2.0 charset-normalizer==3.3.2 chroma-hnswlib==0.7.3 chromadb==0.4.24 cleo==2.1.0 click==8.1.7 cloudpathlib==0.16.0 cloudpickle==2.2.1 colorama==0.4.6 coloredlogs==15.0.1 comm==0.2.1 confection==0.1.4 contextlib2==21.6.0 contourpy==1.2.0 crashtest==0.4.1 cryptography==42.0.2 cycler==0.12.1 cymem==2.0.8 dataclasses-json==0.6.4 dataclasses-json-speakeasy==0.5.11 debugpy==1.8.1 decorator==5.1.1 Deprecated==1.2.14 dill==0.3.8 dirtyjson==1.0.8 distlib==0.3.8 distro==1.9.0 docker==7.0.0 dulwich==0.21.7 effdet==0.4.1 emoji==2.10.1 entrypoints==0.4 et-xmlfile==1.1.0 executing==2.0.1 Faker==24.2.0 fastapi==0.110.0 fastjsonschema==2.19.1 favicon==0.7.0 filelock==3.13.1 filetype==1.2.0 Flask==3.0.2 flatbuffers==23.5.26 fonttools==4.49.0 frozendict==2.4.0 frozenlist==1.4.1 fsspec==2024.3.0 gitdb==4.0.11 GitPython==3.1.42 google-auth==2.28.1 google-pasta==0.2.0 googleapis-common-protos==1.62.0 greenlet==3.0.3 grpcio==1.62.0 grpcio-tools==1.62.0 h11==0.14.0 h2==4.1.0 hpack==4.0.0 htbuilder==0.6.2 html2text==2024.2.26 httpcore==1.0.4 httptools==0.6.1 httpx==0.27.0 huggingface-hub==0.20.3 humanfriendly==10.0 humanize==4.9.0 hyperframe==6.0.1 idna==3.6 importlib-metadata==7.0.1 importlib-resources==6.1.1 installer==0.7.0 iopath==0.1.10 ipykernel==6.29.3 ipython==8.22.2 itsdangerous==2.1.2 jaraco.classes==3.3.1 jedi==0.19.1 Jinja2==3.1.3 jmespath==1.0.1 joblib==1.3.2 jsonpatch==1.33 jsonpath-python==1.0.6 jsonpointer==2.4 jsonschema==4.21.1 jsonschema-specifications==2023.12.1 jupyter_client==8.6.0 jupyter_core==5.7.1 keyring==24.3.1 kiwisolver==1.4.5 
kubernetes==29.0.0 langchain==0.1.12 langchain-community==0.0.28 langchain-core==0.1.32 langchain-openai==0.0.8 langchain-text-splitters==0.0.1 langcodes==3.3.0 langdetect==1.0.9 langsmith==0.1.27 layoutparser==0.3.4 llama-hub==0.0.79.post1 llama-index==0.10.16 llama-index-agent-openai==0.1.5 llama-index-cli==0.1.9 llama-index-core==0.10.20.post2 llama-index-embeddings-openai==0.1.6 llama-index-indices-managed-llama-cloud==0.1.4 llama-index-legacy==0.9.48 llama-index-llms-openai==0.1.12 llama-index-multi-modal-llms-openai==0.1.4 llama-index-program-openai==0.1.4 llama-index-question-gen-openai==0.1.3 llama-index-readers-file==0.1.11 llama-index-readers-llama-parse==0.1.3 llama-index-vector-stores-chroma==0.1.5 llama-index-vector-stores-qdrant==0.1.4 llama-parse==0.3.9 llamaindex-py-client==0.1.13 lxml==5.1.0 Mako==1.3.2 Markdown==3.5.2 markdown-it-py==3.0.0 markdownlit==0.0.7 MarkupSafe==2.1.5 marshmallow==3.21.1 matplotlib==3.7.2 matplotlib-inline==0.1.6 mdurl==0.1.2 merkle-json==1.0.0 millify==0.1.1 mmh3==4.1.0 monotonic==1.6 more-itertools==10.2.0 mpmath==1.3.0 msg-parser==1.2.0 msgpack==1.0.7 multidict==6.0.5 multiprocess==0.70.16 munch==4.0.0 murmurhash==1.0.10 mypy-extensions==1.0.0 nest-asyncio==1.6.0 networkx==3.2.1 nltk==3.8.1 numpy==1.26.4 oauthlib==3.2.2 olefile==0.47 omegaconf==2.3.0 onnx==1.15.0 onnxruntime==1.15.1 openai==1.14.1 opencv-python==4.8.0.76 openpyxl==3.1.2 opentelemetry-api==1.23.0 opentelemetry-exporter-otlp-proto-common==1.23.0 opentelemetry-exporter-otlp-proto-grpc==1.23.0 opentelemetry-instrumentation==0.44b0 opentelemetry-instrumentation-asgi==0.44b0 opentelemetry-instrumentation-fastapi==0.44b0 opentelemetry-proto==1.23.0 opentelemetry-sdk==1.23.0 opentelemetry-semantic-conventions==0.44b0 opentelemetry-util-http==0.44b0 orjson==3.9.15 overrides==7.7.0 packaging==23.2 pandas==2.2.1 parso==0.8.3 pathos==0.3.2 pdf2image==1.17.0 pdfminer.six==20221105 pdfplumber==0.10.4 pexpect==4.9.0 pikepdf==8.11.0 pillow==10.2.0 pillow_heif==0.15.0 
pkginfo==1.9.6 platformdirs==4.2.0 poetry==1.8.1 poetry-core==1.9.0 poetry-plugin-export==1.6.0 portalocker==2.8.2 posthog==3.5.0 pox==0.3.4 ppft==1.7.6.8 preshed==3.0.9 prometheus_client==0.20.0 prompt-toolkit==3.0.43 protobuf==4.25.3 psutil==5.9.8 ptyprocess==0.7.0 pulsar-client==3.4.0 pure-eval==0.2.2 pyaml==23.12.0 pyarrow==15.0.1 pyasn1==0.5.1 pyasn1-modules==0.3.0 pycocotools==2.0.7 pycparser==2.21 pydantic==2.6.4 pydantic_core==2.16.3 pydeck==0.8.1b0 Pygments==2.17.2 pymdown-extensions==10.7.1 PyMuPDF==1.23.26 PyMuPDFb==1.23.22 pypandoc==1.12 pyparsing==3.0.9 pypdf==4.1.0 pypdfium2==4.27.0 PyPika==0.48.9 pyproject_hooks==1.0.0 pyreadline3==3.4.1 pytesseract==0.3.10 python-dateutil==2.9.0.post0 python-decouple==3.8 python-docx==1.1.0 python-dotenv==1.0.1 python-iso639==2024.2.7 python-magic==0.4.27 python-multipart==0.0.9 python-pptx==0.6.23 pytz==2024.1 pywin32==306 pywin32-ctypes==0.2.2 PyYAML==6.0.1 pyzmq==25.1.2 qdrant-client==1.8.0 rapidfuzz==3.6.1 referencing==0.34.0 regex==2023.12.25 requests==2.31.0 requests-oauthlib==1.3.1 requests-toolbelt==1.0.0 retrying==1.3.4 rich==13.7.1 rpds-py==0.18.0 rsa==4.9 s3transfer==0.10.0 safetensors==0.3.2 sagemaker==2.212.0 schema==0.7.5 scipy==1.10.1 shellingham==1.5.4 six==1.16.0 slack-bolt==1.18.1 slack_sdk==3.27.1 smart-open==6.4.0 smdebug-rulesconfig==1.0.1 smmap==5.0.1 sniffio==1.3.1 soupsieve==2.5 spacy==3.7.4 spacy-legacy==3.0.12 spacy-loggers==1.0.5 SQLAlchemy==2.0.28 srsly==2.4.8 st-annotated-text==4.0.1 stack-data==0.6.3 starlette==0.36.3 streamlit==1.32.2 streamlit-aggrid==0.3.4.post3 streamlit-camera-input-live==0.2.0 streamlit-card==1.0.0 streamlit-embedcode==0.1.2 streamlit-extras==0.4.0 streamlit-faker==0.0.3 streamlit-image-coordinates==0.1.6 streamlit-keyup==0.2.3 streamlit-pills==0.3.0 streamlit-toggle-switch==1.0.2 streamlit-vertical-slider==2.5.5 striprtf==0.0.26 sympy==1.12 tabulate==0.9.0 tblib==2.0.0 tenacity==8.2.3 thinc==8.2.3 tiktoken==0.6.0 timm==0.9.12 tokenizers==0.15.2 toml==0.10.2 
tomlkit==0.12.4 toolz==0.12.1 torch==2.2.0 torchvision==0.17.0 tornado==6.4 tqdm==4.66.2 traitlets==5.14.1 transformers==4.37.1 trove-classifiers==2024.2.23 trulens-eval==0.26.0 typer==0.9.0 typing-inspect==0.9.0 typing_extensions==4.10.0 tzdata==2024.1 tzlocal==5.2 unstructured==0.12.6 unstructured-client==0.18.0 unstructured-inference==0.7.23 unstructured.pytesseract==0.3.12 urllib3==2.2.1 uvicorn==0.27.1 validators==0.22.0 virtualenv==20.25.1 wasabi==1.1.2 watchdog==4.0.0 watchfiles==0.21.0 wcwidth==0.2.13 weasel==0.3.4 websocket-client==1.7.0 websockets==12.0 Werkzeug==3.0.1 wget==3.2 wrapt==1.16.0 xlrd==2.0.1 XlsxWriter==3.1.9 yarl==1.9.4 zipp==3.17.0
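
Since the listing is long, a short filter makes the llama-related pins easier to compare. The sketch below assumes the freeze output is available as one whitespace-separated string, as pasted above; the helper name `llama_packages` and the shortened `sample` string are illustrative only. In this listing llama-index is pinned at 0.10.16 while llama-index-core is at 0.10.20.post2, and llama-hub is installed as well. Whether that mismatch is related to the error is not established in this thread.

```python
def llama_packages(freeze_text):
    """Map each llama-related distribution name to its pinned version."""
    pins = {}
    for token in freeze_text.split():
        name, sep, version = token.partition("==")
        # Skip tokens that are not "name==version" pins or are unrelated packages.
        if sep and "llama" in name.lower():
            pins[name] = version
    return pins


# A shortened excerpt of the pip freeze output above, used as sample input.
sample = (
    "langsmith==0.1.27 llama-hub==0.0.79.post1 llama-index==0.10.16 "
    "llama-index-core==0.10.20.post2 llama-index-legacy==0.9.48 "
    "llama-index-readers-file==0.1.11 numpy==1.26.4"
)

for name, version in sorted(llama_packages(sample).items()):
    print(f"{name}=={version}")
```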