run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Can not install pandas properly. #16286

Closed: drunkwcodes closed this issue 2 months ago

drunkwcodes commented 2 months ago

Bug Description

I was trying to build a simple Markdown RAG pipeline:

from llama_index.core import SimpleDirectoryReader
from llama_index.readers.file.markdown import MarkdownReader

# Route .md files to an explicit MarkdownReader instance
parser = MarkdownReader()
file_extractor = {".md": parser}
documents = SimpleDirectoryReader(
    input_files=["/home/drunkwcodes/projects/myfoam/kb/package_manager/pdm.md"],
    file_extractor=file_extractor,
).load_data()
# Same call with the default readers (no custom file_extractor):
# documents = SimpleDirectoryReader(input_files=["/home/drunkwcodes/projects/myfoam/kb/package_manager/pdm.md"]).load_data()

But it still fails with:

ModuleNotFoundError: No module named 'pandas.compat'

The C extensions are not built when installing from PyPI.
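That hypothesis can be checked directly. Note that `pandas.compat` is a pure-Python subpackage of pandas, so failing to import it more likely points to an incomplete or corrupted install than to missing C extensions. The sketch below uses a hypothetical helper (`check` is not part of pandas or LlamaIndex, and the distribution/module pairs are assumptions taken from this issue): it compares the version recorded in the installed package metadata with whether the module actually imports. A package that reports a version but fails to import is a common sign of a broken virtual environment.

```python
import importlib
from importlib import metadata


def check(dist_name: str, module_name: str) -> dict:
    """Hypothetical helper: report a distribution's installed version and whether a module imports."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # not installed according to the package metadata
    try:
        importlib.import_module(module_name)
        error = None
    except Exception as exc:  # ImportError, or anything raised while the module initializes
        error = f"{type(exc).__name__}: {exc}"
    return {"dist": dist_name, "version": version, "module": module_name, "error": error}


if __name__ == "__main__":
    # Pairs assumed from this issue; adjust them to your own environment.
    for dist, mod in [
        ("pandas", "pandas.compat"),
        ("llama-index-readers-file", "llama_index.readers.file"),
    ]:
        print(check(dist, mod))
```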

Please drop the pandas dependency; most people only want to read plain text.
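If only the plain Markdown text is needed, one option that avoids every optional reader dependency is to read the files with the standard library. This is a minimal sketch, not a LlamaIndex API; `load_markdown_texts` and its metadata keys are hypothetical.

```python
from pathlib import Path


def load_markdown_texts(paths):
    """Hypothetical helper: read Markdown files as plain text.

    Uses only the standard library, so no optional reader
    dependencies (such as pandas) are imported.
    """
    records = []
    for raw in paths:
        path = Path(raw)
        text = path.read_text(encoding="utf-8", errors="ignore")
        records.append(
            {"text": text, "metadata": {"file_path": str(path), "file_name": path.name}}
        )
    return records


if __name__ == "__main__":
    import sys

    # Print each file name and the length of its text.
    for record in load_markdown_texts(sys.argv[1:]):
        print(record["metadata"]["file_name"], len(record["text"]))
```

With LlamaIndex installed, each record could then be wrapped as `Document(text=record["text"], metadata=record["metadata"])` after `from llama_index.core import Document`, which sidesteps `SimpleDirectoryReader` and its bundled reader lookup.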

Version

0.11.14

Steps to Reproduce

Add these dependencies with pdm:

pdm add "llama-index>=0.11.14" \
    "llama-index-embeddings-huggingface>=0.3.1" \
    "llama-index-readers-pdf-marker>=0.2.0" \
    "llama-parse>=0.5.6" \
    "nest-asyncio>=1.6.0" \
    "marker-pdf>=0.2.17" \
    "llama-index-readers-file>=0.2.2" \
    "pandas[excel]>=2.2.3"

and run the same script as in the bug description above.

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 2 months ago

Hey @drunkwcodes! I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this issue together!

To resolve the "ModuleNotFoundError: No module named 'pandas.compat'" error when using SimpleDirectoryReader with MarkdownReader, you can configure SimpleDirectoryReader to exclude readers that depend on pandas. Here's how you can do it:

from llama_index.core.readers.file.base import SimpleDirectoryReader
from llama_index.readers.file.markdown import MarkdownReader

# Define a custom file_extractor that excludes Pandas readers
custom_file_extractor = {
    ".md": MarkdownReader(),  # file_extractor expects reader instances, not classes
    # Add other readers as needed, excluding those that require pandas
}

# Initialize SimpleDirectoryReader with the custom file_extractor
reader = SimpleDirectoryReader(
    input_files=["/home/drunkwcodes/projects/myfoam/kb/package_manager/pdm.md"],
    file_extractor=custom_file_extractor
)

# Use the reader to read files
documents = reader.load_data()

This configuration ensures that SimpleDirectoryReader uses MarkdownReader for .md files without requiring pandas [1].


drunkwcodes commented 2 months ago

Not working.

ImportError                               Traceback (most recent call last)
File ~/projects/bb7/.venv/lib/python3.12/site-packages/llama_index/core/readers/file/base.py:52, in _try_loading_included_file_formats()
     51 try:
---> 52     from llama_index.readers.file import (
     53         DocxReader,
     54         EpubReader,
     55         HWPReader,
     56         ImageReader,
     57         IPYNBReader,
     58         MarkdownReader,
     59         MboxReader,
...
     89     ".xlsx": PandasExcelReader,
     90 }
     91 return default_file_reader_cls

ImportError: `llama-index-readers-file` package not found

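The final message says the package was not found, yet `llama-index-readers-file` is listed in the dependencies. If the wrapper around that import catches any `ImportError`, as the `try:` in the traceback suggests, then an error raised deep inside the import (for example a `ModuleNotFoundError` from a broken pandas) would surface with the same wrapper message. Python keeps the original exception on `__cause__` (for `raise ... from ...`) or `__context__` (when raising inside an `except` block), so the real error can be recovered. This is a minimal sketch: `load_optional_readers`, `root_cause`, and the simulated missing module are hypothetical and do not reproduce LlamaIndex's exact code.

```python
import importlib


def load_optional_readers(module_name: str):
    """Hypothetical guarded import that re-raises with a friendly message."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Raising inside `except` stores the original error on __context__.
        raise ImportError("`example-readers` package not found")


def root_cause(exc: BaseException) -> BaseException:
    """Follow __cause__ / __context__ links back to the earliest exception."""
    seen = set()
    while id(exc) not in seen:
        seen.add(id(exc))
        nxt = exc.__cause__ or exc.__context__
        if nxt is None:
            break
        exc = nxt
    return exc


if __name__ == "__main__":
    try:
        load_optional_readers("module_that_does_not_exist_xyz")
    except ImportError as err:
        print("reported:", err)
        print("root cause:", repr(root_cause(err)))
```

In a Jupyter traceback the same chain appears as "During handling of the above exception, another exception occurred", so the first traceback shown usually carries the underlying error.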
drunkwcodes commented 2 months ago

I reinstalled the whole venv, and that resolved the issue.