run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Can not install pandas properly. #16286

Closed drunkwcodes closed 1 month ago

drunkwcodes commented 1 month ago

Bug Description

I was trying to do a simple markdown RAG:

from llama_index.core import SimpleDirectoryReader

from llama_index.readers.file.markdown import MarkdownReader

parser = MarkdownReader()
file_extractor = {".md": parser}
documents = SimpleDirectoryReader(
    input_files= ["/home/drunkwcodes/projects/myfoam/kb/package_manager/pdm.md"], file_extractor=file_extractor
).load_data()
# documents = SimpleDirectoryReader(input_files= ["/home/drunkwcodes/projects/myfoam/kb/package_manager/pdm.md"]).load_data()

But it still fails with:

ModuleNotFoundError: No module named 'pandas.compat'

The C extensions are not built when pandas is installed from PyPI.

Please drop the pandas dependency for the majority of users who only want to read plain text.
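One way to narrow down a report like this is to check whether the affected modules import at all in the active environment. The following is a minimal sketch using only the standard library; the helper name `check_module` is hypothetical and not part of LlamaIndex or pandas.

```python
import importlib


def check_module(name: str) -> tuple[bool, str]:
    """Try to import `name` and return (ok, detail).

    `check_module` is a hypothetical helper for illustration only.
    """
    try:
        module = importlib.import_module(name)
    except Exception as exc:  # ImportError, or any error raised during package init
        return False, f"{type(exc).__name__}: {exc}"
    location = getattr(module, "__file__", None) or "<built-in or namespace package>"
    return True, location


if __name__ == "__main__":
    # Module names taken from the error messages in this issue.
    for name in ("pandas", "pandas.compat", "llama_index.readers.file"):
        ok, detail = check_module(name)
        print(f"{name}: {'ok' if ok else 'FAILED'} ({detail})")
```

If `pandas` imports but `pandas.compat` does not, the installed files are probably incomplete, which points at the environment rather than at the reader code.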

Version

0.11.14

Steps to Reproduce

Add these dependencies with pdm:

"llama-index>=0.11.14", "llama-index-embeddings-huggingface>=0.3.1", "llama-index-readers-pdf-marker>=0.2.0", "llama-parse>=0.5.6", "nest-asyncio>=1.6.0", "marker-pdf>=0.2.17", "llama-index-readers-file>=0.2.2", "pandas[excel]>=2.2.3",

and run this script:

from llama_index.core import SimpleDirectoryReader

from llama_index.readers.file.markdown import MarkdownReader

parser = MarkdownReader()
file_extractor = {".md": parser}
documents = SimpleDirectoryReader(
    input_files= ["/home/drunkwcodes/projects/myfoam/kb/package_manager/pdm.md"], file_extractor=file_extractor
).load_data()
# documents = SimpleDirectoryReader(input_files= ["/home/drunkwcodes/projects/myfoam/kb/package_manager/pdm.md"]).load_data()

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 1 month ago

Hey @drunkwcodes! I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this issue together!

To resolve the "ModuleNotFoundError: No module named 'pandas.compat'" error when using SimpleDirectoryReader with MarkdownReader, you can configure SimpleDirectoryReader to exclude readers that depend on pandas. Here's how you can do it:

from llama_index.core.readers.file.base import SimpleDirectoryReader
from llama_index.readers.file.markdown import MarkdownReader

# Define a custom file_extractor that excludes Pandas readers
custom_file_extractor = {
    ".md": MarkdownReader(),  # file_extractor expects reader instances, not classes
    # Add other readers as needed, excluding those that require pandas
}

# Initialize SimpleDirectoryReader with the custom file_extractor
reader = SimpleDirectoryReader(
    input_files=["/home/drunkwcodes/projects/myfoam/kb/package_manager/pdm.md"],
    file_extractor=custom_file_extractor
)

# Use the reader to read files
documents = reader.load_data()

This configuration ensures that SimpleDirectoryReader uses MarkdownReader for .md files without requiring pandas [1].


drunkwcodes commented 1 month ago

Not working.

ImportError                               Traceback (most recent call last)
File ~/projects/bb7/.venv/lib/python3.12/site-packages/llama_index/core/readers/file/base.py:52, in _try_loading_included_file_formats()
     51 try:
---> 52     from llama_index.readers.file import (
     53         DocxReader,
     54         EpubReader,
     55         HWPReader,
     56         ImageReader,
     57         IPYNBReader,
     58         MarkdownReader,
     59         MboxReader,
...
     89     ".xlsx": PandasExcelReader,
     90 }
     91 return default_file_reader_cls

ImportError: `llama-index-readers-file` package not found

drunkwcodes commented 1 month ago

I reinstalled the whole venv, which resolved the issue.
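For anyone hitting the same "package not found" message, it can help to tell apart a distribution that is truly absent from one that is installed but fails to import (for example because a dependency such as pandas is incomplete). This is a rough sketch under that assumption, using only the standard library; `diagnose` is a hypothetical helper name.

```python
import importlib
from importlib.metadata import PackageNotFoundError, version


def diagnose(dist_name: str, module_name: str) -> str:
    """Classify an install as 'missing', 'broken', or 'ok (<version>)'.

    `diagnose` is a hypothetical helper for illustration only.
    """
    try:
        dist_version = version(dist_name)
    except PackageNotFoundError:
        # No metadata for this distribution in the active environment.
        return "missing"
    try:
        importlib.import_module(module_name)
    except Exception:
        # Metadata exists, but the module cannot be imported.
        return "broken"
    return f"ok ({dist_version})"


if __name__ == "__main__":
    # Distribution and module names taken from this issue.
    pairs = [
        ("pandas", "pandas.compat"),
        ("llama-index-readers-file", "llama_index.readers.file"),
    ]
    for dist_name, module_name in pairs:
        print(f"{dist_name}: {diagnose(dist_name, module_name)}")
```

A result of "broken" suggests the environment itself is damaged, in which case recreating the venv, as done here, is a reasonable next step.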