szczyglis-dev / py-gpt

Desktop AI Assistant powered by GPT-4, GPT-4 Vision, GPT-3.5, DALL-E 3, Langchain, Llama-index, chat, vision, voice control, image generation and analysis, autonomous agents, code and command execution, file upload and download, speech synthesis and recognition, access to Web, memory, prompt presets, plugins, assistants & more. Linux, Windows, Mac.
https://pygpt.net
MIT License

llama hub loaders #17

Closed: gfsysa closed this 5 months ago

gfsysa commented 5 months ago
[LLAMA-INDEX] Indexing data...
[LLAMA-INDEX] Idx: base, type: db_current, content: 1706767238
langchain\tools\__init__.py:63: LangChainDeprecationWarning: Importing tools from langchain is deprecated. Importing from langchain will no longer be supported as of langchain==0.2.0. Please import from langchain-community instead:

`from langchain_community.tools import __wrapped__`.

To install langchain-community run `pip install -U langchain-community`.

Just a warning, so it's not clear whether it's an issue. But I'm also having trouble adding LlamaHub loaders, agents, and packages -- still testing, but unclear on how they're managed.

gfsysa commented 5 months ago

Worked past that one, I think.

Hit this though indexing a few hundred txt files, it's been erroring for a good 15 min -- not sure how to interrupt, batch, or restart from the last point -- just killed it:

```
[LLAMA-INDEX] Reading documents from path: [omit]
[LLAMA-INDEX] Using online loader for: txt
pygame 2.5.2 (SDL 2.28.3, Python 3.10.11)
Hello from the pygame community. https://www.pygame.org/contribute.html
2024-02-02 04:52:26.4656787 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:1983 onnxruntime::python::CreateInferencePybindStateModule] Init provider bridge failed.
usage: pygpt.exe [-h] [-d DEBUG]
pygpt.exe: error: unrecognized arguments: -m pip install -r [omit]/requirements.txt
Command '[omit]', '-m', 'pip', 'install', '-r', '[omit]/requirements.txt']' returned non-zero exit status 2.
Error while indexing file: [omit]
Type: CalledProcessError, Message: Command '[omit]', '-m', 'pip', 'install', '-r', '[omit]/requirements.txt']' returned non-zero exit status 2.
Traceback:
  File "llama_index\readers\download.py", line 49, in download_loader
  File "llama_index\download\module.py", line 229, in download_llama_module
  File "llama_index\download\module.py", line 173, in download_module_and_reqs
  File "subprocess.py", line 369, in check_call
```

The same block repeats for each subsequent file (only the timestamp differs).

Error executing pygpt.exe with a pip install command: the core issue is the command intended to install Python packages from a requirements.txt file. It fails because pygpt.exe is invoked with arguments it does not recognize or support (-m pip install -r ...), which produces a non-zero exit status.

Error handling and reporting: the application catches the CalledProcessError and logs a detailed message, including the failing command and the file it was processing when the error occurred. The traceback shows the error originates in llama-index's attempt to download and pip-install a loader's requirements as part of the document indexing process.
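The mechanics of that failure can be sketched with the stdlib alone: llama-index's loader downloader shells out to `sys.executable -m pip install -r requirements.txt`, and in a PyInstaller-style bundle `sys.executable` is the frozen `pygpt.exe`, which does not understand pip arguments. (A simplified illustration, not the library's actual code.)

```python
import sys

def pip_install_command(requirements_path):
    """Build the kind of command the loader downloader runs.

    In a normal environment sys.executable is the Python interpreter,
    so "-m pip ..." works. In a frozen bundle sys.executable is the
    app's own .exe (e.g. pygpt.exe), which rejects these arguments
    and exits non-zero, raising CalledProcessError upstream.
    """
    return [sys.executable, "-m", "pip", "install", "-r", requirements_path]
```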

szczyglis-dev commented 5 months ago

Are you using the compiled version?

In the compiled version (run from .exe), online-loaders downloaded by llama-index on the fly won't work properly - they only work with the version run directly using Python.

Btw, you don't need to use an online-loader for .txt files, there's a built-in offline version for .txt files.
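For reference, an offline .txt ingest needs nothing beyond the stdlib; this is a simplified stand-in for what a built-in text loader does, not py-gpt's actual implementation:

```python
from pathlib import Path

def load_txt_documents(root):
    """Read every .txt file under `root` into (path, text) pairs --
    the kind of plain document list an offline loader produces."""
    docs = []
    for path in sorted(Path(root).rglob("*.txt")):
        # errors="replace" keeps indexing alive on odd encodings
        docs.append((str(path), path.read_text(encoding="utf-8", errors="replace")))
    return docs
```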

szczyglis-dev commented 5 months ago

In release 2.0.146, usage of online-loaders from Llama-index has been disabled if a compiled version of the app is detected (internal architecture of these loaders requires that everything be run in a Python environment).
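The detection itself is straightforward; a guard along these lines (a sketch of the idea, not the project's actual code) is enough, since freezers like PyInstaller set `sys.frozen` on the bundle:

```python
import sys

def online_loaders_available():
    """Online LlamaHub loaders pip-install their requirements at
    runtime, which requires a real Python environment -- so treat
    them as unavailable in a frozen (compiled) build."""
    return not getattr(sys, "frozen", False)
```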

gfsysa commented 5 months ago

Thanks, that's helpful! I referred to the documentation:

--

Built-in file loaders (offline): text files, pdf, csv, md, docx, json, epub, xlsx. You can extend this list in Settings / Llama-index by providing a list of online loaders (from LlamaHub) -- but only in the Python version; they will not work in the compiled version. All loaders included for offline use are also from LlamaHub, but they are attached locally with all necessary library dependencies included. You can also develop your own custom offline loader and register it within the application.


Adding custom vector stores and offline data loaders

You can create a custom vector store provider or data loader for your data and develop a custom launcher for the application. To register them, pass the vector store provider instance via the vector_stores keyword argument and the loader instance via the loaders keyword argument:

--

I understood this to mean I run this in Windows CMD. Not sure if it was necessary, but I ran it in /llamahub_loaders:

pip install -r requirements.txt

I then added loaders in Config > Settings > Indexes --> "Additional online data loaders..." --> ADD --> LOADER

These loaders: XMLReader, BeautifulSoupWebReader, WikipediaReader

Also duplicated some capabilities in hopes of evaluating them, adding: SimpleWebPageReader, RemoteReader (but not sure how to handle defaults, preferences, conflicts... I suppose by adding them one at a time).
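On defaults and conflicts: one simple policy is first-registered-wins per file extension, which is why adding loaders one at a time keeps precedence explicit. A hypothetical sketch (`LoaderRegistry` and the policy are illustrative, not py-gpt's actual resolution logic):

```python
class LoaderRegistry:
    """Map file extensions to loader names; first registration wins."""

    def __init__(self):
        self._by_ext = {}

    def register(self, loader_name, extensions):
        for ext in extensions:
            # setdefault keeps the earlier registration on conflict
            self._by_ext.setdefault(ext, loader_name)

    def resolve(self, ext):
        """Return the loader registered for `ext`, or None."""
        return self._by_ext.get(ext)
```

Under this policy, registering SimpleWebPageReader after BeautifulSoupWebReader for the same extension leaves the earlier one in effect.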

A pop-up appears with the following:

```
Please install extra dependencies that are required for the PptxReader:

pip install torch transformers python-pptx Pillow
```

Also getting this:

```
Missing optional dependency 'openpyxl'. Use pip or conda to install openpyxl.
```

Tried this in the console:

```
pip install torch transformers python-pptx Pillow openpyxl
```

gfsysa commented 5 months ago

did this: pip install --upgrade --force-reinstall openpyxl

```
Installing collected packages: et-xmlfile, openpyxl
Attempting uninstall: et-xmlfile
  Found existing installation: et-xmlfile 1.1.0
  Uninstalling et-xmlfile-1.1.0:
    Successfully uninstalled et-xmlfile-1.1.0
Attempting uninstall: openpyxl
  Found existing installation: openpyxl 3.1.2
  Uninstalling openpyxl-3.1.2:
    Successfully uninstalled openpyxl-3.1.2
Successfully installed et-xmlfile-1.1.0 openpyxl-3.1.2
```
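A reinstall that "succeeds" while the error persists often means the package landed in a different interpreter than the one running the app. A quick stdlib check you can run from inside the app's own environment (illustrative; `has_package` is not a py-gpt function):

```python
import importlib.util
import sys

def has_package(name):
    """Return True if `name` is importable by *this* interpreter --
    the one that actually raises 'Missing optional dependency'."""
    return importlib.util.find_spec(name) is not None

# e.g. print(sys.executable, has_package("openpyxl"))
```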

gfsysa commented 5 months ago

I abandoned the compiled Windows version and am running in a new local Python env -- same issue when encountering xlsx files.

gfsysa commented 5 months ago

I may be through the local loader setup with a vector database. Still kicking the tires, but for the sake of documenting for others:

In CMD, activate your Python env, then:

```
pip install redis[hiredis]
pip install torch transformers python-pptx Pillow  # addresses the xlsx dependency issue, loads in the UI
```

Create a custom launcher .py and add it to the Scripts directory of the install, not the config directory (that's where the data resides by default). It includes a wait for Redis to initialize, which resolves an issue with the connection being actively refused:

import subprocess
import socket
import time

def check_redis_running(host='localhost', port=6379):
    """Check if Redis is running by attempting to connect to it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0

def start_redis():
    """Start the Redis server."""
    # Adjust the command according to your Redis installation path and preferences
    redis_server_command = 'redis-server'
    subprocess.Popen(redis_server_command, shell=True)
    print("Starting Redis server...")

# Start Redis if it isn't already listening
if not check_redis_running():
    start_redis()
    time.sleep(2)  # Wait a bit for Redis to initialize
else:
    print("Redis is already running.")

# Now, proceed with the rest of the PyGPT application setup
from pygpt_net.app import run
from pygpt_net.provider.vector_stores.redis import RedisVectorStore
from llama_hub import LLamaHubLoader  # placeholder -- swap in your actual custom loader class

vector_stores = [
    RedisVectorStore(),  # Redis vector store provider
]

loaders = [
    LLamaHubLoader(),  # custom data loader instance
]

run(
    vector_stores=vector_stores,  # list with Redis vector store provider
    loaders=loaders,              # list with custom data loaders
)

I'm sure that could be improved -- GPT-assisted, for sure.