run-llama / llama-hub

A library of data loaders for LLMs made by the community -- to be used with LlamaIndex and/or LangChain
https://llamahub.ai/
MIT License

[Bug]: incompatible imports between latest llama_hub and latest llama_index downloaded via pip + pypi #955

Open aaronjolson opened 4 months ago

aaronjolson commented 4 months ago

Bug Description

I am trying to work through a script for indexing a GitHub repo. I determined that the script was written for an earlier version of llama_index (pre-0.10), so I thought I would try to bring it up to date with the latest versions of llama_index and llama_hub. I installed both llama-index and llama-hub from PyPI via pip.

This is the pip freeze of my llama deps

llama-hub==0.0.79.post1
llama-index==0.10.6
llama-index-agent-openai==0.1.1
llama-index-core==0.10.6.post1
llama-index-embeddings-openai==0.1.1
llama-index-legacy==0.9.48
llama-index-llms-openai==0.1.2
llama-index-multi-modal-llms-openai==0.1.1
llama-index-program-openai==0.1.2
llama-index-question-gen-openai==0.1.1
llama-index-readers-file==0.1.3
llamaindex-py-client==0.1.12

These are the imports at the top of my script (and the place where my script is erroring)

import os
import textwrap
from dotenv import load_dotenv
from llama_index.legacy import download_loader
from llama_hub.github_repo import GithubRepositoryReader, GithubClient
from llama_index.core import VectorStoreIndex
from llama_index.legacy.vector_stores import DeepLakeVectorStore
from llama_index.core.storage.storage_context import StorageContext
import re

This is the error I am getting

Traceback (most recent call last):
  File "C:\Users\aaols\PycharmProjects\experiments\llamaindex_activeloop_vectorize_data_from_github.py", line 12, in <module>
    from llama_hub.github_repo import GithubRepositoryReader, GithubClient
  File "C:\Users\aaols\PycharmProjects\experiments\venv\lib\site-packages\llama_hub\github_repo\__init__.py", line 2, in <module>
    from llama_hub.github_repo.base import (
  File "C:\Users\aaols\PycharmProjects\experiments\venv\lib\site-packages\llama_hub\github_repo\base.py", line 18, in <module>
    from llama_index.readers.base import BaseReader
ModuleNotFoundError: No module named 'llama_index.readers.base'

It looks like the latest version of llama_hub on PyPI is not yet aware of the changes in llama_index. llama_hub relies on a specific version (or range of versions) of llama_index, and that should be declared in its dependencies. https://github.com/run-llama/llama-hub/blob/main/pyproject.toml#L19 should likely be changed to llama-index = ">=0.9.41, <0.10.0", since llama-index is a dependency of llama-hub and this is a known point of incompatibility.
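
As a rough illustration of what that constraint would enforce, the range check can be sketched in plain Python. This is a simplified comparison on the leading numeric version components only, not a full PEP 440 implementation, and the helper names (parse_release, in_range, check_installed) are hypothetical, not part of llama_hub or llama_index.

```python
from importlib import metadata


def parse_release(version: str) -> tuple:
    """Return the leading numeric components of a version string.

    Simplification: suffixes such as '.post1' are ignored, so this is
    not a complete PEP 440 parser.
    """
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def in_range(version: str, lower: str = "0.9.41", upper: str = "0.10.0") -> bool:
    """Check lower <= version < upper, mirroring '>=0.9.41, <0.10.0'."""
    return parse_release(lower) <= parse_release(version) < parse_release(upper)


def check_installed(package: str = "llama-index"):
    """Return True/False for the installed version, or None if not installed."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return in_range(installed)
```

With the versions from the pip freeze above, in_range("0.10.6") is False and in_range("0.9.48") is True, which matches the observed breakage.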

Version

0.10.6

Steps to Reproduce

Install the same versions of llama-index and llama-hub as noted above.

Relevant Logs/Tracebacks

No response

styk-tv commented 4 months ago

Point of no return confirmed: with the new 0.10.9, hub will no longer work after this bump. https://github.com/run-llama/llama_index/issues/10914

Please update the hub documentation after the 0.10 update to reference llama_index (probably better than leaving it broken): https://llamahub.ai/l/readers/llama-index-readers-youtube-transcript?from=

#from llama_index.youtube_transcript import YoutubeTranscriptReader
from llama_index.readers.youtube_transcript import YoutubeTranscriptReader

loader = YoutubeTranscriptReader()
documents = loader.load_data(
    ytlinks=["https://www.youtube.com/watch?v=i3OYlaoj-BM"]
)

@aaronjolson try dropping the hub references completely in favour of index, as in the example above. I'm assuming the GitHub readers are in a similar situation.
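
For scripts that have to run against both layouts during a migration, one option is to try the new import path first and fall back to the old one. The sketch below is only an illustration: import_first is a hypothetical helper, and the module paths in the usage note are assumptions about where a given reader lives in each version.

```python
import importlib


def import_first(name: str, module_paths: list):
    """Return attribute `name` from the first module in `module_paths` that
    imports cleanly and defines it.

    Hypothetical helper for illustration; not part of llama_index or llama_hub.
    """
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            # Covers ModuleNotFoundError too, e.g. a missing
            # 'llama_index.readers.base' on llama_index >= 0.10.
            continue
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"Could not import {name!r} from any of {module_paths}")
```

For the GitHub reader this might be called as import_first("GithubRepositoryReader", ["llama_index.readers.github", "llama_hub.github_repo"]), assuming the separate llama-index-readers-github package is installed for the first path.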

JimmyBcn commented 4 months ago

The Confluence reader is in the same situation. When installed, it references llama_index.readers.base, even though the source project looks OK.

I'm not that experienced with Python packages. Is there a way to fix this?

knyga commented 4 months ago

https://github.com/run-llama/llama-hub

With the launch of LlamaIndex v0.10, we are deprecating this llama_hub repo - all integrations (data loaders, tools) and packs are now in the core llama-index Python repository. LlamaHub will continue to exist. We are revamping llamahub.ai to point to all integrations/packs/datasets available in the llama-index repo.