Feel free to open a PR 👍🏻
Hi @logan-markewich, thanks for the invitation. I have the fix and would like to share it. Can you share the procedure for raising a PR?
[Bug]: BB loader in LlamaHub has 3 critical issues that make it unusable in its current form. #15158 Please review and provide feedback if any.
Bug Description
The BB (Bitbucket) loader in LlamaHub has critical bugs; I feel it cannot be used in its current form.

Issue-1: The content_url has an additional "/" in the path; this typo alone makes the loader unusable. The resulting error is captured in the traceback section below.
Issue-2: Files without extensions are not handled. The loader exits when it parses a file such as Dockerfile that has no extension.
Issue-3: Files with no content also cause the loader to exit.
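For illustration, here is a minimal sketch of the kind of handling the three fixes call for. It is not the BitbucketReader's actual internals: the function name fetch_file_document, the use of requests, and the Bitbucket Server-style raw-content URL layout are assumptions made for the example.

```python
import os

import requests


def fetch_file_document(base_url, project_key, repository, branch, file_path,
                        extensions_to_skip=None):
    """Hypothetical helper showing how the three reported issues could be handled."""
    extensions_to_skip = extensions_to_skip or []

    # Issue-1: strip slashes on both sides of the join so no duplicate "/"
    # ends up in the raw-content URL.
    content_url = (
        f"{base_url.rstrip('/')}/rest/api/latest/projects/{project_key}"
        f"/repos/{repository}/raw/{file_path.lstrip('/')}?at={branch}"
    )

    # Issue-2: os.path.splitext returns "" for files like "Dockerfile", so only
    # consult the skip-list when an extension actually exists.
    extension = os.path.splitext(file_path)[1].lstrip(".")
    if extension and extension in extensions_to_skip:
        return None

    response = requests.get(
        content_url,
        auth=(os.environ["BITBUCKET_USERNAME"], os.environ["BITBUCKET_API_KEY"]),
    )
    response.raise_for_status()

    # Issue-3: treat empty files as "nothing to index" instead of raising,
    # so a single empty file cannot abort the whole load.
    if not response.text.strip():
        return None

    return {"file_path": file_path, "text": response.text}
```

With this kind of handling, the caller can simply skip None results instead of the whole load aborting.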
Version
llama-index==0.10.59
Steps to Reproduce
import os
import logging
import sys

from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import ServiceContext, set_global_service_context
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings
from llama_index.core import download_loader
from llama_index.readers.bitbucket import BitbucketReader

os.environ["OPENAI_API_KEY"] = "fill in here"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://ai-foundation-api.app/ai-foundation/chat-ai/gpt4"

api_key = ""
azure_endpoint = "https://ai-foundation-api.app/ai-foundation/chat-ai/gpt4"
api_version = "2023-05-15"

os.environ["BITBUCKET_USERNAME"] = "fill in here"
os.environ["BITBUCKET_API_KEY"] = "fill in here"

base_url = "fill in here"
project_key = "fill in here"
repo = "fill in here"

llm = AzureOpenAI(
    model="gpt-4",
    deployment_name="my-custom-llm",
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
)
service_context = ServiceContext.from_defaults(llm=llm, chunk_size=800, chunk_overlap=20)

embed_model_bge = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
text_embeddings = embed_model_bge.get_text_embedding("AI is awesome!")

Settings.llm = llm
Settings.embed_model = embed_model_bge

loader = BitbucketReader(
    base_url=base_url,
    project_key=project_key,
    branch="refs/heads/master",
    repository=repo,
    extensions_to_skip=["json"],
)
documents = loader.load_data()  # fails here with the errors described above
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
Relevant Logs/Tracebacks