run-llama / llama-hub

A library of data loaders for LLMs made by the community -- to be used with LlamaIndex and/or LangChain
https://llamahub.ai/
MIT License

[Bug]: Unicode error in llamapacks dense_pack #843

Open. deanbrr opened this issue 8 months ago.

deanbrr commented 8 months ago

Bug Description

Invoked:

from llama_index.llama_pack import download_llama_pack
DenseXRetrievalPack = download_llama_pack("DenseXRetrievalPack", "./dense_pack")

Produced error:

Caught a SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xaf in position 662: invalid start byte (dense_pack/base.py, line 65)
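To see exactly which bytes in the downloaded file are not valid UTF-8, a short standard-library script can scan it line by line. This is a minimal sketch, not part of llama-index; the helper names scan_utf8 and find_invalid_utf8 are hypothetical. The reported offset is the byte offset within the line, which is not necessarily the same number as the "position" in Python's error message.

```python
from pathlib import Path


def scan_utf8(data: bytes):
    """Return (line_number, byte_offset, byte_value) for each byte that breaks UTF-8 decoding.

    line_number is 1-based; byte_offset is the 0-based offset within that line.
    """
    problems = []
    for line_no, raw_line in enumerate(data.splitlines(keepends=True), start=1):
        pos = 0
        while pos < len(raw_line):
            try:
                raw_line[pos:].decode("utf-8")
                break  # the rest of this line decodes cleanly
            except UnicodeDecodeError as err:
                bad = pos + err.start  # err.start is relative to the slice we decoded
                problems.append((line_no, bad, raw_line[bad]))
                pos = bad + 1  # skip the offending byte and keep scanning
    return problems


def find_invalid_utf8(path):
    """Scan a file on disk, e.g. dense_pack/base.py."""
    return scan_utf8(Path(path).read_bytes())
```

For example, find_invalid_utf8("dense_pack/base.py") lists every offending byte with its line number, so a stray 0xaf shows up as a tuple ending in 175 (0xaf).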

Version

llama-index-0.9.26

Steps to Reproduce

from llama_hub.file.unstructured import UnstructuredReader

documents = UnstructuredReader().load_data("")

try:
    from llama_index.llama_pack import download_llama_pack

    DenseXRetrievalPack = download_llama_pack("DenseXRetrievalPack", "./dense_pack")
except SyntaxError as e:
    print(f"Caught a SyntaxError: {e}")
except Exception as e:
    print(f"An error occurred: {e}")

Relevant Logs/Tracebacks

No response

designcomputer commented 8 months ago

There is an improperly encoded character in "dense_pack/base.py" on line 65: the byte 0xaf is not valid UTF-8. You can open the file in VS Code and re-save it with UTF-8 encoding, but the file will be downloaded again, overwriting your fix, when you rerun this:

from llama_index.llama_pack import download_llama_pack

DenseXRetrievalPack = download_llama_pack("DenseXRetrievalPack", "./dense_pack")

Instead, import the pack class directly from the existing, now-fixed local copy:

from dense_pack.base import DenseXRetrievalPack
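Both steps above (re-save base.py as UTF-8, then load the local copy instead of downloading it again) can also be scripted. This is a minimal standard-library sketch; the helper name load_local_pack is hypothetical, and the cp1252 fallback is only an assumption about which encoding the stray byte came from. Decoding with the wrong fallback silently changes characters, so it is worth checking the rewritten line afterwards. It also assumes base.py uses only absolute imports, because the file is loaded on its own rather than as part of a package.

```python
import importlib.util
from pathlib import Path


def load_local_pack(pack_dir, class_name, fallback_encoding="cp1252"):
    """Re-save <pack_dir>/base.py as UTF-8 if needed, then return class_name from it.

    fallback_encoding is an assumption about where the invalid bytes came from.
    """
    base = Path(pack_dir) / "base.py"
    raw = base.read_bytes()
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: reinterpret with the fallback codec and rewrite as UTF-8.
        text = raw.decode(fallback_encoding)
        base.write_bytes(text.encode("utf-8"))

    # Load base.py straight from disk; download_llama_pack is never called, so the fix is kept.
    module_name = f"{Path(pack_dir).name}_base"
    spec = importlib.util.spec_from_file_location(module_name, str(base))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, class_name)
```

For this issue the call would be DenseXRetrievalPack = load_local_pack("./dense_pack", "DenseXRetrievalPack"). When running from the directory that contains dense_pack, the plain import shown above is simpler; the helper only adds the encoding repair and independence from sys.path.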