run-llama / llama-hub

A library of data loaders for LLMs made by the community -- to be used with LlamaIndex and/or LangChain
https://llamahub.ai/
MIT License

[Question]: #762

Open stephanedebove opened 10 months ago

stephanedebove commented 10 months ago

Question

Does fuzzy_citation need a specific tokenizer, text splitter, or LLM in order to work properly? I'm using

from llama_index import ServiceContext
from llama_index.embeddings import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en", max_length=512)

service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    chunk_size=512,
    chunk_overlap=20,
)

and zephyr-7b-beta as the LLM, but the extracted parts of the source node used for the response are always a bit off (and for some prompts I get an IndexError: list index out of range).
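For context, the general technique behind fuzzy citation matching can be sketched with difflib from the Python standard library. This is only an illustration of the idea (matching response sentences against the source text and keeping the best-aligned spans), not the pack's actual implementation; the function name and the threshold value here are made up:

```python
# Hypothetical sketch of fuzzy citation matching, NOT the real
# FuzzyCitationEnginePack code: it aligns each response sentence
# against the source text and records the best-matching span.
from difflib import SequenceMatcher


def fuzzy_citation_spans(response_text: str, source_text: str, threshold: float = 0.5):
    """Return (start, end) spans of source_text that best match response sentences."""
    spans = []
    for sentence in response_text.split(". "):
        matcher = SequenceMatcher(None, sentence, source_text, autojunk=False)
        # Longest contiguous block shared by the sentence and the source.
        match = matcher.find_longest_match(0, len(sentence), 0, len(source_text))
        # Keep the span only if it covers enough of the sentence.
        if match.size / max(len(sentence), 1) >= threshold:
            spans.append((match.b, match.b + match.size))
    return spans
```

Note that with an approach like this, a sentence that clears no threshold produces an empty span list, which is one plausible way a downstream lookup could raise an IndexError: list index out of range.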

anoopshrma commented 10 months ago

I think @logan-markewich can answer this better 🦇🔦