marella / ctransformers

Python bindings for the Transformer models implemented in C/C++ using GGML library.
MIT License

What embeddings do I need to use for llama-2-70b/ggml-model.bin, and how do I load them using CTransformers? #95

Closed Revanth-guduru-balaji closed 1 year ago

Revanth-guduru-balaji commented 1 year ago

Kindly help me with which embeddings to use for llama-2-70b/ggml-model.bin, and how to load them using CTransformers.

```python
embeddings = ?  # which embedding model should go here?
llm = CTransformers(model="./models/llama-2-7b.ggmlv3.q4_0.bin", model_type="llama")
doc_search = FAISS.from_documents(texts, embeddings)
retriever = doc_search.as_retriever(
    search_type="similarity",
    search_kwargs={"k": len(texts), "max_source": len(texts)},
)
chain = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True
)
```

aliakyurek commented 1 year ago

As far as I know, you can use any embedding for this purpose. Did you try HuggingFaceEmbeddings from langchain?

Revanth-guduru-balaji commented 1 year ago

> As far as I know, you can use any embedding for this purpose. Did you try HuggingFaceEmbeddings from langchain?

I have used `embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')`, but I am not sure whether it performs well with llama-2-70B.

aliakyurek commented 1 year ago

In this workflow, the embedding model is only used to retrieve similar chunks from the vector store; those chunks are then sent to the LLM as plain text. So if you think the performance is poor, you can try another embedding model (see the MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard). There is also a HuggingFaceInstructEmbeddings class that can be tried out.
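The point that the embedding model is decoupled from the LLM can be sketched without any particular library: embed the query, rank stored chunks by cosine similarity, and hand the top chunk's *text* (never its vector) to the LLM. A minimal, self-contained illustration, where the `embed()` function is a hypothetical stand-in for a real embedding model such as all-MiniLM-L6-v2:

```python
import math

def embed(text):
    # Hypothetical bag-of-letters embedding; a real setup would call an
    # embedding model (e.g. all-MiniLM-L6-v2) here. The rest of the
    # pipeline does not care which model produced the vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=1):
    # Rank chunks by similarity to the query in embedding space.
    qv = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)
    return ranked[:k]

chunks = ["llamas are camelids", "GGML is a tensor library"]
top = retrieve("what is GGML?", chunks, k=1)
# The retrieved chunk text (not its vector) is what a "stuff" chain
# pastes into the LLM prompt, so any embedding model can be swapped in.
```

Swapping `embed()` for a stronger model changes only retrieval quality; the LLM side of the chain is untouched.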

marella commented 1 year ago

As mentioned by aliakyurek, embedding models are independent of LLMs. all-MiniLM-L6-v2 is small and fast but not very accurate. I have created a tool called chatdocs which uses hkunlp/instructor-large embeddings, so try using the HuggingFaceInstructEmbeddings class.