Closed 4entertainment closed 7 months ago
Hello @4entertainment, great to see you again! I'm diving into your issue right now. I appreciate your patience and will get back to you with a comprehensive solution shortly.
To use Turkish embeddings in your code, you need to replace the model in the SentenceTransformerEmbeddings function with a model that supports the Turkish language. However, I couldn't find a specific list of available models that support Turkish in the LangChain repository.
Here's how you can modify your code:
# create the open-source embedding function
embedding_function = SentenceTransformerEmbeddings(model_name="your-turkish-model")
Replace "your-turkish-model" with the name of a model that supports Turkish.
Please note that the model you choose should be compatible with the SentenceTransformerEmbeddings function in LangChain. If you're unsure about which model to use, you might want to check the Sentence Transformers library's model list for a model that supports Turkish.
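As a hedged illustration of the swap described above, the sketch below wires a multilingual Sentence Transformers checkpoint into SentenceTransformerEmbeddings. The model name is an assumption, not something named in this thread: it is a multilingual checkpoint whose model card lists Turkish among its languages, and any other Turkish-capable model could be substituted. The helper name build_embedding_function is hypothetical.

```python
# Assumption: a multilingual checkpoint whose model card lists Turkish ("tr").
# It is not mentioned in this thread; swap in any Turkish-capable model you prefer.
TURKISH_CAPABLE_MODEL = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"


def build_embedding_function(model_name: str = TURKISH_CAPABLE_MODEL):
    """Return a LangChain embedding function backed by `model_name` (hypothetical helper)."""
    # Imported lazily so this module can be loaded even where LangChain is not installed.
    from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings

    return SentenceTransformerEmbeddings(model_name=model_name)
```

The returned object can be passed to Chroma.from_documents(docs, embedding_function) exactly as in the original snippet. The first call downloads the model weights, so it needs network access.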
If you're still having trouble finding a suitable model or implementing this, could you please provide more information about the specific issues you're encountering? This will help me provide a more accurate solution.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
@dosu-bot
I fixed the code as follows:
# import
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.document_loaders import TextLoader
from silly import no_ssl_verification
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

with no_ssl_verification():
    # load the document
    loader = TextLoader("paul_graham/paul_graham_essay.txt")
    documents = loader.load()

    # split it into chunks
    text_splitter = CharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
    docs = text_splitter.split_documents(documents)

    # create the open-source embedding function
    embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
    # hfemb = HuggingFaceEmbeddings()

    # load it into Chroma
    db = Chroma.from_documents(docs, embedding_function)

    # query it
    query = "What were the two main things the author worked on before college?"
    docs = db.similarity_search(query)

    # print results
    print(docs[0].page_content)
I get the following output:
"I was nervous about money, because I could sense that Interleaf was on the way down. Freelance Lisp hacking work was very rare, and I didn't want to have to program in another language, which in those days would have meant C++ if I was lucky. So with my unerring nose for financial opportunity, I decided to write another book on Lisp. This would be a popular book, the sort of book that could be used as a textbook. I imagined myself living frugally off the royalties and spending all my time painting. (The painting on the cover of this book, ANSI Common Lisp, is one that I painted around this time.)
The best thing about New York for me was the presence of Idelle and Julian Weber. Idelle Weber was a painter, one of the early photorealists, and I'd taken her painting class at Harvard. I've never known a teacher more beloved by her students. Large numbers of former students kept in touch with her, including me. After I moved to New York I became her de facto studio assistant.
She liked to paint on big, square canvases, 4 to 5 feet on a side. One day in late 1994 as I was stretching one of these monsters there was something on the radio about a famous fund manager. He wasn't that much older than me, and was super rich. The thought suddenly occurred to me: why don't I become rich? Then I'll be able to work on whatever I want.
Meanwhile I'd been hearing more and more about this new thing called the World Wide Web. Robert Morris showed it to me when I visited him in Cambridge, where he was now in grad school at Harvard. It seemed to me that the web would be a big deal. I'd seen what graphical user interfaces had done for the popularity of microcomputers. It seemed like the web would do the same for the internet."
But I should get "Before college the two main things I worked on, outside of school, were writing and programming."
Thank you for providing the code snippet. It's helpful in understanding the issue you're facing. However, to further diagnose the problem, I need a bit more information.
[...] CharacterTextSplitter?
These details will help me understand the situation better and provide a more accurate solution.
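One way to gather more detail about the retrieval in the snippet above is to look at several ranked chunks instead of only docs[0]. The sketch below is a minimal diagnostic, assuming db is the Chroma store built earlier. It relies on similarity_search_with_score, which LangChain's Chroma wrapper provides; the helper name top_chunks and the 80-character preview length are hypothetical choices. With Chroma's default settings the score should be a distance, so lower would mean closer, but that is worth confirming for your configuration.

```python
def top_chunks(db, query, k=4):
    """Return (rank, score, preview) tuples for the k best-matching chunks.

    `db` is assumed to expose similarity_search_with_score(query, k=...),
    returning (document, score) pairs, as LangChain's Chroma wrapper does.
    """
    results = db.similarity_search_with_score(query, k=k)
    rows = []
    for rank, (doc, score) in enumerate(results, start=1):
        # Collapse whitespace and truncate so each chunk fits on one line.
        preview = " ".join(doc.page_content.split())[:80]
        rows.append((rank, score, preview))
    return rows
```

Calling top_chunks(db, query) and printing each row shows whether the chunk containing the expected sentence is ranked just below the first hit or is missing from the top results altogether, which helps separate a chunking problem from an embedding problem.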
Did you find a solution for this?
Hi, @4entertainment,
I'm helping the LangChain team manage their backlog and am marking this issue as stale. The issue was opened by you, and it seems like you were looking to incorporate Turkish embeddings into your existing code for a Q&A system with a retrieval mechanism. I see that there was some guidance provided by
Could you please let me know if this issue is still relevant to the latest version of the LangChain repository? If it is, please comment on the issue to let the LangChain team know. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!
Issue you'd like to raise.
I have the following code for a Q&A system with a retrieval mechanism:
I need to run this retrieval on a Turkish dataset, so I need to use Turkish embeddings. How can I do that in my code?
Suggestion:
No response