Hello, author, many thanks for your explanation. You said "We’ll start using all-MiniLM-L6-v2. It’s not the best open-source embedding model". Which model is the best, and where can I find a ranking of the best models? I'm new to this, so sorry if the question is basic. Thank you very much!
Hi @songxujay, the author covers this in the "Selecting and evaluating models" part. Have a look at it. One of the main sources is still the MTEB Leaderboard: https://huggingface.co/spaces/mteb/leaderboard
Hi! Thank you for the great article. To better understand the differences between word2vec- and Transformer-based embeddings, could you elaborate on how the masked language modelling objective of BERT differs from the CBOW objective in word2vec (which, as I understand it, is also about "filling in a blank")? Is it that the objectives are similar but the neural net architectures differ in these two approaches, allowing BERT to add contextual info?
Hey @arnoldlayne0! Overall you're right, BERT and CBOW objectives have some similarities. Here are some differences
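As a rough illustration of the question above (not the list of differences from the reply, and not code from the article), here is a minimal Python sketch of how training examples might be built under each objective. The function names `cbow_examples` and `mlm_example` are hypothetical, and the masking is simplified: BERT's actual recipe also sometimes swaps in a random token or keeps the original one. The point is only that CBOW sees an unordered, fixed-size window, while masked language modelling keeps the whole sequence in order.

```python
import random

MASK = "[MASK]"


def cbow_examples(tokens, window=2):
    """Build (context, target) pairs in the style of word2vec CBOW.

    Each target word is predicted from the words in a fixed window around it.
    CBOW averages the context vectors, so word order inside the window is
    lost; sorting the context here makes that explicit.
    """
    examples = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        examples.append((sorted(left + right), target))
    return examples


def mlm_example(tokens, mask_prob=0.15, seed=0):
    """Build one masked-language-modelling example (simplified).

    The model input is the full sequence, in order, with some tokens replaced
    by [MASK]; the labels map each masked position to its original token.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    labels = {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked[i] = MASK
            labels[i] = tok
    if not labels:
        # Assumption for the demo: always mask at least one position.
        i = rng.randrange(len(tokens))
        masked[i] = MASK
        labels[i] = tokens[i]
    return masked, labels


if __name__ == "__main__":
    sentence = "the bank of the river".split()
    for context, target in cbow_examples(sentence, window=1):
        print("CBOW:", context, "->", target)
    masked, labels = mlm_example(sentence)
    print("MLM input: ", masked)
    print("MLM labels:", labels)
```

In the CBOW pairs, "bank" is predicted from the same bag of neighbours regardless of their order, and the learned vector for "bank" is a single static entry. In the MLM example, the model receives every remaining token with its position, which is what lets a Transformer produce a different representation for the same word in different sentences.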
I think something has changed about the quora dataset used in the colab example. I'm getting this error:
from datasets import load_dataset
dataset = load_dataset("quora")["train"]
TypeError: http_get() got an unexpected keyword argument 'displayed_filename'
Just what I needed entering the world of LLMs, thank you a lot!
hackerllama - Sentence Embeddings
Everything you wanted to know about sentence embeddings (and maybe a bit more)
https://osanseviero.github.io/hackerllama/blog/posts/sentence_embeddings/