rustformers / llm

[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models
https://docs.rs/llm/latest/llm/
Apache License 2.0

Better embedding extraction #295

Open LLukas22 opened 1 year ago

LLukas22 commented 1 year ago

As pointed out in https://github.com/rustformers/llm/pull/291, the quality of embeddings produced by the models at present appears to be suboptimal.

Our current approach uses the embedding of the final token as a representation for the entire input sequence, which can discard semantic information from earlier tokens. The approach employed by SGPT: GPT Sentence Embeddings for Semantic Search offers an alternative: it uses position-weighted mean pooling to combine the embeddings of all tokens in the input sequence. On the MTEB benchmark, this method produces markedly better embeddings.
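For reference, a minimal sketch of SGPT-style position-weighted mean pooling. This is not code from llm; the function name and the `Vec<f32>`-per-token representation are assumptions for illustration. Token i (1-indexed) receives weight i / (1 + 2 + … + n), so later tokens, which have attended to more context, contribute more:

```rust
/// Position-weighted mean pooling over per-token embeddings (SGPT-style).
/// `token_embeddings` holds one embedding vector per input token; all
/// vectors must share the same dimensionality.
fn weighted_mean_pool(token_embeddings: &[Vec<f32>]) -> Vec<f32> {
    let n = token_embeddings.len();
    assert!(n > 0, "need at least one token embedding");
    let dim = token_embeddings[0].len();
    // Normalising constant: the sum of positions 1..=n.
    let total = (n * (n + 1) / 2) as f32;
    let mut pooled = vec![0.0f32; dim];
    for (i, emb) in token_embeddings.iter().enumerate() {
        // 1-indexed position weight: later tokens weigh more.
        let weight = (i + 1) as f32 / total;
        for (p, v) in pooled.iter_mut().zip(emb) {
            *p += weight * v;
        }
    }
    pooled
}
```

With two 1-dimensional embeddings `[1.0]` and `[4.0]`, the weights are 1/3 and 2/3, giving a pooled value of 3.0.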

This raises the question: should we integrate this method into our implementation? Or should we leave it to users to extract the embeddings for each token and perform the pooling themselves?

philpax commented 1 year ago

Good catch! I think we should integrate this, but separate it from the existing embeddings. I'm also not sure how we best expose this. Any ideas for API changes that are understandable and restricted to only where it makes sense? This would only make sense with feed_prompt, right?
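One way the API question could be framed (purely hypothetical, not an existing llm API): expose the pooling choice as an enum so the current last-token behaviour stays the default and the weighted mean is opt-in. All names here are assumptions for discussion:

```rust
/// Hypothetical strategy selector for sentence-embedding extraction.
enum Pooling {
    /// Current behaviour: use only the final token's embedding.
    LastToken,
    /// SGPT-style position-weighted mean over all token embeddings.
    WeightedMean,
}

/// Pool per-token embeddings into a single sentence embedding.
fn pool(token_embeddings: &[Vec<f32>], strategy: Pooling) -> Vec<f32> {
    let n = token_embeddings.len();
    assert!(n > 0, "need at least one token embedding");
    match strategy {
        Pooling::LastToken => token_embeddings[n - 1].clone(),
        Pooling::WeightedMean => {
            let total = (n * (n + 1) / 2) as f32;
            let mut out = vec![0.0f32; token_embeddings[0].len()];
            for (i, emb) in token_embeddings.iter().enumerate() {
                let w = (i + 1) as f32 / total;
                for (o, v) in out.iter_mut().zip(emb) {
                    *o += w * v;
                }
            }
            out
        }
    }
}
```

A parameter like this could plausibly live on whatever call gathers per-token activations (e.g. the `feed_prompt` path mentioned above), keeping the strategy out of code paths where pooling makes no sense.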