Closed hodanli closed 6 months ago
Hi, just want to know the details: does it split the text into multiple lines within a single block, or does it divide it into several blocks, with each line in its own block?
It keeps it in one block with multiple lines. I expect it to be in one block without any new lines.
Text embeddings are useful features in many applications such as semantic search and com- puting text similarity. Previous work typically trains models customized for different use cases, varying in dataset choice, training objective and model architecture. In this work, we show that contrastive pre-training on unsupervised data at scale leads to high quality vector representations of text and code. The same unsupervised text em- beddings that achieve new state-of-the-art results in linear-probe classification also display impres- sive semantic search capabilities and sometimes even perform competitively with fine-tuned mod- els. On linear-probe classification accuracy aver- aging over 7 tasks, our best unsupervised model achieves a relative improvement of 4% and 1.8% over previous best unsupervised and supervised text embedding models respectively. The same text embeddings when evaluated on large-scale semantic search attains a relative improvement of 23.4%, 14.7%, and 10.6% over previous best unsupervised methods on MSMARCO, Natural Questions and TriviaQA benchmarks, respec- tively. Similarly to text embeddings, we train code embedding models on (text, code) pairs, ob- taining a 20.8% relative improvement over prior best work on code search
This is what I get.
Text embeddings are useful features in many applications such as semantic search and computing text similarity. Previous work typically trains models customized for different use cases, varying in dataset choice, training objective and model architecture. In this work, we show that contrastive pre-training on unsupervised data at scale leads to high quality vector representations of text and code. The same unsupervised text embeddings that achieve new state-of-the-art results in linear-probe classification also display impressive semantic search capabilities and sometimes even perform competitively with fine-tuned models. On linear-probe classification accuracy averaging over 7 tasks, our best unsupervised model achieves a relative improvement of 4% and 1.8% over previous best unsupervised and supervised text embedding models respectively. The same text embeddings when evaluated on large-scale semantic search attains a relative improvement of 23.4%, 14.7%, and 10.6% over previous best unsupervised methods on MSMARCO, Natural Questions and TriviaQA benchmarks, respectively. Similarly to text embeddings, we train code embedding models on (text, code) pairs, obtaining a 20.8% relative improvement over prior best work on code search.
This is what I expected.
You can try replacing the line breaks in VS Code (or any other editor) first.
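As a stopgap, the cleanup described above (collapsing the line breaks and rejoining words hyphenated across them, like "com- puting") can be sketched in a few lines of Python. This is a hypothetical helper, not part of the plugin; the function name and regexes are my own:

```python
import re

def remove_line_breaks(text: str) -> str:
    """Collapse hard line breaks from a PDF paste into one paragraph."""
    # Join words hyphenated across a line break: "com-\nputing" -> "computing"
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Replace any remaining line breaks (and surrounding spaces) with one space
    text = re.sub(r"\s*\n\s*", " ", text)
    return text.strip()

print(remove_line_breaks("semantic search and com-\nputing text\nsimilarity."))
# semantic search and computing text similarity.
```

The same two substitutions can be run as regex find-and-replace passes in VS Code (with regex mode enabled) instead of a script.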
Hi,
Thanks for the plugin, it is very helpful. Can you implement a feature like the one in the link below? It is really annoying when pasting from PDFs that all the new lines are preserved.
https://www.textfixer.com/tools/remove-line-breaks.php
Thanks in advance.