usoonees / logseq-plugin-paste-more

MIT License
61 stars 2 forks

remove new lines #12

Closed: hodanli closed this issue 6 months ago

hodanli commented 6 months ago

Hi,

Thanks for the plugin, it is very helpful. Can you implement a feature like the one in the link below? It is really annoying when pasting from PDFs that all the new lines are preserved.

https://www.textfixer.com/tools/remove-line-breaks.php

Thanks in advance.

usoonees commented 6 months ago

Hi, just want to know the details: does it split the text into multiple lines within a single block, or does it divide it into several blocks, with each line in its own block?

hodanli commented 6 months ago

It keeps it in one block with multiple lines. I expect it to be in one block without any new lines.

Text embeddings are useful features in many applications such as semantic search and com-
puting text similarity. Previous work typically trains models customized for different use cases, varying in dataset choice, training objective and model architecture. In this work, we show that contrastive pre-training on unsupervised data at scale leads to high quality vector representations of text and code. The same unsupervised text em-
beddings that achieve new state-of-the-art results in linear-probe classification also display impres-
sive semantic search capabilities and sometimes even perform competitively with fine-tuned mod-
els. On linear-probe classification accuracy aver-
aging over 7 tasks, our best unsupervised model achieves a relative improvement of 4% and 1.8% over previous best unsupervised and supervised text embedding models respectively. The same text embeddings when evaluated on large-scale semantic search attains a relative improvement of 23.4%, 14.7%, and 10.6% over previous best unsupervised methods on MSMARCO, Natural Questions and TriviaQA benchmarks, respec-
tively. Similarly to text embeddings, we train code embedding models on (text, code) pairs, ob-
taining a 20.8% relative improvement over prior best work on code search

This is what I get.

Text embeddings are useful features in many applications such as semantic search and com- puting text similarity. Previous work typically trains models customized for different use cases, varying in dataset choice, training objective and model architecture. In this work, we show that contrastive pre-training on unsupervised data at scale leads to high quality vector representations of text and code. The same unsupervised text em- beddings that achieve new state-of-the-art results in linear-probe classification also display impres- sive semantic search capabilities and sometimes even perform competitively with fine-tuned mod- els. On linear-probe classification accuracy aver- aging over 7 tasks, our best unsupervised model achieves a relative improvement of 4% and 1.8% over previous best unsupervised and supervised text embedding models respectively. The same text embeddings when evaluated on large-scale semantic search attains a relative improvement of 23.4%, 14.7%, and 10.6% over previous best unsupervised methods on MSMARCO, Natural Questions and TriviaQA benchmarks, respec- tively. Similarly to text embeddings, we train code embedding models on (text, code) pairs, ob- taining a 20.8% relative improvement over prior best work on code search

This is what I expected.
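For what it's worth, here is a minimal sketch of the requested transformation in TypeScript. It is not the plugin's actual code, just an illustration: collapse every line break in the pasted text into a single space, and (as an extra assumption beyond what textfixer does) re-join words that the PDF hyphenated across line breaks. The `removeLineBreaks` name and the de-hyphenation step are made up for this example.

```typescript
// Illustrative sketch only, not the plugin's implementation.
function removeLineBreaks(text: string): string {
  return text
    // re-join words hyphenated across a line break: "com-\nputing" -> "computing"
    .replace(/(\w)-\s*\n\s*(\w)/g, "$1$2")
    // collapse any remaining line breaks (and surrounding spaces) into one space
    .replace(/\s*\n+\s*/g, " ")
    .trim();
}

// Example:
const pasted = "semantic search and com-\nputing text similarity.\nPrevious work typically trains models";
console.log(removeLineBreaks(pasted));
// -> "semantic search and computing text similarity. Previous work typically trains models"
```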

usoonees commented 6 months ago

You can try replacing the line breaks in VS Code (or any other editor) first.

(screenshot attached)
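As a stopgap, the same result can be had with a regex find-and-replace (in VS Code, enable the `.*` regex toggle): something like replacing `-\n` with nothing to re-join hyphenated words, then replacing `\n` with a single space.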
hodanli commented 6 months ago

.