texstudio-org / texstudio

TeXstudio is a fully featured LaTeX editor. Our goal is to make writing LaTeX documents as easy and comfortable as possible.
http://www.texstudio.org/
GNU General Public License v3.0

Copy TeX/LaTeX text without TeX markup, e.g. TeX \& friends -> TeX & friends #32

Open. antmw1361 opened this issue 6 years ago

antmw1361 commented 6 years ago

This would be helpful when we copy text from TeXstudio and paste it into plain-text boxes. For example, when submitting a paper to a journal, we often need to copy and paste the abstract into a plain-text box on the ManuscriptCentral or Evise websites. I often have to de-TeX the text by hand to remove the LaTeX markup. It would be great if TeXstudio could help.
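As a rough illustration of the requested conversion, here is a minimal Python sketch. The function name detex and the set of handled commands are assumptions for illustration, not a TeXstudio feature or API: it removes comments, unwraps a few text-formatting commands, turns ties (~) into spaces, converts -- and --- into en and em dashes, and unescapes \&, \% and similar characters.

```python
import re

# Hypothetical minimal "de-TeX" pass for short prose such as an abstract.
# It is not a LaTeX parser; it only handles a few common patterns.

# An unescaped % starts a comment running to the end of the line.
_COMMENT = re.compile(r"(?<!\\)%.*$", re.MULTILINE)
# One-argument text commands whose argument should be kept (no nested braces).
_KEEP_ARG = re.compile(r"\\(?:emph|textbf|textit|texttt|textsc|textrm)\{([^{}]*)\}")
# Escaped special characters: \& -> &, \% -> %, \_ -> _, and so on.
_ESCAPED = re.compile(r"\\([&%$#_{}])")


def detex(source: str) -> str:
    text = _COMMENT.sub("", source)
    # Unwrap repeatedly so nested commands like \emph{\textbf{x}} are handled.
    previous = None
    while previous != text:
        previous = text
        text = _KEEP_ARG.sub(r"\1", text)
    # TeX dash ligatures: --- is an em dash, -- an en dash.
    text = text.replace("---", "\u2014").replace("--", "\u2013")
    # A tie (non-breaking space) becomes an ordinary space.
    text = text.replace("~", " ")
    text = _ESCAPED.sub(r"\1", text)
    # Collapse all whitespace, including newlines, into single spaces.
    return " ".join(text.split())
```

For example, detex(r"TeX \& friends") returns "TeX & friends". Because all whitespace is collapsed, multi-paragraph input comes out as a single paragraph, and accent commands such as \~{n} are not handled.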

dbitouze commented 6 years ago

You could just copy-paste from the PDF file.

sunderme commented 6 years ago

This is a task for a script or macro. So if anybody wants to publish one here, or better on the wiki, go ahead.

dbitouze commented 6 years ago

Other solutions:

lesshaste commented 6 years ago

@dbitouze Copying and pasting from PDF is often a fatally bad idea. I find you often get symbols that look very much like what you expect but are subtly different, and which then appear completely wrong when, for example, printed. The most common culprit is the dash “-”.
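If text copied from a PDF has to go into a form that only accepts plain ASCII, such look-alike characters can be normalized. Below is a minimal Python sketch; the character table is a hand-picked assumption and far from complete, and it deliberately flattens even the en-dashes that the PDF rendered correctly.

```python
import unicodedata

# Hand-picked (and incomplete) table of typographic characters that often
# come out of PDF copy-paste, mapped to ASCII look-alikes.
_ASCII_LOOKALIKES = {
    "\u2010": "-",   # hyphen
    "\u2011": "-",   # non-breaking hyphen
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash
    "\u2212": "-",   # minus sign
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u00ad": "",    # soft hyphen (invisible, so it is dropped)
}
_TABLE = str.maketrans(_ASCII_LOOKALIKES)


def to_plain_ascii(text: str) -> str:
    # NFKC folds compatibility characters, e.g. the ligature U+FB01 -> "fi"
    # and the no-break space U+00A0 -> an ordinary space.
    text = unicodedata.normalize("NFKC", text)
    return text.translate(_TABLE)
```

Running NFKC first means ligatures and odd spaces are handled by the standard library, so the table only needs to cover characters NFKC leaves alone.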

dbitouze commented 6 years ago

Copying and pasting from pdf is often a fatally bad idea.

I disagree: if the text you copied isn't meant to be reused in a .tex file, it's not a problem and, even better, it is a feature. For example, did you ask for an en-dash in your .tex source? You get an en-dash in the PDF output, which you can then conveniently copy and paste.

thatlittleboy commented 6 years ago

@dbitouze I find the most problematic issue when copying from PDF is the hyphenation of long words at the end of lines, which you obviously don't want to appear in the pasted text.
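Rejoining words hyphenated at line ends can be approximated with a regular expression. This is a heuristic sketch (the name dehyphenate is hypothetical): it cannot tell a hyphenation break from a genuine compound word split at its own hyphen, so the output still needs proofreading.

```python
import re

# Heuristic: an ASCII lowercase letter, a hyphen at the end of a line, and a
# lowercase letter at the start of the next line is treated as a hyphenation
# break and joined. A genuine compound split at its hyphen ("well-\nknown")
# is wrongly joined too, so proofread the result.
_HYPHEN_BREAK = re.compile(r"([a-z])-\n([a-z])")
# A newline that is not part of a blank line (paragraph break).
_SINGLE_NEWLINE = re.compile(r"(?<!\n)\n(?!\n)")


def dehyphenate(text: str) -> str:
    text = _HYPHEN_BREAK.sub(r"\1\2", text)
    # Remaining single line breaks become spaces; paragraph breaks are kept.
    return _SINGLE_NEWLINE.sub(" ", text)
```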

@antmw1361 It is hard for anyone to help (with scripting) without examples of the kind of text to expect. Would there be inline math? Superscripts, subscripts, etc.? References, citations, labels? Anyway, I agree that the easiest way is still to copy from the PDF and then fix the hyphenation and similar issues.
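To make these questions concrete, here is one possible policy sketched in Python, under the assumption that citations and labels should simply be dropped and inline math should keep its content without the $ delimiters. The function name and the list of commands are illustrative only; superscripts such as n^2 are left as typed.

```python
import re

# Assumed policy: \cite, \citep, \citet and \label (with optional * and
# [...] arguments) are dropped together with the whitespace before them.
_DROP = re.compile(r"\s*\\(?:cite[tp]?|label)\*?(?:\[[^\]]*\])*\{[^{}]*\}")
# Inline math delimited by unescaped dollar signs keeps its content.
_INLINE_MATH = re.compile(r"(?<!\\)\$([^$]+)\$")


def strip_refs_and_math(text: str) -> str:
    text = _DROP.sub("", text)
    return _INLINE_MATH.sub(r"\1", text)
```

Dropping a citation outright is the simplest choice for an abstract box; a journal that wants "[1]"-style markers would need a different replacement.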

dbitouze commented 6 years ago

@thatlittleboy There is no such issue with pdftotext. For instance, take the PDF produced from this test.tex file:

\documentclass{article}
\usepackage{lipsum}
\begin{document}
\lipsum[1]
\end{document}

processed with:

pdftotext -nopgbrk test.pdf

generates a test.txt file containing:

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis. Curabitur dictum gravida mauris. Nam arcu libero, nonummy eget, consectetuer id, vulputate a, magna. Donec vehicula augue eu neque. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Mauris ut leo. Cras viverra metus rhoncus sem. Nulla et lectus vestibulum urna fringilla ultrices. Phasellus eu tellus sit amet tortor gravida placerat. Integer sapien est, iaculis in, pretium quis, viverra ac, nunc. Praesent eget sem vel leo ultrices bibendum. Aenean faucibus. Morbi dolor nulla, malesuada eu, pulvinar at, mollis ac, nulla. Curabitur auctor semper nulla. Donec varius orci eget risus. Duis nibh mi, congue eu, accumsan eleifend, sagittis quis, diam. Duis eget orci sit amet orci dignissim rutrum.

1