windingwind / zotero-pdf-translate

Translate PDF, EPub, webpage, metadata, annotations, notes to the target language. Support 20+ translate services.
GNU Affero General Public License v3.0
7k stars 336 forks

[Feature] Additional features for full-text translation #344

Closed felntc closed 1 year ago

felntc commented 1 year ago

Is your feature request related to a problem? Please describe.
Thank you for developing this useful plugin. I usually translate entire papers rather than selected passages. Could you please add a full-text translation feature?

Describe the solution you'd like
(1) For PDFs that have already been OCR'd, I would like to extract the full text and get its translation.
(2) I would also like the option to add the result to a note rather than to annotations.

Describe alternatives you've considered
Here is my usual workflow:
(1) Add the PDF to Zotero.
(2) Open the paper in Zotero and create a note.
(3) Copy the article one page at a time, from the abstract to the end, and paste it into the note as plain text with Shift+Command+V (the citation information is not needed for translation). I do not use the official "Add to note" popup because it cannot paste plain text, and I leave out images because I can refer to the PDF.
(4) Paste the plain-text note into DeepL (not the API).
(5) Overwrite the note with the output.
This helps me a great deal when reading papers, but it is time-consuming.

Additional context
I recently learned that Zotero's PDF backend is Xpdf, which includes a pdftotext tool, so I experimented with it. Text extraction works well enough, but for a thesis I still need to remove the repeated page breaks and the repeated thesis metadata. Starting a terminal and converting the PDF every time I read a paper in Zotero is a hassle, so I would greatly appreciate it if at least full-text extraction could be implemented.
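For anyone who wants to try the same experiment, the cleanup described above can be sketched roughly as follows. This is only an illustration, not part of the plugin: it assumes `pdftotext` is on the PATH, relies on the form feed character that pdftotext writes between pages, and treats any line that recurs on most pages as a running header or footer. The function names and the `min_share` threshold are hypothetical.

```python
import subprocess
from collections import Counter


def split_pages(text: str) -> list[str]:
    """Split pdftotext output into pages; pages are separated by a form feed."""
    return [page for page in text.split("\f") if page.strip()]


def extract_pages(pdf_path: str) -> list[str]:
    """Run pdftotext (assumed to be installed) and return one string per page."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],  # "-" sends the text to stdout
        capture_output=True, text=True, check=True,
    )
    return split_pages(result.stdout)


def strip_repeated_lines(pages: list[str], min_share: float = 0.6) -> str:
    """Drop lines that recur on at least min_share of the pages."""
    counts = Counter()
    for page in pages:
        # Count each distinct non-empty line once per page.
        counts.update({ln.strip() for ln in page.splitlines() if ln.strip()})
    # Hypothetical heuristic: never drop a line seen on fewer than two pages.
    threshold = max(2, int(len(pages) * min_share))
    cleaned = []
    for page in pages:
        lines = [ln for ln in page.splitlines() if counts[ln.strip()] < threshold]
        cleaned.append("\n".join(lines).strip())
    # Join the pages with blank lines instead of form feeds.
    return "\n\n".join(p for p in cleaned if p)
```

Calling `strip_repeated_lines(extract_pages("paper.pdf"))` would give plain text ready to paste into a note. Page numbers differ from page to page, so this heuristic leaves them in place.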

windingwind commented 1 year ago

Full-text translation consumes a large amount of translation quota, which is risky for ordinary users. Even without Xpdf, you can query the full text of a PDF directly in the reader tab. However, I think copying the full text is outside the scope of a translation plugin, and for books the full text can be very large.
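To make the quota concern concrete, here is a rough back-of-the-envelope sketch. The numbers are assumptions for illustration only: a monthly allowance of 500,000 characters (modelled on a typical free API tier), a paper of about 60,000 characters, and a book of about 1,200,000 characters. The function name is hypothetical.

```python
def documents_per_month(char_count: int, monthly_quota: int = 500_000) -> int:
    """How many documents of this size fit into the assumed monthly quota."""
    return monthly_quota // char_count


paper_chars = 60_000      # assumed length of one full paper
book_chars = 1_200_000    # assumed length of one book

print(documents_per_month(paper_chars))  # 8
print(documents_per_month(book_chars))   # 0
```

Under these assumptions a handful of full papers uses up the whole month, and a single book does not fit at all, whereas translating selected passages uses only a small fraction of the allowance.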