serebrov / get-med

Scraping medical articles (HTML/PDF) with mechanize, then saving and translating them with Google Translate

How to use #1

Closed zaixi closed 6 years ago

zaixi commented 6 years ago

I recently needed to translate PDFs too and found this repository. Can you tell me how to use it?

serebrov commented 6 years ago

These scripts were written around 5 years ago, and the Google Translate UI has changed since then, so they are not fully functional. I did some review and cleanup: translation by URL still works, and so does the script. What it does is: download the HTML, save it, pass the HTML file's URL to Google Translate, then download and save the translated page. The "core" translation feature is a single function in the script.
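A minimal sketch of that flow in Python, not the repository's actual code. The endpoint `https://translate.google.com/translate` and its `sl`, `tl` and `u` query parameters are assumptions about how translation by URL is addressed and may have changed; `build_translate_url` and `fetch_and_save` are hypothetical helper names.

```python
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: endpoint and parameter names (sl, tl, u) are not from the thread
# and may no longer match what Google Translate accepts.
TRANSLATE_ENDPOINT = "https://translate.google.com/translate"


def build_translate_url(page_url, target_lang, source_lang="auto"):
    """Build a 'translate this page' URL for the given source page."""
    query = urlencode({"sl": source_lang, "tl": target_lang, "u": page_url})
    return f"{TRANSLATE_ENDPOINT}?{query}"


def fetch_and_save(url, out_path):
    """Download a URL and write the raw bytes to out_path; return the byte count."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        data = response.read()
    Path(out_path).write_bytes(data)
    return len(data)
```

With these helpers, the original page could be saved with `fetch_and_save(source, "article.html")` and the translated one with `fetch_and_save(build_translate_url(source, "en"), "article.en.html")`. The translated page may be rendered through frames or scripts, so the saved file might need further processing.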

For PDFs I did this: download the PDF, convert it to HTML, and translate the HTML via Google Translate. This doesn't work now, because I was using the form on the Google Translate page and it now works differently than before. But translation by URL now also works for PDFs, so you can quite easily adapt the approach used for HTML pages (or even use the script directly).
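For the PDF-to-HTML conversion step, a rough Python sketch follows. The thread does not say which converter was used; this assumes the `pdftohtml` tool from poppler-utils is installed, and `pdf_to_html_command` and `convert_pdf` are hypothetical names. Output file naming can differ between `pdftohtml` versions.

```python
import shutil
import subprocess


def pdf_to_html_command(pdf_path, html_path):
    """Return the pdftohtml command line as a list of arguments."""
    # -noframes: write one HTML file instead of a frameset (poppler-utils option).
    return ["pdftohtml", "-noframes", pdf_path, html_path]


def convert_pdf(pdf_path, html_path):
    """Run the conversion; fail early if the tool is not installed."""
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml not found on PATH (assumed: poppler-utils)")
    subprocess.run(pdf_to_html_command(pdf_path, html_path), check=True)
```

The resulting HTML would still have to be reachable by URL before it can be passed to translation by URL.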

Note: I am not 100% sure, but I think automated usage of Google Translate may violate the Google TOS. It might be OK to translate a few files for your personal use the way I did here, but you shouldn't use this approach in commercial software; use the translation API instead.
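As an illustration of the API route, here is a hedged Python sketch that only builds a request for the Cloud Translation REST endpoint (v2) without sending it. It assumes a valid API key; `build_api_request` is a hypothetical helper, and the thread does not say which API version or client was meant.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: Cloud Translation v2 REST endpoint with API-key authentication.
API_ENDPOINT = "https://translation.googleapis.com/language/translate/v2"


def build_api_request(text, target_lang, api_key):
    """Build (but do not send) a POST request asking for a translation of text."""
    body = json.dumps({"q": text, "target": target_lang, "format": "text"})
    url = f"{API_ENDPOINT}?{urlencode({'key': api_key})}"
    return Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` would return JSON in which the translated text is expected under `data.translations[0].translatedText`.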

zaixi commented 6 years ago

Thanks, this already helps. I only need to translate a few PDFs for personal use.