lushan88a / google_trans_new

A free and unlimited Python API for Google Translate.
MIT License
393 stars, 170 forks

No real way to translate a list #19

Open NinthAutumn opened 3 years ago

NinthAutumn commented 3 years ago

I've tried several ways of separating lines (\n, \n\n\n\n, and others), but no matter what I use, many sentences get merged, so I can't reliably tell which translation belongs to which line.

bjquinniii commented 3 years ago

Did you find a solution yet? I ask because I may have an approach that works. In the text I'm pushing through this service, paragraph markers are preserved cleanly in the output. In Unicode the paragraph separator is U+2029 (LF is U+000A, CR is U+000D). I'll have to dig deeper to see how the paragraph separators are getting embedded, but they are working for me.

Some background on how the paragraph markers end up in my text blocks: in my project, I grab raw HTML from a site and save those blocks of text in a MariaDB table (in a LONGTEXT field). Later, I pull all of those blocks out and push them through Beautiful Soup to find the correct block and grab its text. I push that text through the Google Translate call and save both the original and the translated text back into the database. When I look at the table in DBeaver, I can see the little paragraph markers (a kind of reversed capital P) in both fields, and the formatting comes through.
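To check which separator characters a block of text actually contains (for example, text pulled back out of the database), a small helper can count them. This is a sketch, not part of google_trans_new; `separator_report` and `SEPARATOR_NAMES` are hypothetical names.

```python
from collections import Counter
from typing import Dict

# Separator characters mentioned above, with readable labels.
SEPARATOR_NAMES = {
    "\u2029": "PARAGRAPH SEPARATOR (U+2029)",
    "\n": "LINE FEED (U+000A)",
    "\r": "CARRIAGE RETURN (U+000D)",
}


def separator_report(text: str) -> Dict[str, int]:
    """Count how many of each known separator character appear in text."""
    counts = Counter(ch for ch in text if ch in SEPARATOR_NAMES)
    return {SEPARATOR_NAMES[ch]: n for ch, n in counts.items()}


if __name__ == "__main__":
    sample = "First paragraph.\u2029Second paragraph.\r\nThird line."
    print(separator_report(sample))
    # str.splitlines() treats U+2029, "\n", "\r" and "\r\n" as line boundaries.
    print(sample.splitlines())
```

Because Python's `str.splitlines()` already treats U+2029 as a line boundary, a translated text that kept its paragraph separators can be split back into pieces with it. Whether Google Translate preserves U+2029 in every case rests only on the experience described in this comment.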

Another approach that might work is some string manipulation. Take your original list and build a string with a marker that won't be translated in every spot where you need a line break, then pass that string through the service. It should leave the markers alone (and keep them embedded in the translated result). Then take the translated string and replace each marker with the formatting you actually need ("\n" or similar).
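A minimal sketch of that join-translate-split idea, assuming the marker survives translation unchanged. `translate_lines` and `MARKER` are hypothetical names, and `translate` is any function that takes a string and returns its translation, so the sketch does not depend on a particular library.

```python
from typing import Callable, List

# Hypothetical marker: any token the translator leaves untouched should work,
# as long as it never appears inside the lines themselves.
MARKER = "[cr]"


def translate_lines(
    lines: List[str],
    translate: Callable[[str], str],
    marker: str = MARKER,
) -> List[str]:
    """Translate several lines in one request by embedding a marker between them."""
    if not lines:
        return []
    joined = marker.join(lines)  # e.g. "apple[cr]banana[cr]cherry"
    translated = translate(joined)
    # The service may add spaces around tokens it doesn't recognize,
    # so split on the bare marker and strip each piece.
    parts = [part.strip() for part in translated.split(marker)]
    if len(parts) != len(lines):
        raise ValueError(
            f"expected {len(lines)} pieces but got {len(parts)}; "
            "the marker may have been translated or dropped"
        )
    return parts
```

With google_trans_new, `translate` could be `lambda s: translator.translate(s, lang_src='en', lang_tgt='pt')`. Putting the marker only *between* items (rather than after each one) avoids the empty trailing entry that a trailing marker produces, and the length check catches cases where the service mangled the marker.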

bjquinniii commented 3 years ago

Here's a simple example of what I meant about embedding your own separators:

```python
# encoding: utf-8
from google_trans_new import google_translator

translator = google_translator()
fruits = list()
fruits.append('apple')
fruits.append('banana')
fruits.append('cherry')
print(fruits)

# Build one string with a "[cr]" marker after each item.
fruitString = ''
for f in fruits:
    fruitString = fruitString + f + '[cr]'
print(fruitString)

fruitStringTrans = translator.translate(fruitString, lang_src='en', lang_tgt='pt')
print(fruitStringTrans)

# The translation comes back with spaces around each marker.
fruitStringTransList = fruitStringTrans.split(' [cr] ')
print(fruitStringTransList)
for f in fruitStringTransList:
    print(f)
```

This example gets a list back out of Google Translate. The one odd thing is that Google Translate seems to wrap spaces around things it doesn't recognize, so the output from the code above looks like this:

```
['apple', 'banana', 'cherry']
apple[cr]banana[cr]cherry[cr]
maçã [cr] banana [cr] cereja [cr]
['maçã', 'banana', 'cereja', '']
maçã
banana
cereja
```

bjquinniii commented 3 years ago

And now I'm feeling pretty silly. Sometimes the simple thing actually works... just pass it in as a list:

```python
# encoding: utf-8
from google_trans_new import google_translator

translator = google_translator()
fruits = list()
fruits.append('apple')
fruits.append('banana')
fruits.append('cherry')
print(fruits)

# Pass the whole list in a single call.
fruitsTrans = translator.translate(fruits, lang_src='en', lang_tgt='pt')
print(fruitsTrans)
```

printout looks like this:

```
['apple', 'banana', 'cherry']
['maçã', 'banana', 'cereja']
```

melanatech commented 3 years ago

> printout looks like this:
>
> ['apple', 'banana', 'cherry']
> ['maçã', 'banana', 'cereja']

For anyone referencing this: the printout is the string representation of a list, not an actual list. If you have a very short list and just want to look at the translations, this works. Otherwise, you have to parse the string to separate the list items. Calling translate in a for loop, one item at a time, is a much better solution.
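If you do end up with the string form, Python's standard `ast.literal_eval` can turn it back into a list, assuming the returned text is still a valid Python list literal (the translation could break this, for example by changing quote characters). `parse_list_repr` is a hypothetical helper name.

```python
import ast
from typing import List


def parse_list_repr(text: str) -> List[str]:
    """Convert a string such as "['maçã', 'banana']" back into a Python list."""
    # literal_eval only accepts Python literals, so unlike eval() it will not
    # execute arbitrary code contained in the translated text.
    value = ast.literal_eval(text.strip())
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError("expected the string form of a list of strings")
    return value
```

If the translation altered the quotes or brackets, `literal_eval` raises `ValueError` or `SyntaxError`, which is one more reason the per-item loop is the safer route, e.g. `[translator.translate(f, lang_src='en', lang_tgt='pt').strip() for f in fruits]` (the `.strip()` guards against the extra spaces seen earlier in this thread).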

bjquinniii commented 3 years ago

Yes, you're right: I didn't notice that although it accepts a list, it returns a string representation of a list. There are various ways to convert that string back into a list, but looping over the translate calls is the better solution, except when it fails because of the built-in restrictions. Two limits can come into play. The first is the number of translation requests you can make from a single IP address. I'm not sure what the actual limit is, but in my experience I can make about a thousand requests an hour; if I go much over that, my IP gets blocked for something like 8 to 10 hours. The second is that each request has to be under 5,000 characters or it simply isn't processed. So depending on the number and length of the entries in your list, you'll need to experiment to stay within both limits.
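One way to stay inside both limits is to pack list items into chunks whose joined length stays under the character limit, and to space the requests out. This is a sketch under assumptions: the 5,000-character and roughly 1,000-requests-per-hour figures come only from the experience described in this comment, not from documentation, and `translate_batch` is a hypothetical callable that translates a list of strings and returns a list of the same length (for example, a marker-join-and-split wrapper around `translator.translate`).

```python
import time
from typing import Callable, Iterator, List

# Both limits are the rough figures reported in this thread, not documented values.
MAX_CHARS = 4900      # stay a little under the ~5000-character request limit
MIN_INTERVAL = 3.6    # seconds between requests: 3600 s / 1000 requests


def chunk_items(items: List[str], sep: str, max_chars: int = MAX_CHARS) -> Iterator[List[str]]:
    """Group items so that sep.join(group) never exceeds max_chars."""
    group: List[str] = []
    length = 0
    for item in items:
        if len(item) > max_chars:
            raise ValueError("a single item exceeds max_chars; split it first")
        extra = len(item) if not group else len(sep) + len(item)
        if group and length + extra > max_chars:
            yield group
            group, length, extra = [], 0, len(item)
        group.append(item)
        length += extra
    if group:
        yield group


def translate_in_batches(
    items: List[str],
    translate_batch: Callable[[List[str]], List[str]],
    sep: str,
    max_chars: int = MAX_CHARS,
    min_interval: float = MIN_INTERVAL,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Translate items chunk by chunk, spacing requests at least min_interval apart."""
    results: List[str] = []
    last_request = None
    for group in chunk_items(items, sep, max_chars):
        if last_request is not None:
            wait = min_interval - (time.monotonic() - last_request)
            if wait > 0:
                sleep(wait)
        last_request = time.monotonic()
        results.extend(translate_batch(group))
    return results
```

The `sleep` parameter is only there so the throttling can be tested without actually waiting; in normal use leave it as `time.sleep`. Packing many items per request reduces the request count, which helps with the hourly limit as much as the spacing does.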