sloria / TextBlob

Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
https://textblob.readthedocs.io/
MIT License

failed to translate the sentences with mixed languages (English and Hindi) #337

Open MrRaghav opened 4 years ago

MrRaghav commented 4 years ago

Hello, I'm not sure whether this is a bug or whether I've missed something, but I'd appreciate a few minutes of your time on this.

As far as I can tell from the documentation, TextBlob uses the Google API for translations. I had a document in English and translated it into Hindi.

However, TextBlob did not translate the sentences that contained a mixture of English and Hindi words. Here is sample code with its output:

from time import sleep

from textblob import TextBlob
from textblob.exceptions import NotTranslated

# list1 is the list of sentences to translate.
# The first sentence starts with Hindi and ends with English;
# the third sentence starts with English and ends with Hindi.
list1 = ["कोरोना वायरस अपडेट corona virus update", "corona virus update", "corona virus update कोरोना वायरस अपडेट"]
hindiTranslate = []

for sentence in list1:
    blob = TextBlob(sentence)
    print(blob)
    try:
        hindiTranslate.append(blob.translate(to='hi'))
        sleep(2)  # pause between successful requests
    except NotTranslated:
        # Record a placeholder for any sentence TextBlob refuses to translate
        hindiTranslate.append("Not translated")

print(hindiTranslate)

Output:

['Not translated', TextBlob("कोरोनावाइरस अपडेट"), 'Not translated']
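To make it explicit which inputs mix scripts, here is a minimal sketch that uses only the standard library. The helper names `scripts_in` and `is_mixed_script` are hypothetical and not part of TextBlob; the check relies on the assumption that the first word of a letter's Unicode name identifies its script.

```python
import unicodedata


def scripts_in(text):
    """Return the set of script names used by the letters in text.

    Assumption: the first word of a letter's Unicode name
    (for example 'DEVANAGARI' or 'LATIN') identifies its script.
    """
    scripts = set()
    for ch in text:
        # Skip spaces, digits, punctuation and combining vowel signs
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def is_mixed_script(text):
    """True if the letters in text come from more than one script."""
    return len(scripts_in(text)) > 1


for sentence in [
    "कोरोना वायरस अपडेट corona virus update",
    "corona virus update",
    "corona virus update कोरोना वायरस अपडेट",
]:
    print(is_mixed_script(sentence), sentence)
```

Of the three sentences in the sample code, only the second is single-script, and it is the only one that was translated.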

As we can see, the sentences with mixed languages were not translated. But can Google Translate do it?

Yes, it can. See the image and the steps below:

[Image: possible bug in textblob for mixed language]

  1. Open Google Translate in Chrome or Firefox
  2. Select English as the input language
  3. Select Hindi as the output language
  4. Enter the following line: कोरोना वायरस अपडेट coronavirus update
  5. The output will be: कोरोनावायरस अपडेट कोरोनावायरस अपडेट

That means Google Translate successfully translates mixed-language sentences. TextBlob uses the Google API, so it should be able to do so as well.
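Until that works, one possible workaround is to translate only the English part of each sentence and leave the Hindi part untouched. The sketch below is an assumption-laden illustration, not TextBlob functionality: `translate_latin_runs` and `LATIN_RUN` are hypothetical names, the pattern only recognises unaccented ASCII letters, and the actual translator is passed in as a callable so the splitting logic can be tried without network access.

```python
import re

# A run of ASCII letters, optionally joined by whitespace, that starts and
# ends on a letter (so surrounding spaces stay in the original text).
LATIN_RUN = re.compile(r"[A-Za-z][A-Za-z\s]*[A-Za-z]|[A-Za-z]")


def translate_latin_runs(text, translate_fn):
    """Replace each Latin-script run in text with translate_fn(run).

    translate_fn is any callable that takes a string and returns a string;
    everything outside the Latin runs is kept as it is.
    """
    return LATIN_RUN.sub(lambda match: translate_fn(match.group(0)), text)


# Stand-in translator for demonstration only: it looks the phrase up in a
# small table instead of calling a translation service.
demo_table = {"corona virus update": "कोरोनावाइरस अपडेट"}


def demo_translate(phrase):
    return demo_table.get(phrase, phrase)


print(translate_latin_runs("कोरोना वायरस अपडेट corona virus update", demo_translate))
print(translate_latin_runs("corona virus update कोरोना वायरस अपडेट", demo_translate))
```

With a TextBlob version that still provides `translate`, the callable could wrap the same call used in the sample code, for example `lambda s: str(TextBlob(s).translate(to='hi'))`.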

Please let me know if I missed anything.