I tested batch translation of a column of product names from Chinese (Simplified) to English. The code below ran without errors, but after manually checking samples of the resulting translations exported to .csv and .xlsx, I found that many were translated incorrectly.
Batch size: over 5000 rows in the df['Product_Name'] column (see the code below).
No N/A or missing values in the original column.
No N/A or missing values in the resulting df['Product_Name_Eng'] column.
Example of a wrong translation:
"life space益生菌大人调理肠胃肠道双歧杆菌元免疫力提旗舰店正品" has been translated as "[Second sale] wlab makeup primer, primer, invisible pores, flagship store, genuine product, valid until 24/06" on the corresponding row.
Not all are wrong; I'd say most are correct (I stopped checking manually after a while).
Example of a successful translation:
"康萃乐儿童益生菌宝宝婴幼儿调理肠胃鼠李糖乳杆菌lgg冲剂30袋" has been translated as "Kangcuile children's probiotics baby infant gastrointestinal conditioning Lactobacillus rhamnosus LGG granules 30 bags" on the corresponding row.
What I did
I ran the following in JupyterLab 3.6.3:
import pandas as pd
from deep_translator import GoogleTranslator
from concurrent.futures import ThreadPoolExecutor

df = pd.read_csv('test_translation.csv')
translator = GoogleTranslator(source='chinese (simplified)', target='english')

def translate_text(text):
    try:
        return translator.translate(text)
    except Exception as e:
        return text  # Return the original text if translation fails

def batch_translate_texts(texts):
    with ThreadPoolExecutor(max_workers=10) as executor:
        translated_texts = list(executor.map(translate_text, texts))
    return translated_texts

product_names = df['Product_Name'].astype(str).tolist()
translated_names = batch_translate_texts(product_names)
df['Product_Name_Eng'] = translated_names
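One thing that may be worth ruling out: the code above shares a single translator object across ten worker threads. If that object keeps the text of the current request as internal state (an assumption, not something confirmed in this post), concurrent calls could overwrite each other's input, and a row could receive another row's translation. The sketch below creates a fresh translator for every call instead. `make_translator` and the `factory` parameter are hypothetical names introduced for illustration, and returning `None` on failure (rather than the original text) is a design choice so that failed rows stay visible.

```python
from concurrent.futures import ThreadPoolExecutor


def make_translator():
    # Hypothetical helper: builds a new translator so no instance is shared
    # between threads. Imported lazily so the rest of the sketch can be
    # exercised without the library installed.
    from deep_translator import GoogleTranslator
    return GoogleTranslator(source='chinese (simplified)', target='english')


def translate_text(text, factory=make_translator):
    # A new translator per call means there is no shared mutable state.
    try:
        return factory().translate(text)
    except Exception:
        return None  # Mark the failure instead of silently keeping the source text


def batch_translate_texts(texts, factory=make_translator, max_workers=10):
    # executor.map preserves input order, so results line up with the rows.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(lambda t: translate_text(t, factory), texts))
```

With the DataFrame from the post, the call would stay the same: `df['Product_Name_Eng'] = batch_translate_texts(product_names)`. Rows that come back as `None` can then be retried separately.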
I then re-ran the same code on a smaller batch of 600+ rows, and the same string that was translated incorrectly in my example was translated correctly the second time.
Same string this time translated correctly:
"life space益生菌大人调理肠胃肠道双歧杆菌元免疫力提旗舰店正品" has been translated as "Life space probiotics for adults regulate gastrointestinal tract Bifidobacterium Yuan immunity enhance flagship store authentic" which is an acceptable automatic translation for my project.
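Since the wrong output in the first run reads like the translation of a different product, one way to look for affected rows without checking everything by hand is to list translations that are attached to more than one distinct source string. This is only a heuristic sketch: it assumes that misassigned translations come from other rows in the same batch, and `find_shared_translations` is a hypothetical helper name.

```python
from collections import defaultdict


def find_shared_translations(sources, translations):
    """Map each translation used by more than one distinct source to those sources."""
    by_translation = defaultdict(set)
    for src, dst in zip(sources, translations):
        by_translation[dst].add(src)
    # Keep only translations that appear for two or more different source strings.
    return {
        dst: sorted(srcs)
        for dst, srcs in by_translation.items()
        if len(srcs) > 1
    }
```

For the DataFrame in this post, that would be `find_shared_translations(df['Product_Name'], df['Product_Name_Eng'])`. Genuinely similar products can legitimately share a translation, so any hits are candidates for review rather than confirmed errors.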
I need help understanding how to ensure reliable translation results.
Thank you