gyuhyon / open-tran

Automatically exported from code.google.com/p/open-tran
GNU General Public License v2.0

Ignore "null" translations #43

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Often the so-called translation is a verbatim copy of the (usually English) 
source. Such translations should not be listed. For example, this applies to 
almost all of the entries under "Translation suggestions (he → en)" here: 
http://en.he.open-tran.eu/suggest/toggle

What steps will reproduce the problem?
1. Go to above URL, inspect suggestions.

What is the expected output? What do you see instead?

Expected: suggestions that are identical or nearly identical to the original text are not listed. Instead, they currently appear among the suggestions.

What version of the product are you using? On what operating system?

Web.

Original issue reported on code.google.com by yaron.sh...@gmail.com on 13 Sep 2010 at 1:36

GoogleCodeExporter commented 9 years ago
This is a real issue, but it's not all that obvious to fix.

Original comment by sliw...@gmail.com on 13 Sep 2010 at 5:11

GoogleCodeExporter commented 9 years ago
A first approximation is a byte-by-byte comparison of the source and target 
strings. (I've seen many cases where translators decide that certain strings 
will only be seen by "advanced users" and so should not be translated at all.) 
A better approximation is a fuzzy similarity measure such as String::Similarity: 
http://search.cpan.org/~mlehmann/String-Similarity-1.04/

Original comment by yaron.sh...@gmail.com on 13 Sep 2010 at 8:19
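
For illustration, a slightly relaxed variant of the byte-by-byte check might look like the sketch below. The helper name `is_null_translation` and the normalization steps (Unicode NFC, whitespace collapsing) are assumptions made for this example; they are not taken from the open-tran code base.

```python
import unicodedata


def is_null_translation(source: str, target: str) -> bool:
    """Return True when the target is only a copy of the source.

    Hypothetical helper: the name and the normalization rules are
    assumptions for illustration, not open-tran's actual behavior.
    """
    def normalize(text: str) -> str:
        # Bring both strings to the same Unicode form and collapse runs
        # of whitespace, so trivial differences do not hide a verbatim copy.
        return " ".join(unicodedata.normalize("NFC", text).split())

    return normalize(source) == normalize(target)
```

A caller would drop any suggestion for which `is_null_translation(source, suggestion)` is true. Strings that legitimately stay the same in both languages (product names, for instance) would also be dropped, which may be part of why the fix is not obvious.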

GoogleCodeExporter commented 9 years ago
Sorry, that was Perl. For Python there are fewer options, but this fuzzy string 
matcher should do the job: http://code.google.com/p/python-ngram/

Original comment by yaron.sh...@gmail.com on 13 Sep 2010 at 10:39
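
As a rough sketch of the fuzzy-matching idea in Python, the example below uses `difflib.SequenceMatcher` from the standard library rather than python-ngram, so it is a swapped-in technique and not the library suggested above. The function names, the case-folding step, and the `0.9` threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(source: str, target: str) -> float:
    """Similarity in [0.0, 1.0]; 1.0 means identical after case folding."""
    return SequenceMatcher(None, source.casefold(), target.casefold()).ratio()


def filter_suggestions(source, suggestions, threshold=0.9):
    """Keep only suggestions that differ enough from the source text.

    Hypothetical helper: the threshold is an assumed value and would
    need tuning against real translation data.
    """
    return [s for s in suggestions if similarity(source, s) < threshold]
```

With these assumptions, `filter_suggestions("Toggle", ["Toggle", "toggle", "Toggle.", "Umschalten"])` keeps only `"Umschalten"`. A threshold that is too low would start discarding genuine translations between closely related languages.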