redserpent7 / cld2

Automatically exported from code.google.com/p/cld2

Undefined language on a page that looks normal #8

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Apparently, CLD2 has some difficulties(*) with 
http://drugoi.livejournal.com/3971967.html 

We are seeing UND (undefined) on chrome://translate-internals

*: or maybe we are misusing it...

Original issue reported on code.google.com by kenjibaheux@chromium.org on 5 Mar 2014 at 6:59

GoogleCodeExporter commented 9 years ago
Cannot reproduce.
I opened http://drugoi.livejournal.com/3971967.html in Firefox, copied and 
pasted all the text into a UTF-8 file, then ran
 ./compact_lang_det_test_chrome0122_2 should_not_be_unk_chrome_8.utf8
and got 
  ExtLanguage RUSSIAN(80% 1027p), UKRAINIAN(2% 450p), INDONESIAN(0% 637p), 40/45 KB of non-tag letters, Summary: RUSSIAN
  SummaryLanguage RUSSIAN at 0 of 46701 2617us (17 MB/sec), should_not_be_unk_chrome_8.utf8

If you are not getting that result, please rerun in your context, setting 
kCLDFlagEcho as the flag value in the call to ExtDetectLanguageSummary, and send 
me stderr as a file (not pasted into a post or email, which opens the possibility 
of various svn/web/mail/browser software changing the exact bytes), or run with flags
  kCLDFlagHtml | kCLDFlagCr
and send me stderr, or compare it to the attached file of the output that I got.
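To compare your stderr capture with the attached reference output, a byte-for-byte comparison is more reliable than reading the two by eye, because it also catches differences in whitespace or encoding. A minimal C++ sketch, assuming both outputs have been saved to files; `ReadFileBytes` and `FirstDifference` are hypothetical helper names, not part of CLD2:

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Reads a whole file as raw bytes into *out. Binary mode avoids any
// newline translation, so the bytes are exactly what is on disk.
// Returns false if the file cannot be opened.
bool ReadFileBytes(const std::string& path, std::string* out) {
  std::ifstream in(path, std::ios::binary);
  if (!in) return false;
  out->assign(std::istreambuf_iterator<char>(in),
              std::istreambuf_iterator<char>());
  return true;
}

// Returns the offset of the first byte where `a` and `b` differ, or
// std::string::npos if they are identical. If one string is a prefix of
// the other, the offset is the length of the shorter one.
std::size_t FirstDifference(const std::string& a, const std::string& b) {
  const std::size_t n = a.size() < b.size() ? a.size() : b.size();
  for (std::size_t i = 0; i < n; ++i) {
    if (a[i] != b[i]) return i;
  }
  if (a.size() != b.size()) return n;
  return std::string::npos;
}
```

For example, read your own stderr capture and the reference file into two strings and print the offset returned by `FirstDifference`; `std::string::npos` means the two files match exactly.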

Is it possible that there is an encoding problem and you are not passing clean 
UTF-8 to CLD2?
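One way to rule out an encoding problem is to check that the buffer is well-formed UTF-8 before calling CLD2. A minimal sketch, assuming the text is held in a `std::string`; `FirstInvalidUtf8` is a hypothetical helper, not a CLD2 function. It follows the well-formed byte-sequence table in the Unicode Standard (Table 3-7), rejecting overlong forms, surrogates, and code points above U+10FFFF:

```cpp
#include <cstddef>
#include <string>

// Returns the byte offset of the first position that does not start a
// well-formed UTF-8 sequence, or std::string::npos if the whole buffer
// is valid UTF-8.
std::size_t FirstInvalidUtf8(const std::string& s) {
  const std::size_t n = s.size();
  std::size_t i = 0;
  while (i < n) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    if (c <= 0x7F) { ++i; continue; }  // ASCII
    std::size_t len = 0;
    unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the 2nd byte
    if (c >= 0xC2 && c <= 0xDF) {
      len = 2;
    } else if (c >= 0xE0 && c <= 0xEF) {
      len = 3;
      if (c == 0xE0) lo = 0xA0;       // reject overlong 3-byte forms
      else if (c == 0xED) hi = 0x9F;  // reject UTF-16 surrogates
    } else if (c >= 0xF0 && c <= 0xF4) {
      len = 4;
      if (c == 0xF0) lo = 0x90;       // reject overlong 4-byte forms
      else if (c == 0xF4) hi = 0x8F;  // reject code points above U+10FFFF
    } else {
      return i;  // 0x80..0xC1 and 0xF5..0xFF never start a sequence
    }
    if (i + len > n) return i;  // sequence truncated at end of buffer
    const unsigned char c1 = static_cast<unsigned char>(s[i + 1]);
    if (c1 < lo || c1 > hi) return i;
    for (std::size_t k = 2; k < len; ++k) {
      const unsigned char ck = static_cast<unsigned char>(s[i + k]);
      if (ck < 0x80 || ck > 0xBF) return i;  // not a continuation byte
    }
    i += len;
  }
  return std::string::npos;
}
```

Russian text that arrives in a single-byte encoding such as Windows-1251 instead of UTF-8 typically fails this check at or near its first Cyrillic letter, whereas correctly encoded UTF-8 Cyrillic passes.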

Original comment by dsi...@google.com on 5 Mar 2014 at 6:26

Attachments:

GoogleCodeExporter commented 9 years ago
Seems like we are still using R84. Would this explain the difference?

Original comment by kenjibaheux@chromium.org on 6 Mar 2014 at 4:19

GoogleCodeExporter commented 9 years ago
No, R84 does not explain the difference. Please capture the actual bytes sent to 
CLD2. Thanks, /dick
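Capturing the actual bytes can be done by dumping the exact buffer handed to CLD2 in hex, a form that survives copy/paste and mail software unchanged. A minimal sketch; `HexDump` is a hypothetical helper whose output is loosely modeled on `hexdump -C`, not part of CLD2:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats `len` bytes starting at `buf`, 16 bytes per line: an 8-digit
// hex offset, the bytes in hex, and a printable-ASCII view in which
// non-printable bytes (including all UTF-8 lead/continuation bytes)
// are shown as '.'.
std::string HexDump(const char* buf, std::size_t len) {
  std::string out;
  char cell[32];
  for (std::size_t off = 0; off < len; off += 16) {
    std::snprintf(cell, sizeof(cell), "%08zx ", off);
    out += cell;
    std::string ascii;
    for (std::size_t j = 0; j < 16; ++j) {
      if (off + j < len) {
        const unsigned char c = static_cast<unsigned char>(buf[off + j]);
        std::snprintf(cell, sizeof(cell), " %02x", c);
        out += cell;
        ascii += (c >= 0x20 && c < 0x7F) ? static_cast<char>(c) : '.';
      } else {
        out += "   ";  // pad a short final line so the ASCII column aligns
      }
    }
    out += "  |" + ascii + "|\n";
  }
  return out;
}
```

Call it on the same buffer pointer and length you pass to `ExtDetectLanguageSummary`, for example `std::fputs(HexDump(buffer, buffer_length).c_str(), stderr);`, where `buffer` and `buffer_length` stand for your own variables, and send the result as a file.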

Original comment by dsi...@google.com on 6 Mar 2014 at 9:54

GoogleCodeExporter commented 9 years ago
FWIW, I am planning to roll Chromium to the latest CLD2 in the Very Near(TM) 
future.

Original comment by andrewha...@chromium.org on 11 Mar 2014 at 12:44

GoogleCodeExporter commented 9 years ago
Re #4: please try the subject URL  http://drugoi.livejournal.com/3971967.html 
and send the requested debugging output from #1 if the detected language is 
Unknown. /dick

Original comment by dsi...@google.com on 11 Mar 2014 at 6:33

GoogleCodeExporter commented 9 years ago
The current version of Chrome (Version 38.0.2125.104, 64-bit) detects Russian 
and translates correctly. Closing as Fixed.

Original comment by dsi...@google.com on 23 Oct 2014 at 8:18