redserpent7 / cld2

Automatically exported from code.google.com/p/cld2
0 stars 0 forks source link

Enable dynamic mode #6

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
As discussed offline, this is a patch to enable CLD2 to run in "dynamic" mode. 
In dynamic mode the kScoringtables struct is populated from a file at runtime 
instead of being compiled into the program as a read-only section in the binary.

This patch adds a new cld2_dynamic_data_tool and accompanying build 
instructions, and patches the unit tests to exercise all dynamic functionality. 
Data can be loaded, unloaded, and reloaded - theoretically allowing continuous 
operations of the program when updated tables are available.

It still has some hardcoding, but we can fix the underlying issues in the 
source code easily in another pass as you've suggested.

Original issue reported on code.google.com by andrewha...@google.com on 25 Feb 2014 at 6:30

Attachments:

GoogleCodeExporter commented 9 years ago
For posterity you can see the Chromium side of things here:
https://codereview.chromium.org/169853003/

Original comment by andrewha...@google.com on 25 Feb 2014 at 6:46

GoogleCodeExporter commented 9 years ago
I should also probably note that there were a few changes needed to move from 
the old Chromium CLD2 to the latest, so there are some discrepancies between 
the Chromium code review and this patch. They are trivial, however, and consist 
of:

compile_dynamic.sh:
Uses the latest 0122 tables, and drops the suffix 0715/0122/etc from all 
generated binaries it creates

cld2_dynamic_*.cc:
Use 7 tables instead of 6, accommodating the new quad_obj2. Output the 
theoretical names of the tables when outputting headers via --head

compact_lang_det_impl.cc:
Same as above, use 7 tables instead of 6 in dynamic mode

Original comment by andrewha...@google.com on 25 Feb 2014 at 6:49

GoogleCodeExporter commented 9 years ago
I just noticed there's a new use of kScoringtables in compact_lang_det_impl.cc:

https://code.google.com/p/cld2/source/browse/trunk/internal/compact_lang_det_imp
l.cc#1976

// Return version text string
// String is "code_version - data_build_date"
const char* DetectLanguageVersion() {
  if (kScoringtables.quadgram_obj == NULL) {return "";}
  sprintf(temp_detectlanguageversion,
          "V2.0 - %u", kScoringtables.quadgram_obj->kCLDTableBuildDate);
  return temp_detectlanguageversion;
}

The way this code is written, it SHOULD be compatible with the dynamic chages; 
the kScoringtables.quadgram_obj reference should indeed be initialized to null 
in dynamic data mode, and that will hit the first if-block and simply return 
"". If this is used anywhere critical, that location may need to be patched to 
check for isDataLoaded() first. I don't expect this to be a problem, however, 
as any code switching to using CLD2 in dynamic mode should be paying close 
attention to the data loading work anyways.

Original comment by andrewha...@google.com on 26 Feb 2014 at 9:51

GoogleCodeExporter commented 9 years ago
Patch committed as r151: https://code.google.com/p/cld2/source/detail?r=151

Original comment by andrewha...@google.com on 3 Mar 2014 at 3:22