Cleanup, and big refactoring of the frequency code

bochecha commented 11 years ago

One thing has been bothering me for a while: we store the frequency information in a separate database from the characters.

As a result, at run time we first have to find the character(s) matching the input code, then create a ChChar for each of them, and then, for each ChChar, look up its frequency in the frequency db.

That's just way too many iterations.
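To make the cost concrete, here is a rough sketch of that two-database lookup, using Python's built-in sqlite3 module with two in-memory databases. The table names, columns, the fields of the ChChar stand-in and the frequency numbers are all made up for illustration. This is not libcangjie's actual storage format or code.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class ChChar:
    # Illustrative stand-in for libcangjie's ChChar; these fields are assumptions.
    char: str
    code: str
    frequency: int = 0


def make_dbs():
    # Two separate databases: one for characters, one for frequencies.
    # Hypothetical schemas with toy data.
    chars_db = sqlite3.connect(":memory:")
    chars_db.execute("CREATE TABLE chars (char TEXT, code TEXT)")
    chars_db.executemany("INSERT INTO chars VALUES (?, ?)",
                         [("日", "a"), ("曰", "a"), ("月", "b")])
    freq_db = sqlite3.connect(":memory:")
    freq_db.execute("CREATE TABLE frequency (char TEXT PRIMARY KEY, freq INTEGER)")
    freq_db.executemany("INSERT INTO frequency VALUES (?, ?)",
                        [("日", 500), ("曰", 20), ("月", 300)])
    return chars_db, freq_db


def lookup_two_dbs(chars_db, freq_db, code):
    # Pass 1: find the characters matching the input code.
    rows = chars_db.execute(
        "SELECT char, code FROM chars WHERE code = ?", (code,)).fetchall()
    # Pass 2: create a ChChar for each match.
    results = [ChChar(char, c) for char, c in rows]
    # Pass 3: one extra query per character against the frequency db.
    for ch in results:
        row = freq_db.execute(
            "SELECT freq FROM frequency WHERE char = ?", (ch.char,)).fetchone()
        if row is not None:
            ch.frequency = row[0]
    return results
```

Calling `lookup_two_dbs(*make_dbs(), "a")` returns ChChar objects for 日 and 曰, but only after one query on the character db plus one more query per match on the frequency db.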

My refactoring moves the frequency information, at build time, into the same db as the Chinese characters.

Then, at run time, we get everything from a single database and iterate only once to create the ChChar objects.
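A minimal sketch of the merged approach, again assuming a hypothetical SQLite schema and toy data rather than libcangjie's real storage: the frequency is folded into the character table once at build time, so the run-time lookup is a single query and a single pass to build the ChChar stand-ins.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class ChChar:
    # Illustrative stand-in for libcangjie's ChChar; these fields are assumptions.
    char: str
    code: str
    frequency: int


def build_merged_db(chars, freqs):
    # Build time: store each character together with its frequency
    # (hypothetical schema; characters missing from `freqs` get 0).
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE chars (char TEXT, code TEXT, freq INTEGER)")
    db.executemany("INSERT INTO chars VALUES (?, ?, ?)",
                   [(char, code, freqs.get(char, 0)) for char, code in chars])
    return db


def lookup_one_db(db, code):
    # Run time: one query against one database, one pass to build the objects.
    rows = db.execute(
        "SELECT char, code, freq FROM chars WHERE code = ?", (code,))
    return [ChChar(char, c, freq) for char, c, freq in rows]


if __name__ == "__main__":
    db = build_merged_db([("日", "a"), ("曰", "a"), ("月", "b")],
                         {"日": 500, "曰": 20, "月": 300})
    print(lookup_one_db(db, "b"))
```

The trade-off is that frequency updates now require rebuilding the character db, which is acceptable if both are generated together at build time.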

Things should be faster, and the code is much simpler.

$ git diff --stat wanleung/master
[... snip ...]
 7 files changed, 118 insertions(+), 210 deletions(-)

And all that without breaking the public API! \o/

mahiuchun commented 11 years ago

LGTM

bochecha commented 11 years ago

As discussed with @wanleung this weekend, I've just removed the commits that merged the frequency information into the same database as the characters.

The reason is that this code is temporary anyway: in the long term, @wanleung wants to think harder about how to store all this data (frequency, type, char, etc.) properly, so that it is more efficient to retrieve.

We're getting close to a first release, so we agreed to delay this whole refactoring.

So I'm keeping only the cleanup commits in the pull request, because they're pretty uncontroversial and do a bit of good.

@wanleung, whenever you want to merge. :)

wanleung / libcangjie

Cleanup, and big refactoring of the frequency code #17