make check failed on DB records

karlvlam commented 11 years ago

Hi wanleung,

I've tried to compile libcangjie on Ubuntu 12.04 64-bit, and the make check test does not pass.

Error log:

make  check-am
make[1]: Entering directory `/u1/karl/mp/libcangjie'
make  check-TESTS
make[2]: Entering directory `/u1/karl/mp/libcangjie'
Checking tables/cj3-sc.txt...
Table contained 7191 entries but DB contains 7360
FAIL: tests/testdbs.pl
==============================================================
1 of 1 test failed
Please report to https://github.com/wanleung/libcangjie/issues
==============================================================
make[2]: *** [check-TESTS] Error 1
make[2]: Leaving directory `/u1/karl/mp/libcangjie'
make[1]: *** [check-am] Error 2
make[1]: Leaving directory `/u1/karl/mp/libcangjie'
make: *** [check] Error 2

Here are the relevant packages I have installed:

libdb++-dev 5.1.4ubuntu1
libdb5.1       5.1.25-11build 
libdb5.1++     5.1.25-11build 
libdb5.1++-dev 5.1.25-11build 
libdb5.1-dev   5.1.25-11build 
libberkeleydb-perl 0.49-1

bochecha commented 11 years ago

First, when pasting the error log, please do it as a code block (see "Fenced code blocks" at https://help.github.com/articles/github-flavored-markdown). It would avoid these huge titles in the middle of the log, and make it more readable through the use of a monospace font.

But don't worry about it, it's just a note for your future bug reports. :)

Second, I'd like to warn you about a potential problem you might run into. You are building libcangjie on Ubuntu 12.04, and I can't help but wonder whether your goal is to build pycangjie and ibus-cangjie.

Unfortunately, you won't be able to build pycangjie on Ubuntu 12.04, because the version of Cython it ships is too old. That is, unless you also build a newer Cython first. ;)

Now, to your issue.

This is very weird: the log says that cj3-sc.mb contains more entries than were originally in the cj3-sc.txt file. :-/

Could you upload these files somewhere? I'd like to do some testing on them.

wanleung commented 11 years ago

Sorry, this is my fault. After adding some new features, I forgot to update the test script.

The higher record count is correct, because the DBs now also include the Japanese alphabets. Punctuation will be handled the same way later.

I will also open a new branch to freeze the code; please use that branch to submit bug fixes.

Thanks.

regards, wanleung

wanleung commented 11 years ago

Bug fixed. Please run git pull.

regards, wanleung

bochecha commented 11 years ago

"Sorry, This is my fault. After adding some new features, I forgot to change the test script. more record is right, as it is now added japanese alphabets. It will use the same method to handle punctuation afterward."

His test fails on cj3-sc.txt, why would that contain Japanese characters?

bochecha commented 11 years ago

Wait, did you add the Japanese characters to all generated DB? o_O

https://github.com/wanleung/libcangjie/commit/c6d92e1d2dce532049a2d4805d27430c5a04559d#L0R96

Why would you do that?

wanleung commented 11 years ago

The tables contain not only Chinese characters, but also punctuation and Japanese alphabets (a subset of Japanese characters).

I was trying to separate them out to make the tables cleaner. The Japanese alphabets come first, because each table has its own codes for them....

regards, wanleung

wanleung commented 11 years ago

The bug is that my test script didn't account for the new separate table, which caused the count mismatch.
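To make the count mismatch concrete, here is a minimal Python sketch of a count check that accounts for a shared table. It is not the actual tests/testdbs.pl (which is Perl and not shown in this thread), and the helpers count_entries and check_db_count are hypothetical. It assumes a table format of one entry per non-blank line with "#" comment lines, and that every generated DB holds its own table's entries plus all entries of the shared table (tables/jp.txt, per the commit linked earlier), with no de-duplication. Under those assumptions, the reported failure (7360 in the DB vs. 7191 in the table) would mean the shared table contributed the 169 extra records.

```python
from typing import Iterable, Optional


def count_entries(lines: Iterable[str]) -> int:
    """Count table entries, assuming one entry per non-blank line and
    '#' marking comment lines (hypothetical format, not verified)."""
    return sum(
        1
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    )


def check_db_count(table_lines: Iterable[str],
                   shared_lines: Iterable[str],
                   db_count: int) -> Optional[str]:
    """Return None if db_count matches the expected total, else an error
    message. Assumes each DB = its own table + the shared table, with no
    de-duplication of records (an assumption, not confirmed)."""
    expected = count_entries(table_lines) + count_entries(shared_lines)
    if db_count == expected:
        return None
    return (f"Expected {expected} entries (table + shared) "
            f"but DB contains {db_count}")


# Usage with tiny in-memory tables (made-up sample data):
cj_table = ["# comment", "a 日", "b 月", ""]
jp_table = ["zxa あ"]
print(check_db_count(cj_table, jp_table, 3))  # None: counts match
print(check_db_count(cj_table, jp_table, 2))  # mismatch message
```

Comparing against the table count alone (the old behavior) would flag every DB that includes the shared entries, which matches the symptom reported here.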

wanleung commented 11 years ago

It is easy to use Cangjie codes to type Japanese. Quite a lot of HK people need to type some Japanese alphabet characters, for example when searching. The old Cangjie tables did not offer a good way to type the Japanese alphabets, although some tables included codes for them. Programmers who need to type Japanese alphabets have been editing their own tables to support them in Cangjie, and I am just adding that code back.

In the next milestone, the Japanese support will be connected to the Anthy library to support Japanese vocabulary lookup.

regards, wanleung

bochecha commented 11 years ago

I don't think the bug is in the test script.

I mean, the line I linked to in my previous comment clearly shows that you add the content of tables/jp.txt to every DB we build.

I don't understand this.

Why would you want to put Japanese characters in e.g. data/cj3-tc.mb? That DB should only contain Traditional Chinese.

bochecha commented 11 years ago

"It is easy to use cangjie code to type japanese."

Right, but then the Japanese characters should be in the tables/cj3-cjk.txt, shouldn't they?

I mean, that's the whole point of having separate tables for languages, and then letting the user decide which ones to use, thanks to these flags: https://github.com/wanleung/libcangjie/blob/master/src/cangjie.h#L25

wanleung commented 11 years ago

The problem is that all our tables use different codes for the Japanese alphabets. To stay backward compatible, the old codes are kept in the tables, but there is no point in having a different code set in each table, so why not pull them out and unify them? Punctuation has the same problem: every table has its own punctuation codes.........

On Tue, Mar 12, 2013 at 2:44 PM, Wan Leung Wong wanleungwong@gmail.com wrote:

Yes, it was the bug, and I fixed it.

Japanese alphabet is different from Japanese Kanji.

You don't understand it because you don't use Chinese input, so you haven't faced the problems we ran into before and had to solve ourselves. :-P

Japanese culture influences HK a lot (Taiwan too), so we sometimes need to type Japanese alphabets. Before UTF-8, the Big5 table didn't include the Japanese alphabets, so we all had to install a library called Sakura to import them and change our input method to support them.

Now that we use Unicode, we no longer need to import a Japanese alphabet charset, but this kind of input support still exists in GCIN.

Since we are building a good input method for HK people, I (at least) want to make it easy for HK users to input the things we commonly use, and Japanese alphabet support is one of them.

regards, wanleung

wanleung commented 11 years ago

Fixed

make check failed on DB records #18