jacksonllee / pycantonese

Cantonese Linguistics and NLP
https://pycantonese.org
MIT License

hkcancor search result is empty #16

Closed g-traveller closed 6 years ago

g-traveller commented 6 years ago

import pycantonese as pc

corpus = pc.hkcancor()
print(len(corpus.words()))
print(len(corpus.characters()))

aa = corpus.search(nucleus='aa')
print(len(aa))

I tried the built-in hkcancor corpus, but the results are empty:

149781 -> len(corpus.words())
0 -> len(corpus.characters())
0 -> len(aa)

I am using pycantonese version 2.0.0. Could you please have a look? :)
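For anyone reproducing this report, the three checks above can be bundled into one small helper. This is only a sketch: `summarize_corpus` and `empty_checks` are hypothetical names, not part of pycantonese, and the helper assumes the corpus object exposes `words()`, `characters()`, and `search(...)` as used in this thread.

```python
def summarize_corpus(corpus, **search_criteria):
    """Count words, characters, and search hits for a corpus-like object.

    `corpus` is assumed to offer words(), characters(), and search(**kwargs),
    as the hkcancor corpus object does in the snippet from this report.
    """
    return {
        "words": len(corpus.words()),
        "characters": len(corpus.characters()),
        "search": len(corpus.search(**search_criteria)),
    }


def empty_checks(summary):
    """Return the names of the checks that came back with zero items."""
    return [name for name, count in summary.items() if count == 0]


# Usage with pycantonese installed (left as comments so the sketch
# does not require the package in order to be imported):
#
#   import pycantonese as pc
#   summary = summarize_corpus(pc.hkcancor(), nucleus="aa")
#   print(summary, empty_checks(summary))
```

With the counts reported above, `empty_checks` would list `characters` and `search`, which makes it easy to see which calls are affected.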

jacksonllee commented 6 years ago

Hello, I'm working towards better test coverage, and the tests show that methods like words and characters of the hkcancor corpus object should work correctly (I'm still figuring out some parsing discrepancies and Python 2/3 cross-compatibility):

https://github.com/pycantonese/pycantonese/blob/e9216fc4647e0dd376a1100afe25d6ddcd69c50c/pycantonese/tests/test_corpus.py#L16-L24

If you get the latest dev version from GitHub (instructions here), see if this works?

Also, may I ask what OS and python version you're using? (I myself am on Linux and Mac, no Windows. I'm on python 3.6.)

jacksonllee commented 6 years ago

Hi again, I've just configured the pycantonese repo for CI on Windows, and the tests also passed -- for both python 2.7 and 3.6:

https://ci.appveyor.com/project/jacksonllee/pycantonese/build/1.0.5

So I'm hoping the latest version from GitHub should work even if one is using Windows. I'm wrapping up a few loose ends and should be making a new release on PyPI for pip install real soon.

richielo commented 6 years ago

Hi Jackson, thank you for the great work. I am having a similar issue on macOS. I tried the line below, which returns 0 results:

machine = corpus.search(character='機')

Is this the same issue?

jacksonllee commented 6 years ago

@g-traveller @richielo I just had a deeper look, and it does look like the released version 2.0.0 wasn't working as expected; the fix had been sitting on GitHub for quite a while but had not been released. My apologies! I've just released the fix as version 2.1.0. So if you run:

$ pip install -U pycantonese

then you should be able to update the pycantonese installation. To verify you have the latest:

import pycantonese as pc
print(pc.__version__)  # should display 2.1.0

Please let me know if things still don't work. I'm closing this ticket for now. Thank you for taking the time to report the bug.
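If the version check needs to happen in a script rather than by eye, something along these lines might work. It is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper names `parse_version`, `has_fix`, and `installed_pycantonese_version` are hypothetical and not part of pycantonese, and the version parsing is deliberately simplistic.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '2.1.0' into (2, 1, 0).

    Only the leading digits of each dot-separated piece are used, and
    parsing stops at the first piece without leading digits, so
    '2.1.0.dev1' also yields (2, 1, 0). Short versions are zero-padded.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_fix(installed, minimum="2.1.0"):
    """Return True if `installed` is at least `minimum` (the release with the fix)."""
    return parse_version(installed) >= parse_version(minimum)


def installed_pycantonese_version():
    """Return the installed pycantonese version string, or None if it is absent."""
    try:
        return version("pycantonese")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pycantonese_version()
    if found is None:
        print("pycantonese is not installed")
    elif has_fix(found):
        print(f"pycantonese {found} includes the fix")
    else:
        print(f"pycantonese {found} predates 2.1.0; run: pip install -U pycantonese")
```

Comparing tuples of integers rather than raw strings avoids the usual pitfall where "2.10.0" sorts before "2.9.0".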