haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io

"of" method in smile.nlp.collocation.Bigram throws NullPointerException if the number of matched bigrams is less than the topCount argument #712

Closed: sudheerprem closed this issue 2 years ago

sudheerprem commented 2 years ago

Describe the bug

When the "of" method of smile.nlp.collocation.Bigram is called with a Corpus, topCount and minFrequency as arguments, and fewer bigrams match than topCount, the method throws a NullPointerException.

Expected behavior

The number of elements in the returned Bigram array should equal the number of bigrams that satisfy the topCount and minFrequency arguments, even when that is fewer than topCount.

Actual behavior

The method throws a NullPointerException.

Code snippet

String content = "Target could be achieved by manufacturing more electric,";

Input data

"Target could be achieved by manufacturing more electric,"


haifengl commented 2 years ago

Your minFreq is so large that no bigram passes the threshold. I added a safeguard.
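The thread doesn't show Smile's internals or the actual fix. One plausible way such an NPE arises is allocating a result array of length topCount up front and then reading slots that were never filled. The sketch below is a hypothetical, standalone illustration of the safeguard idea, not Smile's code: it takes a precomputed map of bigram counts (an assumption), drops pairs below minFrequency, and returns at most k results, so the output holds min(k, matched) entries and never contains null slots. The class and method names are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch (not Smile's implementation): top-k selection of
 * bigrams by frequency with a minimum-frequency threshold. The result size
 * is min(k, number of bigrams passing the threshold), never padded with nulls.
 */
public class TopKBigrams {
    public static List<String> topK(Map<String, Integer> counts, int k, int minFrequency) {
        return counts.entrySet().stream()
                // keep only bigrams that meet the frequency threshold
                .filter(e -> e.getValue() >= minFrequency)
                // highest count first; ties broken alphabetically for determinism
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                // at most k results, possibly fewer (or none)
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Returning a list sized by what actually matched, rather than a fixed-length array, is what keeps callers from ever seeing an empty slot when topCount exceeds the number of qualifying bigrams.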

BTW, this method is meant for a corpus (i.e., many texts). You have only one text, and you set a very large topCount and minFreq.
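To see why a single short text can't satisfy a large minFreq, it helps to count its bigrams. The sketch below is a hypothetical illustration, not Smile's tokenizer: it assumes a naive lowercase split that strips punctuation and counts adjacent word pairs. Smile's own preprocessing (tokenization, any stop-word handling) may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical illustration (not Smile's tokenizer): counts adjacent word
 * pairs in one text after a naive lowercase, punctuation-stripping split.
 */
public class BigramCounts {
    public static Map<String, Integer> count(String text) {
        // replace anything that is not a lowercase letter or whitespace with a space
        String[] words = text.toLowerCase().replaceAll("[^a-z\\s]", " ").trim().split("\\s+");
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + 1 < words.length; i++) {
            // increment the count for this adjacent pair
            counts.merge(words[i] + " " + words[i + 1], 1, Integer::sum);
        }
        return counts;
    }
}
```

Under this naive split, the issue's sample sentence has 8 words and therefore 7 distinct bigrams, each occurring exactly once. Any minimum frequency above 1 filters out every pair, which is why collocation statistics need many texts to be meaningful.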