The "of" method of smile.nlp.collocation.Bigram throws NullPointerException when fewer bigrams match than the topCount argument #712
Describe the bug
When the "of" method of smile.nlp.collocation.Bigram is called with a Corpus, topCount and minFrequency as arguments, and the number of matched bigrams is less than the topCount argument, the method throws NullPointerException.
Expected behavior
The returned Bigram array should contain exactly as many elements as there are bigrams matched by the topCount and minFrequency arguments.
Actual behavior
The method throws NullPointerException.
Code snippet
import java.util.Arrays;
import smile.nlp.SimpleCorpus;
import smile.nlp.Text;
import smile.nlp.collocation.Bigram;
import smile.nlp.normalizer.Normalizer;
import smile.nlp.normalizer.SimpleNormalizer;

// Input text (also listed under "Input data" below).
String content = "Target could be achieved by manufacturing more electric," +
" hybrid cars Proposed rule could have \"significant impact\" on Europe's automakers: " +
"industry BMW Group, Daimler, and other carmakers in the European Union would be " +
"required to improve the fuel economy of their vehicles or increase the proportion " +
"of electric cars they produce to meet 2030 carbon dioxide reduction targets proposed" +
" by the European Commission Nov. 8. The commission, the EU's executive arm, said the " +
"average private car or light van sold in the EU in 2030 should emit 30 percent less " +
"carbon dioxide than a car or van sold in 2021--a level that European automakers say " +
"is too ambitious. Under existing binding targets, private cars' carbon dioxide emissions" +
" in 2021 are capped at 95 grams of carbon dioxide per kilometer traveled, while for light" +
" vans the 2021 limit is 147 grams per kilometer. The commission's proposal was \"very aggressive\" " +
"and the 2030 target should instead be a 20 percent reduction in average vehicle emissions, " +
"Erik Jonnaert, secretary general of the European Automobile Manufacturers' Association, " +
"said in a statement Nov. 8. The group speaks for carmakers including BMW Group, Daimler," +
"Fiat Chrysler Automobiles, and Renault Group. The proposed regulation could have \"a significant " +
"impact on the future of Europe's automotive industry,\" because it would only succeed if a " +
"significant switch is made to alternatively fueled vehicles, Jonnaert said. The commission's " +
"proposed targets are in a draft EU regulation, which must be debated and agreed to by the European " +
"Parliament and EU member countries before taking effect. Targets Measured Compliance will be " +
"measured by calculating average emissions of new vehicles sold per manufacturer, " +
"but manufacturers could group together so that average emissions would be calculated " +
"across their combined vehicle fleets. The commission also said an interim target of a " +
"15 percent carbon-dioxide reduction by 2025 should be adopted.";
// Normalize the text and add it to a corpus as a single document.
Normalizer normalizer = SimpleNormalizer.getInstance();
SimpleCorpus corpus = new SimpleCorpus();
String normalizedContent = normalizer.normalize(content);
corpus.add(new Text(normalizedContent));
// topCount = 10, minFrequency = 5: fewer than 10 bigrams match, so this call throws NullPointerException.
Bigram[] bigrams = Bigram.of(corpus, 10, 5);
System.out.println("Bigrams: " + Arrays.toString(bigrams));
Input data
"Target could be achieved by manufacturing more electric,"
" hybrid cars Proposed rule could have \"significant impact\" on Europe's automakers: "
"industry BMW Group, Daimler, and other carmakers in the European Union would be "
"required to improve the fuel economy of their vehicles or increase the proportion "
"of electric cars they produce to meet 2030 carbon dioxide reduction targets proposed"
" by the European Commission Nov. 8. The commission, the EU's executive arm, said the "
"average private car or light van sold in the EU in 2030 should emit 30 percent less "
"carbon dioxide than a car or van sold in 2021--a level that European automakers say "
"is too ambitious. Under existing binding targets, private cars' carbon dioxide emissions"
" in 2021 are capped at 95 grams of carbon dioxide per kilometer traveled, while for light"
" vans the 2021 limit is 147 grams per kilometer. The commission's proposal was \"very aggressive\" "
"and the 2030 target should instead be a 20 percent reduction in average vehicle emissions, "
"Erik Jonnaert, secretary general of the European Automobile Manufacturers' Association, "
"said in a statement Nov. 8. The group speaks for carmakers including BMW Group, Daimler,"
"Fiat Chrysler Automobiles, and Renault Group. The proposed regulation could have \"a significant "
"impact on the future of Europe's automotive industry,\" because it would only succeed if a "
"significant switch is made to alternatively fueled vehicles, Jonnaert said. The commission's "
"proposed targets are in a draft EU regulation, which must be debated and agreed to by the European "
"Parliament and EU member countries before taking effect. Targets Measured Compliance will be "
"measured by calculating average emissions of new vehicles sold per manufacturer, "
"but manufacturers could group together so that average emissions would be calculated "
"across their combined vehicle fleets. The commission also said an interim target of a "
"15 percent carbon-dioxide reduction by 2025 should be adopted."
Additional context
Which Java distribution (OpenJDK, Oracle JDK, etc.) and which Java version are you using
Happens on both OpenJDK and Oracle JDK
Which Smile version
2.6.0
What is your build system (e.g. Ubuntu, macOS, Windows, Debian)
Happens on both Windows and MacOS
Add any other context about the problem here.
The root cause is that the initial bigram array is allocated with length topCount, so all of its elements are null to begin with. When fewer than topCount bigrams match, the remaining slots are never filled and stay null. The array needs to be resized to the number of matched bigrams once all of them have been found.
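
The fix described above can be sketched without Smile at all. The following is a minimal, hypothetical illustration, not the library's actual implementation: the Pair class, the topK method and the counts in main are all made up for this example. It allocates the result with topCount slots, fills only the slots that match, and then trims the array to the matched count so that no null elements are returned.

```java
import java.util.Arrays;
import java.util.Comparator;

public class TopKSketch {
    /** Hypothetical stand-in for a counted bigram; this is NOT Smile's Bigram class. */
    static final class Pair {
        final String w1;
        final String w2;
        final int count;

        Pair(String w1, String w2, int count) {
            this.w1 = w1;
            this.w2 = w2;
            this.count = count;
        }

        @Override
        public String toString() {
            return "(" + w1 + " " + w2 + ", " + count + ")";
        }
    }

    /** Returns at most k candidates whose count is at least minFrequency, highest count first. */
    static Pair[] topK(Pair[] candidates, int k, int minFrequency) {
        // Sort a copy by descending count so the input array is left untouched.
        Pair[] sorted = candidates.clone();
        Arrays.sort(sorted, Comparator.comparingInt((Pair p) -> p.count).reversed());

        // Allocating k slots leaves every element null until it is filled.
        Pair[] result = new Pair[k];
        int matched = 0;
        for (Pair p : sorted) {
            if (matched == k || p.count < minFrequency) {
                break; // result is full, or all remaining counts are below the threshold
            }
            result[matched++] = p;
        }

        // Trim to the number of matches so the caller never sees null slots.
        return Arrays.copyOf(result, matched);
    }

    public static void main(String[] args) {
        // Illustrative counts only; they are not taken from the corpus in this report.
        Pair[] candidates = {
            new Pair("carbon", "dioxide", 5),
            new Pair("european", "commission", 2),
            new Pair("bmw", "group", 2)
        };
        Pair[] top = topK(candidates, 10, 5);
        System.out.println(top.length + " " + Arrays.toString(top));
    }
}
```

With topCount = 10 and minFrequency = 5, only one of the three illustrative candidates qualifies, so the returned array has length 1 rather than 10 with nine null entries.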