mimno / Mallet

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
https://mimno.github.io/Mallet/

java.lang.NullPointerException #160

Open shabir1 opened 5 years ago

shabir1 commented 5 years ago

Data sample:

```
Document1 label1 forest=3.4 tree=5 wood=2.85 hammer=1 colour=1 leaf=1.5
Document2 label2 forest=10 tree=5 wood=2.75 hammer=1 colour=4 leaf=1
```
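To check how each sample line is split before it reaches the pipes, the line regex used in the code can be run on its own. This is a minimal sketch using only `java.util.regex`. It assumes the `*` quantifiers that the scraped page dropped are restored, with backslashes doubled for a Java string literal. The class name `LineRegexDemo` is hypothetical. Group 1 is the instance name, group 2 the label and group 3 the data, which matches the `CsvIterator(..., 3, 2, 1)` arguments (data group 3, target group 2, name group 1).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineRegexDemo {
    // Line regex with the '*' quantifiers restored (assumption) and backslashes
    // doubled for a Java literal: group 1 = name, group 2 = label, group 3 = data.
    static final Pattern LINE = Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$");

    /** Returns {name, label, data}, or null if the line does not match. */
    static String[] split(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = split("Document1 label1 forest=3.4 tree=5 wood=2.85 hammer=1 colour=1 leaf=1.5");
        System.out.println("name=" + parts[0] + " label=" + parts[1] + " data=" + parts[2]);
    }
}
```

Because every part of the pattern can match an empty string, almost any single line matches. A malformed line therefore ends up with a wrong name or label instead of being rejected.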

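```java
import cc.mallet.classify.*;
import cc.mallet.pipe.*;
import cc.mallet.pipe.iterator.CsvIterator;
import cc.mallet.types.InstanceList;
import java.io.*;
import java.util.ArrayList;
import java.util.regex.Pattern;

// One instance per line: name, label, then the feature string.
String lineRegex = "^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$";
// Tokens such as "forest=3.4".
String dataRegex = "[\\p{L}([0-9]*\\.[0-9]+|[0-9]+)_\\=]+";

ArrayList<Pipe> pipeList = new ArrayList<Pipe>();
pipeList.add(new Target2Label());
pipeList.add(new Input2CharSequence());
pipeList.add(new CharSequence2TokenSequence(Pattern.compile(dataRegex)));
pipeList.add(new TokenSequenceParseFeatureString(true, true, "="));
pipeList.add(new PrintInputAndTarget());

InstanceList instances = new InstanceList(new SerialPipes(pipeList));
// dataPath: path to the input file (defined elsewhere)
Reader fileReader = new InputStreamReader(new FileInputStream(new File(dataPath)), "UTF-8");
// CsvIterator(reader, lineRegex, dataGroup = 3, targetGroup = 2, nameGroup = 1)
instances.addThruPipe(new CsvIterator(fileReader, Pattern.compile(lineRegex), 3, 2, 1));

ClassifierTrainer<NaiveBayes> trainClassify = new NaiveBayesTrainer();
trainClassify.train(instances);
```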
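The token regex and `TokenSequenceParseFeatureString(true, true, "=")` are meant to turn the data field into named, real-valued features. The sketch below approximates that tokenize-then-split step in plain Java so the regex can be checked in isolation. It is not MALLET's implementation, and `FeatureTokenDemo` and `parse` are hypothetical names. It also illustrates a detail of the pattern: because it is wrapped in `[...]`, it behaves as one character class (letters, digits and `( ) * . + | _ =`) rather than as an alternation of number formats.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FeatureTokenDemo {
    // Token pattern from the report ('*' restored as an assumption, backslashes doubled).
    // Nested [0-9] classes are unions, so the whole thing is a single character class.
    static final Pattern TOKEN = Pattern.compile("[\\p{L}([0-9]*\\.[0-9]+|[0-9]+)_\\=]+");

    /** Splits the data field into tokens, then each "name=value" token into a feature. */
    static Map<String, Double> parse(String data) {
        Map<String, Double> features = new LinkedHashMap<>();
        Matcher m = TOKEN.matcher(data);
        while (m.find()) {
            String token = m.group();
            int eq = token.indexOf('=');
            if (eq <= 0 || eq == token.length() - 1) {
                continue; // skip tokens without both a name and a value around '='
            }
            String name = token.substring(0, eq);
            // Real-valued features: everything after the separator is the value.
            double value = Double.parseDouble(token.substring(eq + 1));
            features.put(name, value);
        }
        return features;
    }

    public static void main(String[] args) {
        System.out.println(parse("forest=3.4 tree=5 wood=2.85 hammer=1 colour=1 leaf=1.5"));
    }
}
```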

```
. . . .
name: 1419
target: +adwapq-50k
input: TokenSequence [CapitalGain=0.0 span[0..15], education=5 feature(education)=5.0 span[16..27], occupation=0 span[28..40], race=0 span[41..47], sex=1 feature(sex)=1.0 span[48..53], capitalLoss=0.0 span[54..69], HoursPerWeek=40.0 feature(HoursPerWeek)=40.0 span[70..87], fnlwgt=115070.0 feature(fnlwgt)=115070.0 span[88..103], MaritalStatus=0 span[104..119], NativeCountry=0 span[120..135], workclass=2 feature(workclass)=2.0 span[136..147], relationship=0 span[148..162], age=47.0 feature(age)=47.0 span[163..171], EducationNum=10.0 feature(EducationNum)=10.0 span[172..189]]
Token#0:CapitalGain=0.0 span[0..15]
Token#1:education=5 feature(education)=5.0 span[16..27]
Token#2:occupation=0 span[28..40]
Token#3:race=0 span[41..47]
Token#4:sex=1 feature(sex)=1.0 span[48..53]
Token#5:capitalLoss=0.0 span[54..69]
Token#6:HoursPerWeek=40.0 feature(HoursPerWeek)=40.0 span[70..87]
Token#7:fnlwgt=115070.0 feature(fnlwgt)=115070.0 span[88..103]
Token#8:MaritalStatus=0 span[104..119]
Token#9:NativeCountry=0 span[120..135]
Token#10:workclass=2 feature(workclass)=2.0 span[136..147]
Token#11:relationship=0 span[148..162]
Token#12:age=47.0 feature(age)=47.0 span[163..171]
Token#13:EducationNum=10.0 feature(EducationNum)=10.0 span[172..189]

name: 1420
target: +adwapq-50k
input: TokenSequence [CapitalGain=0.0 span[0..15], education=5 feature(education)=5.0 span[16..27], occupation=11 feature(occupation)=11.0 span[28..41], race=0 span[42..48], sex=0 span[49..54], capitalLoss=0.0 span[55..70], HoursPerWeek=50.0 feature(HoursPerWeek)=50.0 span[71..88], fnlwgt=172582.0 feature(fnlwgt)=172582.0 span[89..104], MaritalStatus=0 span[105..120], NativeCountry=0 span[121..136], workclass=5 feature(workclass)=5.0 span[137..148], relationship=3 feature(relationship)=3.0 span[149..163], age=19.0 feature(age)=19.0 span[164..172], EducationNum=10.0 feature(EducationNum)=10.0 span[173..190]]
Token#0:CapitalGain=0.0 span[0..15]
Token#1:education=5 feature(education)=5.0 span[16..27]
Token#2:occupation=11 feature(occupation)=11.0 span[28..41]
Token#3:race=0 span[42..48]
Token#4:sex=0 span[49..54]
Token#5:capitalLoss=0.0 span[55..70]
Token#6:HoursPerWeek=50.0 feature(HoursPerWeek)=50.0 span[71..88]
Token#7:fnlwgt=172582.0 feature(fnlwgt)=172582.0 span[89..104]
Token#8:MaritalStatus=0 span[105..120]
Token#9:NativeCountry=0 span[121..136]
Token#10:workclass=5 feature(workclass)=5.0 span[137..148]
Token#11:relationship=3 feature(relationship)=3.0 span[149..163]
Token#12:age=19.0 feature(age)=19.0 span[164..172]
Token#13:EducationNum=10.0 feature(EducationNum)=10.0 span[173..190]
```

```
java.lang.NullPointerException
    at cc.mallet.types.Multinomial$Estimator.setAlphabet(Multinomial.java:308)
    at cc.mallet.classify.NaiveBayesTrainer.setup(NaiveBayesTrainer.java:251)
    at cc.mallet.classify.NaiveBayesTrainer.trainIncremental(NaiveBayesTrainer.java:200)
    at cc.mallet.classify.NaiveBayesTrainer.train(NaiveBayesTrainer.java:193)
    at cc.mallet.classify.NaiveBayesTrainer.train(NaiveBayesTrainer.java:59)
```