mayabot / mynlp

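A production-grade, high-performance, modular, extensible Chinese NLP toolkit. (Chinese word segmentation, averaged perceptron, fastText, pinyin, new word discovery, segmentation error correction, BM25, person name recognition, named entities, custom dictionaries)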
https://mynlp.mayabot.com/
Apache License 2.0
675 stars 90 forks

Is there a file size limit when training with mynlp-classification? An Illegal Capacity: -1 exception occurs when the training text is large #23

Open 1780spark opened 4 years ago

1780spark commented 4 years ago

[Problem description]:

My training text has 356795 lines and is 150 MB. The following exception occurs at runtime:

Read file build dictionary ... Read 6M words

Number of words: 95303
Number of labels: 0
Number of wordHash2Id: 127070
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.simontuffs.onejar.Boot.run(Boot.java:340)
    at com.simontuffs.onejar.Boot.main(Boot.java:166)
Caused by: java.lang.IllegalArgumentException: Illegal Capacity: -1
    at java.util.ArrayList.<init>(ArrayList.java:157)
    at com.mayabot.nlp.fasttext.loss.HierarchicalSoftmaxLoss.<init>(HierarchicalSoftmaxLoss.kt:29)
    at com.mayabot.nlp.fasttext.loss.LossKt.createLoss(Loss.kt:36)
    at com.mayabot.nlp.fasttext.FastText$Companion.train(FastText.kt:509)
    at com.mayabot.nlp.fasttext.FastText$Companion.train(FastText.kt:481)
    at com.mayabot.nlp.fasttext.FastText$Companion.trainSupervised(FastText.kt:435)
    at com.mayabot.nlp.fasttext.FastText.trainSupervised(FastText.kt)

Is this caused by a file size limit in mynlp-classification training?

jimichan commented 4 years ago

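Something is off here.

Number of labels: 0 means no labels were found in the training file. Please post a few lines of your data so we can check whether the format is correct.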
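
One way to check the format yourself before posting data is to count the labels in the training file. The sketch below is only an illustration and rests on two assumptions that the thread does not confirm: that the training args use the default fastText label prefix `__label__`, and that the hierarchical softmax loss sizes its tree as 2 * labels - 1, as the reference fastText implementation does. If the second assumption holds, zero labels would give a capacity of -1, which matches the exception above. The class and method names (`LabelFormatCheck`, `collectLabels`, `hierarchicalSoftmaxTreeSize`) are hypothetical helpers, not part of mynlp.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;

class LabelFormatCheck {
    // Assumption: the default fastText label prefix. Change it if your
    // training arguments configure a different one.
    static final String LABEL_PREFIX = "__label__";

    /** Collects the distinct whitespace-separated tokens that start with the label prefix. */
    static Set<String> collectLabels(Iterable<String> lines) {
        Set<String> labels = new TreeSet<>();
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (token.startsWith(LABEL_PREFIX) && token.length() > LABEL_PREFIX.length()) {
                    labels.add(token);
                }
            }
        }
        return labels;
    }

    /**
     * Assumption: tree size as computed by the reference fastText
     * hierarchical softmax (2 * labels - 1). With 0 labels this is -1.
     */
    static int hierarchicalSoftmaxTreeSize(int labelCount) {
        return 2 * labelCount - 1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LabelFormatCheck <training-file>");
            return;
        }
        Set<String> labels;
        // Stream the file line by line so a large training file is not loaded at once.
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            Iterable<String> lines = reader.lines()::iterator;
            labels = collectLabels(lines);
        }
        System.out.println("Distinct labels found: " + labels.size());
        System.out.println("Tree size under the 2 * labels - 1 assumption: "
                + hierarchicalSoftmaxTreeSize(labels.size()));
        if (labels.isEmpty()) {
            System.out.println("No token starts with " + LABEL_PREFIX
                    + "; check the label prefix and the whitespace between tokens.");
        }
    }
}
```

Under these assumptions, a line such as `__label__sports 球队 赢 了 比赛` would count as labeled, while a line with no prefixed token would not. If the tool reports zero labels, the file format rather than the file size is the more likely cause.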