mivoq / hunpos

Automatically exported from code.google.com/p/hunpos

Fatal error: exception Failure("empty context_trie") #15

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Hi, 

I've tried to build my own POS model under Linux. I first tried a corpus of
1,600,000 words (with tags) and got a stack overflow error, so I tried a much
smaller corpus (100,000 words and tags). The program reports that it is reading
the training corpus, then compiling probabilities, and then fails with: Fatal
error: exception Failure("empty context_trie").

What am I doing wrong? My corpus is just a file with one word and one tag per
line, with LF line endings.
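Before digging into the trainer, it can help to sanity-check the corpus layout. The sketch below assumes the input format hunpos-train reads on stdin: one token per line as `word<TAB>tag`, with an empty line closing each sentence. The function name `check_corpus` is hypothetical, not part of hunpos. It counts tokens and sentences and reports lines that do not split into exactly two non-empty tab-separated fields.

```python
from typing import Iterable, List, Tuple


def check_corpus(lines: Iterable[str]) -> Tuple[int, int, List[int]]:
    """Return (token count, sentence count, 1-based numbers of malformed lines).

    Assumes the hunpos-train input format: one token per line as
    'word<TAB>tag', with an empty line closing each sentence.
    """
    tokens = 0
    sentences = 0
    in_sentence = False
    bad: List[int] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            # An empty line closes the current sentence (if one is open).
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        fields = line.split("\t")
        if len(fields) != 2 or not fields[0] or not fields[1]:
            # Wrong separator (e.g. spaces) or a missing word/tag.
            bad.append(lineno)
            continue
        tokens += 1
        in_sentence = True
    if in_sentence:
        # Count a final sentence that is not followed by an empty line.
        sentences += 1
    return tokens, sentences, bad
```

Usage: `with open("port.corpus", encoding="utf-8") as f: print(check_corpus(f))`. If a file with hundreds of thousands of tokens reports a single sentence, it has no blank-line sentence breaks, which is worth ruling out before anything else.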

Thanks in advance for any answer.

Original issue reported on code.google.com by dwight...@gmail.com on 25 Nov 2010 at 4:09

GoogleCodeExporter commented 9 years ago
I'm having the same issue. I thought it might be a 32-bit/64-bit problem, since
the binaries in the download section are 32-bit and I'm on a 64-bit system. But
I still get the same error when I use binaries I compiled myself. I've tried
fiddling with the -s and -f parameters to hunpos-train, but it doesn't seem to
help.

Original comment by arnsh...@gmail.com on 13 Jan 2011 at 10:19

GoogleCodeExporter commented 9 years ago
I would like to compile a model for Portuguese. The training corpus is UTF-8
encoded (see attachment). However, I get the same error under Mac OS X 10.6.3:
$ cat port.corpus | ./hunpos-train -t 3 -e 2 -s 3 port-model
reading training corpus
compiling probabilities
Fatal error: exception Failure("empty context_trie")

The same error occurs when I don't specify any options, using the default 
values.

Original comment by Leonel.Figueiredo.de.Alencar@gmail.com on 21 Jan 2011 at 2:07


GoogleCodeExporter commented 9 years ago
The fatal error exception Failure("empty context_trie") is caused by the
following bug: when building the model, hunpos-train creates a separate
emission-probability estimate for words consisting only of digits, and it fails
fatally if the corpus contains no such words.
To avoid the error, make sure the training corpus contains at least some word
forms made up of digits only.
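A quick way to check whether a corpus is affected is to look for word forms made up only of digits before training. This is a minimal sketch, not part of hunpos: the function name `digit_only_words` is hypothetical, it assumes the `word<TAB>tag` line format, and it treats "digits only" as ASCII 0-9. hunpos's own test may be broader or narrower (for instance, it might or might not accept forms such as "1.5").

```python
import re
from typing import Iterable, List

# Assumption: "digits only" means ASCII 0-9 and nothing else.
DIGITS_ONLY = re.compile(r"[0-9]+")


def digit_only_words(lines: Iterable[str]) -> List[str]:
    """Collect word forms consisting solely of digits from a word<TAB>tag corpus."""
    found: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # empty line = sentence boundary, no token here
        word = line.split("\t", 1)[0]
        if DIGITS_ONLY.fullmatch(word):
            found.append(word)
    return found
```

If this returns an empty list for your corpus, adding a few sentences that contain plain numbers (tagged with whatever tag your tagset uses for numerals) should, per the explanation above, avoid the "empty context_trie" failure.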

Original comment by nova...@gmail.com on 3 Apr 2011 at 11:22