curusarn closed this issue 7 years ago
The error means that NameTag thinks the word 0000000 is in multiple clusters (and fails because it does not know which cluster to use).
The input file for the BrownCluster feature should contain lines of the form cluster<tab>lemma. Don't you have it reversed (i.e., lemma<tab>cluster)? The "form 0000000" looks more like a cluster.
If you really have one form present multiple times in the file, you have to decide which one to use yourself.
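For anyone hitting the same error, here is a minimal Python sketch of how one might check a cluster file before passing it to train_ner. It is not part of NameTag: the helper names `load_brown_clusters` and `swap_columns` are hypothetical, and keeping the first cluster seen for a repeated lemma is just one possible policy, chosen here as an assumption.

```python
def load_brown_clusters(lines):
    """Parse cluster<tab>lemma lines.

    Returns (clusters, duplicates), where clusters maps lemma -> cluster
    and duplicates lists (line number, lemma) for every repeated lemma.
    Assumption: the first cluster seen for a lemma is the one kept.
    """
    clusters = {}
    duplicates = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        cluster, lemma = line.split("\t", 1)
        if lemma in clusters:
            duplicates.append((lineno, lemma))
            continue
        clusters[lemma] = cluster
    return clusters, duplicates


def swap_columns(lines):
    """Turn lemma<tab>cluster lines into cluster<tab>lemma lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        lemma, cluster = line.split("\t", 1)
        yield f"{cluster}\t{lemma}\n"


# A file in the correct orientation, with one lemma listed twice.
sample = ["0000000\tthe\n", "0000000\ta\n", "0101\tdog\n", "0110\tdog\n"]
clusters, duplicates = load_brown_clusters(sample)
```

If a file is reversed, nearly every line gets reported as a duplicate, because many lemmas share one cluster and the cluster string is being read as the lemma. In that case running the lines through `swap_columns` first should be enough.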
The BrownCluster feature file was reversed (lemma<tab>cluster). Thanks for your time.
When I run train_ner with the BrownClusters feature enabled, I get the following output:

Why exactly can't Form '0000000' be present twice?

It seems like an unnecessary limitation. As far as I know, all words with the same prefix belong to one cluster. Therefore, any additional bits after the chosen prefix are irrelevant. (E.g., with a prefix of length 20, any bits after the 20th bit are irrelevant.) Am I missing something?
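To make the prefix argument concrete, here is a small Python sketch of what I mean by truncating a cluster to a prefix. It only illustrates the string truncation; how NameTag applies prefixes internally is not something I have checked, and the name `cluster_prefix` and the example bit strings are made up.

```python
def cluster_prefix(cluster, length):
    """Return the first `length` bits of a Brown cluster bit string."""
    return cluster[:length]


# Two hypothetical full clusters that differ only after the 20th bit.
full_a = "0" * 20 + "1011"
full_b = "0" * 20 + "0110"

# With a prefix of length 20 the trailing bits no longer matter.
same_prefix = cluster_prefix(full_a, 20) == cluster_prefix(full_b, 20)
```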
Best regards. Simon Let