xingdi-eric-yuan / named-entity-recognition

A simple NER model in C++

Train by new dataset #4

Open alichq opened 7 years ago

alichq commented 7 years ago

Hi Eric, thanks for your great work and docs. I'd like to train your CNN on my own data, which is in Persian. I formatted my data the same way as your news_tagged_data.txt dataset, with new tags such as B-PERS, B-ORG, B-LOC, etc. Then I built wordvecs.txt from my data using gensim and formatted it the same way as yours. The CNN trained successfully:

Test training data:
######################################
CNN - N-Gram test result. 4809 correct of 5235 total.
Accuracy is 0.918625
######################################
Test testing data:
######################################
CNN - N-Gram test result. 1076 correct of 1155 total.
Accuracy is 0.931602
######################################
######################################
CNN - Single word test result. 1294 correct of 1337 total.
Accuracy is 0.967838
######################################

But when I test my own sentences using ./NER 4, I get wrong results: every token is tagged "O".

alich@Alich:~/named-entity-recognition$ ./NER 4
Type a query... (end with )
شهر رشت در کشور ایران قرار دارد و احمد میرهاشمی به آنجا سفر کرده
Successfully read network information from network/info_80.xml...
شهر : O
رشت : O
در : O
کشور : O
ایران : O
قرار : O
دارد : O
و : O
احمد : O
میرهاشمی : O
به : O
آنجا : O
سفر : O
کرده : O
Totally used time: 0.503745 second

Could you please help me?
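One check that may help narrow down a symptom like this is whether the query tokens actually appear in wordvecs.txt, since a token without a vector cannot be looked up the same way as a training token. The sketch below is only an illustration and is not part of the NER program. It assumes each line of the vector file is a token followed by whitespace-separated numbers, which may not match the real layout. The helper names `load_vocab` and `missing_tokens` are hypothetical, and the vector values in the sample are made up.

```python
import io


def load_vocab(lines):
    """Collect the first field of every line that has a token plus numbers.

    Assumed layout (not confirmed by the repository): "token v1 v2 ... vn".
    """
    vocab = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # a token and at least one vector component
            vocab.add(fields[0])
    return vocab


def missing_tokens(query, vocab):
    """Return the whitespace-separated query tokens that have no vector."""
    return [tok for tok in query.split() if tok not in vocab]


# Tiny in-memory stand-in for wordvecs.txt; the numbers are invented.
sample = io.StringIO("ایران 0.12 -0.03 0.40\nرشت 0.05 0.22 -0.17\n")
vocab = load_vocab(sample)

missing = missing_tokens("شهر رشت در کشور ایران", vocab)
print(missing)
```

To run the same check on a real file, the in-memory sample could be replaced with `open("wordvecs.txt", encoding="utf-8")`. If many query tokens come back as missing, the mismatch may lie in tokenization or text encoding rather than in the trained network, though this check alone cannot confirm the cause.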