jiesutd / NCRFpp

NCRF++, a Neural Sequence Labeling Toolkit. Easy to use for any sequence labeling task (e.g. NER, POS, Segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.
Apache License 2.0

Some questions about the CNN_BILSTM_CRF model #21

Closed hit-joseph closed 6 years ago

hit-joseph commented 6 years ago

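First of all, much respect for your work. I want to apply this model to a Chinese sequence labeling problem. My data includes POS tags: does that conflict with the CNN character features? In your project, can hand-annotated features and CNN character features coexist? Also, I looked at the preprocessed data format: Friday [Cap]1 [POS]NNP O. If I only use the POS feature, should the line be written as Friday [POS]NNP O? Is the [POS] prefix required, or is it just a marker? I would appreciate your advice.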

jiesutd commented 6 years ago
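
1. Character CNN/LSTM can be used together with custom features; they do not conflict.
2. You are right: if you do not use a feature, your input file does not need that column. Not all of the features are required.
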
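
To make the column layout concrete, here is a minimal sketch of how a line such as Friday [Cap]1 [POS]NNP O can be split into a word, optional features, and a label. The function name parse_token_line and the regular expression are hypothetical and written only for illustration; this is not NCRF++'s own data loader.

```python
import re

# Hypothetical pattern for a feature column such as "[POS]NNP":
# a name in square brackets followed by the feature value.
FEATURE_RE = re.compile(r"^\[(?P<name>[^\]]+)\](?P<value>.+)$")


def parse_token_line(line):
    """Split one token line into (word, features, label).

    Assumes whitespace-separated columns: the word comes first, the label
    comes last, and every column in between is an optional "[Name]value"
    feature. Illustrative only, not the toolkit's actual parser.
    """
    columns = line.split()
    if len(columns) < 2:
        raise ValueError("expected at least a word and a label: %r" % line)
    word, label = columns[0], columns[-1]
    features = {}
    for column in columns[1:-1]:
        match = FEATURE_RE.match(column)
        if match is None:
            raise ValueError("malformed feature column: %r" % column)
        features[match.group("name")] = match.group("value")
    return word, features, label


# Both layouts from the question parse the same way; the second one
# simply has no Cap column.
print(parse_token_line("Friday [Cap]1 [POS]NNP O"))
print(parse_token_line("Friday [POS]NNP O"))
```

In this sketch the bracketed prefix is what ties a column to a feature name, so dropping an unused feature just means leaving that column out of every line.
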
hit-joseph commented 6 years ago

One more question: in the file demo.train.config there are some hyperparameters: cnn_layer=4 and char_hidden_dim=30. I don't really understand what these two parameters mean. Would you mind explaining them? Thanks.

jiesutd commented 6 years ago

You may refer to the configuration explanation here: https://github.com/jiesutd/NCRFpp/blob/master/readme/Configuration.md

hit-joseph commented 6 years ago

Thank you. I have already read this file, so I just want to make sure: if I use LSTM as the word-sequence layer, cnn_layer is useless and can be annotated, and the parameter char_hidden_dim=50 means the feature I extract from word and dim is 50, and then joint it after word2vec and pos_vec?

jiesutd commented 6 years ago

Do you mean "ignored" rather than "annotated"? Yes, if you choose the LSTM to encode the word sequence then the settings of CNN can be ignored.

About the char_hidden_dim=50: "the feature i extract from word and dim is 50", this is right. "joint it after word2vec and pos_vec", it is concatenated with word embeddings(not word2vec) and feature embeddings.
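
As a rough illustration of that concatenation, the sketch below joins a word embedding, a character feature vector, and one feature embedding into a single per-token vector and reports its size. Only char_hidden_dim=50 comes from this thread; the word embedding size of 100, the POS embedding size of 20, the helper names, and the order of concatenation are assumptions made up for the example.

```python
def token_input_dim(word_emb_dim, char_hidden_dim, feature_emb_dims):
    """Size of the per-token vector after concatenation."""
    return word_emb_dim + char_hidden_dim + sum(feature_emb_dims)


def concat_token_vector(word_vec, char_vec, feature_vecs):
    """Concatenate the three kinds of representation for one token.

    The order used here (word, char, features) is arbitrary for this
    sketch; only the total length matters for the illustration.
    """
    combined = list(word_vec) + list(char_vec)
    for vec in feature_vecs:
        combined.extend(vec)
    return combined


WORD_EMB_DIM = 100    # assumed value, not from the thread
CHAR_HIDDEN_DIM = 50  # char_hidden_dim=50, as discussed above
POS_EMB_DIM = 20      # assumed size of the POS feature embedding

token_vec = concat_token_vector(
    [0.0] * WORD_EMB_DIM,
    [0.0] * CHAR_HIDDEN_DIM,
    [[0.0] * POS_EMB_DIM],
)
print(len(token_vec))  # 170
```

With these assumed sizes, the word-sequence layer would receive a 170-dimensional vector per token (100 + 50 + 20).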

hit-joseph commented 6 years ago

If I set char_emb_dim=100 (in the I/O part) and char_hidden_dim=50 (in the hyperparameters part), does that mean I input pretrained character embeddings of dimension 100, and after the character CNN layer I get character feature embeddings of dimension 50, which can then be concatenated with the word embedding?

jiesutd commented 6 years ago

exactly.
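
The shape change can be sketched in plain Python: each character of a word starts as a 100-dimensional vector, a bank of 50 convolution filters slides over the characters, and max-pooling over positions leaves one 50-dimensional vector per word regardless of word length. This is an illustration of the dimensions only, with random weights; the function name char_cnn_features, the kernel width of 3, and the zero padding are assumptions for the example, and the toolkit's real layer is a PyTorch module.

```python
import random


def char_cnn_features(char_vectors, filters, kernel_size=3):
    """1-D convolution over characters followed by max-pooling.

    char_vectors: one embedding (list of floats) per character.
    filters: one flat weight list per output channel, each of length
             kernel_size * embedding_dim.
    Returns one value per filter, i.e. a vector of len(filters) entries.
    """
    emb_dim = len(char_vectors[0])
    pad = kernel_size // 2
    zero = [0.0] * emb_dim
    # Zero-pad both ends so every character is the centre of one window.
    padded = [zero] * pad + list(char_vectors) + [zero] * pad
    features = []
    for weights in filters:
        responses = []
        for start in range(len(char_vectors)):
            window = [x for vec in padded[start:start + kernel_size] for x in vec]
            responses.append(sum(w * x for w, x in zip(weights, window)))
        # Max over character positions gives a fixed-size output.
        features.append(max(responses))
    return features


random.seed(0)
CHAR_EMB_DIM = 100    # char_emb_dim=100 from the question
CHAR_HIDDEN_DIM = 50  # char_hidden_dim=50 from the question
KERNEL_SIZE = 3       # assumed for this sketch

word = "Friday"
# Random stand-ins for pretrained character embeddings.
char_vectors = [
    [random.uniform(-1.0, 1.0) for _ in range(CHAR_EMB_DIM)] for _ in word
]
filters = [
    [random.uniform(-0.1, 0.1) for _ in range(KERNEL_SIZE * CHAR_EMB_DIM)]
    for _ in range(CHAR_HIDDEN_DIM)
]

features = char_cnn_features(char_vectors, filters, KERNEL_SIZE)
print(len(char_vectors), len(char_vectors[0]), len(features))  # 6 100 50
```

Because of the max-pooling step, a two-character word and a ten-character word both end up with a 50-dimensional character feature vector, which is what makes it possible to concatenate it with the word embedding.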

hit-joseph commented 6 years ago

thank you very much!