Source code and corpora for the paper "Iterated Dilated Convolutions for Chinese Word Segmentation", published in the NNW journal.
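It implements the following 4 models for CWS: Bi-LSTM, Bi-LSTM-CRF, IDCNN (iterated dilated CNN) and IDCNN-CRF. A model is selected with the bilstm or cnn model argument plus the optional --viterbi flag described below.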
Both CPU and GPU are supported. GPU training is 10 times faster than CPU training.
Run the following script to convert the corpora into TensorFlow datasets:
$ ./scripts/make.sh
Then train and evaluate a model with:
$ ./scripts/run.sh $dataset $model
$dataset can be pku, msr, asSC or cityuSC, and $model can be cnn or bilstm. For example:
$ ./scripts/run.sh pku cnn
This trains a cnn model on the pku dataset, then evaluates its performance on the test set.
To enable the CRF layer, append --viterbi to the command, e.g.:
$ ./scripts/run.sh pku cnn --viterbi
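To reproduce every combination, the dataset and model arguments can be looped over, with and without --viterbi, giving 4 corpora × 4 models = 16 runs. The sketch below is a hypothetical helper, not part of the repository: it assumes it is run from the repository root after ./scripts/make.sh has finished, and DRY_RUN is a switch of this helper only (not an option of run.sh). It defaults to printing the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: train and evaluate all 4 models (cnn/bilstm, each with
# and without the CRF layer) on all 4 corpora via ./scripts/run.sh.
# Assumption: run from the repository root after ./scripts/make.sh.
# DRY_RUN is specific to this helper; it defaults to 1 (print commands only).
set -euo pipefail

DATASETS=(pku msr asSC cityuSC)
MODELS=(cnn bilstm)

run_all() {
  local dataset model crf
  for dataset in "${DATASETS[@]}"; do
    for model in "${MODELS[@]}"; do
      # An empty string means "no CRF"; --viterbi enables the CRF layer.
      for crf in "" "--viterbi"; do
        local cmd=(./scripts/run.sh "$dataset" "$model")
        if [[ -n "$crf" ]]; then
          cmd+=("$crf")
        fi
        if [[ "${DRY_RUN:-1}" == 1 ]]; then
          # Dry run: print the command line instead of executing it.
          echo "${cmd[*]}"
        else
          "${cmd[@]}"
        fi
      done
    done
  done
}

run_all
```

Run it with DRY_RUN=0 to launch the 16 runs one after another. Defaulting to a dry run makes it cheap to review the command list before spending hours of training time.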