tensorflow / models

Models and examples built with TensorFlow

The tagging performance of DRAGNN is worse than SyntaxNet. #1347

Closed banyh closed 7 years ago

banyh commented 7 years ago

System information

Describe the problem

The tagging performance of DRAGNN is worse than SyntaxNet.

I've modified dragnn/tools/parse-to-conll.py to make it print out token.tag.

The sentence "Alice drove down the street in her car" was parsed by both the SyntaxNet tagger+parser and the DRAGNN parser. The POS tag of "Alice" is NOUN++NNP in SyntaxNet, but ADV++RB in DRAGNN.

Source code / logs

Modified parse-to-conll.py line 227 to 231:

            f.write('%s\t%s\t_\t_\t_\t_\t%d\t%s\t_\t%s\n'%(
                i + 1,
                token.word.encode('utf-8'), head,
                token.label.encode('utf-8'),
                token.tag.encode('utf-8')))
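For context, a minimal self-contained sketch of what that modified write produces (the `Token` namedtuple here is a hypothetical stand-in for the real SyntaxNet sentence proto token; the format string is the one from the snippet above):

```python
from collections import namedtuple

# Hypothetical stand-in for SyntaxNet's token proto: word, head index,
# dependency label, and the tag field being printed in the last column.
Token = namedtuple('Token', ['word', 'head', 'label', 'tag'])

def to_conll_row(i, token):
    # Columns: ID, FORM, then underscore placeholders, HEAD, DEPREL,
    # placeholder, and finally token.tag (the fPOS attribute blob).
    return '%s\t%s\t_\t_\t_\t_\t%d\t%s\t_\t%s' % (
        i + 1, token.word, token.head, token.label, token.tag)

row = to_conll_row(0, Token('Alice', 2, 'nsubj', 'ADV++RB'))
```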

To get DRAGNN parser result:

bazel --output_user_root=bazel_root run -c opt //dragnn/tools:parse-to-conll -- \
    --parser_master_spec=/home/banyhong/syntaxnet_wrapper/syntaxnet_wrapper/models/syntaxnet/dragnn/conll17/English/parser_spec.textproto \
    --parser_checkpoint_file=/home/banyhong/syntaxnet_wrapper/syntaxnet_wrapper/models/syntaxnet/dragnn/conll17/English/checkpoint \
    --parser_resource_dir=/home/banyhong/syntaxnet_wrapper/syntaxnet_wrapper/models/syntaxnet/dragnn/conll17/English \
    --use_gold_segmentation=True \
    --input_file=/home/banyhong/syntaxnet_wrapper/syntaxnet_wrapper/models/syntaxnet/input.conll \
    --inference_beam_size=char_lstm=16,lookahead=16,tagger=64,parser=64 \
    --output_file=/home/banyhong/syntaxnet_wrapper/syntaxnet_wrapper/models/syntaxnet/output.conll

The content in output.conll is:

#Alice drove down the street in her car
1       Alice   _       _       _       _       2       nsubj   _       attribute { name: "fPOS" value: "ADV++RB" }
2       drove   _       _       _       _       0       root    _       attribute { name: "Mood" value: "Imp" } attribute { name: "VerbForm" value: "Fin" } attribute { name: "fPOS" value: "VERB++VB" }
3       down    _       _       _       _       2       compound:prt    _       attribute { name: "fPOS" value: "ADP++RP" }
4       the     _       _       _       _       5       det     _       attribute { name: "Definite" value: "Def" } attribute { name: "PronType" value: "Art" } attribute { name: "fPOS" value: "DET++DT" }
5       street  _       _       _       _       2       obj     _       attribute { name: "Number" value: "Sing" } attribute { name: "fPOS" value: "NOUN++NN" }
6       in      _       _       _       _       8       case    _       attribute { name: "fPOS" value: "ADP++IN" }
7       her     _       _       _       _       8       nmod:poss       _       attribute { name: "Gender" value: "Fem" } attribute { name: "Number" value: "Sing" } attribute { name: "Person" value: "3" } attribute { name: "Poss" value: "Yes" } attribute { name: "PronType" value: "Prs" } attribute { name: "fPOS" value: "PRON++PRP$" }
8       car     _       _       _       _       5       nmod    _       attribute { name: "Number" value: "Sing" } attribute { name: "fPOS" value: "NOUN++NN" }
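The tag ends up inside an `attribute { ... }` blob in the last column. A small sketch (my own helper, not part of parse-to-conll.py) for pulling the fPOS value out and splitting it into universal and language-specific parts:

```python
import re

def extract_fpos(misc):
    # Pull the fPOS value out of the attribute { ... } blob in the
    # last CoNLL column; returns None if no fPOS attribute is present.
    m = re.search(r'name: "fPOS" value: "([^"]+)"', misc)
    return m.group(1) if m else None

fpos = extract_fpos('attribute { name: "fPOS" value: "ADV++RB" }')
upos, xpos = fpos.split('++')  # 'ADV', 'RB'
```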

The parsing result of SyntaxNet is:

1       Alice   _       NOUN    NNP     _       2       nsubj   _       _
2       drove   _       VERB    VBD     _       0       ROOT    _       _
3       down    _       ADP     IN      _       2       prep    _       _
4       the     _       DET     DT      _       5       det     _       _
5       street  _       NOUN    NN      _       3       pobj    _       _
6       in      _       ADP     IN      _       2       prep    _       _
7       her     _       PRON    PRP$    _       8       poss    _       _
8       car     _       NOUN    NN      _       6       pobj    _       _
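Lining up the fPOS tags from the two outputs above makes the disagreement concrete (the lists below are transcribed directly from the two parses):

```python
# fPOS tags transcribed from the DRAGNN and SyntaxNet outputs above.
dragnn = [('Alice', 'ADV++RB'), ('drove', 'VERB++VB'), ('down', 'ADP++RP'),
          ('the', 'DET++DT'), ('street', 'NOUN++NN'), ('in', 'ADP++IN'),
          ('her', 'PRON++PRP$'), ('car', 'NOUN++NN')]
syntaxnet = [('Alice', 'NOUN++NNP'), ('drove', 'VERB++VBD'), ('down', 'ADP++IN'),
             ('the', 'DET++DT'), ('street', 'NOUN++NN'), ('in', 'ADP++IN'),
             ('her', 'PRON++PRP$'), ('car', 'NOUN++NN')]

# Words where the two models disagree on the tag.
mismatches = [(w, d, s) for (w, d), (_, s) in zip(dragnn, syntaxnet) if d != s]
```

Three of the eight tokens differ: "Alice" (ADV++RB vs NOUN++NNP), "drove" (VB vs VBD), and "down" (RP vs IN).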
bogatyy commented 7 years ago

Could you clarify the language and the dataset that you evaluated POS tagging on? Or do you mean tagging is worse on that specific sentence?

banyh commented 7 years ago

Here is the information:

bogatyy commented 7 years ago

Oh, that is because parsey_mcparseface was trained on a significantly larger dataset, and it was optimized to maximize both POS tagging accuracy and parsing accuracy. Our baseline models for CoNLL 2017 (including the English model you looked at), on the other hand, could only be trained on UD data and were optimized only for parsing accuracy. To learn more, see: http://universaldependencies.org/conll17/evaluation.html