tensorflow / models

Models and examples built with TensorFlow

The tagging performance of DRAGNN is worse than SyntaxNet. #1347

Closed banyh closed 7 years ago

banyh commented 7 years ago

System information

Describe the problem

The tagging performance of DRAGNN is worse than SyntaxNet.

I've modified dragnn/tools/parse-to-conll.py to make it print out token.tag.

The sentence "Alice drove down the street in her car" was parsed by both the SyntaxNet tagger+parser and the DRAGNN parser. The POS tag of "Alice" is NOUN++NNP in SyntaxNet, but ADV++RB in DRAGNN.

Source code / logs

Modified parse-to-conll.py line 227 to 231:

            f.write('%s\t%s\t_\t_\t_\t_\t%d\t%s\t_\t%s\n'%(
                i + 1,
                token.word.encode('utf-8'), head,
                token.label.encode('utf-8'),
                token.tag.encode('utf-8')))
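For context, a minimal self-contained sketch of what that modified write produces (the `Token` namedtuple here is a hypothetical stand-in for the real SyntaxNet sentence proto token; the format string is the one from the snippet above):

```python
from collections import namedtuple

# Hypothetical stand-in for SyntaxNet's token proto: word, head index,
# dependency label, and the tag field being printed in the last column.
Token = namedtuple('Token', ['word', 'head', 'label', 'tag'])

def to_conll_row(i, token):
    # Columns: ID, FORM, then underscore placeholders, HEAD, DEPREL,
    # placeholder, and finally token.tag (the fPOS attribute blob).
    return '%s\t%s\t_\t_\t_\t_\t%d\t%s\t_\t%s' % (
        i + 1, token.word, token.head, token.label, token.tag)

row = to_conll_row(0, Token('Alice', 2, 'nsubj', 'ADV++RB'))
```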

To get DRAGNN parser result:

bazel --output_user_root=bazel_root run -c opt //dragnn/tools:parse-to-conll -- \
    --parser_master_spec=/home/banyhong/syntaxnet_wrapper/syntaxnet_wrapper/models/syntaxnet/dragnn/conll17/English/parser_spec.textproto \
    --parser_checkpoint_file=/home/banyhong/syntaxnet_wrapper/syntaxnet_wrapper/models/syntaxnet/dragnn/conll17/English/checkpoint \
    --parser_resource_dir=/home/banyhong/syntaxnet_wrapper/syntaxnet_wrapper/models/syntaxnet/dragnn/conll17/English \
    --use_gold_segmentation=True \
    --input_file=/home/banyhong/syntaxnet_wrapper/syntaxnet_wrapper/models/syntaxnet/input.conll \
    --inference_beam_size=char_lstm=16,lookahead=16,tagger=64,parser=64 \
    --output_file=/home/banyhong/syntaxnet_wrapper/syntaxnet_wrapper/models/syntaxnet/output.conll

The content in output.conll is:

#Alice drove down the street in her car
1       Alice   _       _       _       _       2       nsubj   _       attribute { name: "fPOS" value: "ADV++RB" }
2       drove   _       _       _       _       0       root    _       attribute { name: "Mood" value: "Imp" } attribute { name: "VerbForm" value: "Fin" } attribute { name: "fPOS" value: "VERB++VB" }
3       down    _       _       _       _       2       compound:prt    _       attribute { name: "fPOS" value: "ADP++RP" }
4       the     _       _       _       _       5       det     _       attribute { name: "Definite" value: "Def" } attribute { name: "PronType" value: "Art" } attribute { name: "fPOS" value: "DET++DT" }
5       street  _       _       _       _       2       obj     _       attribute { name: "Number" value: "Sing" } attribute { name: "fPOS" value: "NOUN++NN" }
6       in      _       _       _       _       8       case    _       attribute { name: "fPOS" value: "ADP++IN" }
7       her     _       _       _       _       8       nmod:poss       _       attribute { name: "Gender" value: "Fem" } attribute { name: "Number" value: "Sing" } attribute { name: "Person" value: "3" } attribute { name: "Poss" value: "Yes" } attribute { name: "PronType" value: "Prs" } attribute { name: "fPOS" value: "PRON++PRP$" }
8       car     _       _       _       _       5       nmod    _       attribute { name: "Number" value: "Sing" } attribute { name: "fPOS" value: "NOUN++NN" }
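The tag ends up inside an `attribute { ... }` blob in the last column. A small sketch (my own helper, not part of parse-to-conll.py) for pulling the fPOS value out and splitting it into universal and language-specific parts:

```python
import re

def extract_fpos(misc):
    # Pull the fPOS value out of the attribute { ... } blob in the
    # last CoNLL column; returns None if no fPOS attribute is present.
    m = re.search(r'name: "fPOS" value: "([^"]+)"', misc)
    return m.group(1) if m else None

fpos = extract_fpos('attribute { name: "fPOS" value: "ADV++RB" }')
upos, xpos = fpos.split('++')  # 'ADV', 'RB'
```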

The parsing result of SyntaxNet is:

1       Alice   _       NOUN    NNP     _       2       nsubj   _       _
2       drove   _       VERB    VBD     _       0       ROOT    _       _
3       down    _       ADP     IN      _       2       prep    _       _
4       the     _       DET     DT      _       5       det     _       _
5       street  _       NOUN    NN      _       3       pobj    _       _
6       in      _       ADP     IN      _       2       prep    _       _
7       her     _       PRON    PRP$    _       8       poss    _       _
8       car     _       NOUN    NN      _       6       pobj    _       _
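Lining up the fPOS tags from the two outputs above makes the disagreement concrete (the lists below are transcribed directly from the two parses):

```python
# fPOS tags transcribed from the DRAGNN and SyntaxNet outputs above.
dragnn = [('Alice', 'ADV++RB'), ('drove', 'VERB++VB'), ('down', 'ADP++RP'),
          ('the', 'DET++DT'), ('street', 'NOUN++NN'), ('in', 'ADP++IN'),
          ('her', 'PRON++PRP$'), ('car', 'NOUN++NN')]
syntaxnet = [('Alice', 'NOUN++NNP'), ('drove', 'VERB++VBD'), ('down', 'ADP++IN'),
             ('the', 'DET++DT'), ('street', 'NOUN++NN'), ('in', 'ADP++IN'),
             ('her', 'PRON++PRP$'), ('car', 'NOUN++NN')]

# Words where the two models disagree on the tag.
mismatches = [(w, d, s) for (w, d), (_, s) in zip(dragnn, syntaxnet) if d != s]
```

Three of the eight tokens differ: "Alice" (ADV++RB vs NOUN++NNP), "drove" (VB vs VBD), and "down" (RP vs IN).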
bogatyy commented 7 years ago

Could you clarify the language and the dataset that you evaluated POS tagging on? Or do you mean tagging is worse on that specific sentence?

banyh commented 7 years ago

Here is the information:

bogatyy commented 7 years ago

Oh, that is because parsey_mcparseface was trained on a significantly larger dataset, and it was optimized to maximize both POS tagging accuracy and parsing accuracy. Our baseline models for CoNLL 2017 (including the English model you looked at), on the other hand, could only be trained on UD data and were optimized only for parsing accuracy. To learn more, see: http://universaldependencies.org/conll17/evaluation.html