juditacs / semeval

MathLing Budapest Team's repo
MIT License

HunPos results depend on previous sentence #18

Closed juditacs closed 9 years ago

juditacs commented 9 years ago
$ cat semeval_data/sts_trial/sts-en-test-gs-2014/STS.input.headlines.txt | head -292 | tail -1 | python src/align_and_penalize.py lsa 2>err1
0.629661449331
$ cat semeval_data/sts_trial/sts-en-test-gs-2014/STS.input.headlines.txt | head -292 | python src/align_and_penalize.py lsa 2>err2 | tail -1
0.652330724666

According to the logs, HunPos tags the words of line 292 differently depending on whether the preceding lines are part of the input. When the input is only line 292 (first command, err1):

$ tail -10 err1
2014-11-17 17:22:03,928 : align_and_penalize (324) - INFO - penalty for (u'end', 'NN'): wf: 0.102864932964, wp: 1
2014-11-17 17:22:03,928 : align_and_penalize (293) - INFO - preferred pos: NN
2014-11-17 17:22:03,928 : align_and_penalize (293) - INFO - preferred pos: NN
2014-11-17 17:22:03,928 : align_and_penalize (324) - INFO - penalty for (u'season', 'NN'): wf: 0.130057389931, wp: 1
2014-11-17 17:22:03,928 : align_and_penalize (293) - INFO - preferred pos: NNP
2014-11-17 17:22:03,928 : align_and_penalize (293) - INFO - preferred pos: NNP
2014-11-17 17:22:03,928 : align_and_penalize (333) - INFO - penalty for (u'Football', 'NNP'): wf: 0.176370547033, wp: 1
2014-11-17 17:22:03,928 : align_and_penalize (345) - INFO - P1A: 0.0232922322896 P2A: 0.0220463183792 P1B: 0.0 P2B: 0.0
2014-11-17 17:22:03,928 : align_and_penalize (283) - INFO - T=0.675, P=0.0453385506687
2014-11-17 17:22:03,929 : align_and_penalize (812) - WARNING - 0...

When the input is the first 292 lines (second command, err2):

$ tail -10 err2
2014-11-17 17:26:36,237 : align_and_penalize (295) - INFO - not preferred pos: None
2014-11-17 17:26:36,237 : align_and_penalize (324) - INFO - penalty for (u'end', None): wf: 0.102864932964, wp: 0.5
2014-11-17 17:26:36,237 : align_and_penalize (295) - INFO - not preferred pos: JJ
2014-11-17 17:26:36,237 : align_and_penalize (295) - INFO - not preferred pos: JJ
2014-11-17 17:26:36,237 : align_and_penalize (324) - INFO - penalty for (u'season', 'JJ'): wf: 0.130057389931, wp: 0.5
2014-11-17 17:26:36,238 : align_and_penalize (295) - INFO - not preferred pos: None
2014-11-17 17:26:36,238 : align_and_penalize (295) - INFO - not preferred pos: None
2014-11-17 17:26:36,238 : align_and_penalize (333) - INFO - penalty for (u'Football', None): wf: 0.176370547033, wp: 0.5
2014-11-17 17:26:36,238 : align_and_penalize (345) - INFO - P1A: 0.0116461161448 P2A: 0.0110231591896 P1B: 0.0 P2B: 0.0
2014-11-17 17:26:36,238 : align_and_penalize (283) - INFO - T=0.675, P=0.0226692753344
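
The logged values tie each log file to one of the two printed scores. The short check below assumes that the printed score equals T - P and that P is the sum of P1A, P2A, P1B and P2B. Both relationships are inferred from the numbers above, not taken from the source of align_and_penalize.py. All values are copied from the command output and the last lines of err1 and err2.

```python
import math

# Values copied from the tails of err1 and err2 and from the printed scores.
logs = {
    "err1": {
        "T": 0.675,
        "parts": [0.0232922322896, 0.0220463183792, 0.0, 0.0],
        "P": 0.0453385506687,
        "score": 0.629661449331,  # output of the first command
    },
    "err2": {
        "T": 0.675,
        "parts": [0.0116461161448, 0.0110231591896, 0.0, 0.0],
        "P": 0.0226692753344,
        "score": 0.652330724666,  # output of the second command
    },
}


def check(entry, tol=1e-9):
    """Return True if P == sum of the parts and score == T - P (assumed relations)."""
    p_matches = math.isclose(sum(entry["parts"]), entry["P"], abs_tol=tol)
    score_matches = math.isclose(entry["T"] - entry["P"], entry["score"], abs_tol=tol)
    return p_matches and score_matches


for name, entry in logs.items():
    print(name, check(entry))
```

Under these assumptions both logs are internally consistent: err1 (wp: 1) yields the lower score 0.629661449331, and err2 (wp: 0.5, i.e. half the penalty) yields 0.652330724666.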

I copied the logs in question to /home/judit/projects/semeval/log/hunpos_issue

recski commented 9 years ago

This is a direct result of #7: when hunpos fails, the lines written to and read from the hunmorph binary become misaligned. How to handle #7 is a separate issue, so I'm closing this one.
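
The kind of misalignment described above can be illustrated with a toy model. This is only a sketch: `toy_tagger` and `pair_by_position` are hypothetical names, not code from this repo, and the assumption is that a failing tagger emits no output line for one input while the caller pairs outputs with inputs purely by position.

```python
from collections import deque


def toy_tagger(sentences, fail_on):
    """Hypothetical stand-in for an external tagger: one output line per
    input sentence, except that nothing is emitted for `fail_on`."""
    return [f"tags({s})" for s in sentences if s != fail_on]


def pair_by_position(sentences, outputs):
    """Naive pairing: assume the i-th output line belongs to the i-th input."""
    queue = deque(outputs)
    return [(s, queue.popleft() if queue else None) for s in sentences]


sentences = ["s1", "s2", "s3", "s4"]
pairs = pair_by_position(sentences, toy_tagger(sentences, fail_on="s2"))
for sentence, output in pairs:
    print(sentence, "->", output)
```

In this model, every sentence after the failing one receives the tags of a later sentence and the last one receives nothing. This would explain why the result for one line can depend on the lines that precede it.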