Closed juditacs closed 9 years ago
$ cat semeval_data/sts_trial/sts-en-test-gs-2014/STS.input.headlines.txt | head -292 | tail -1 | python src/align_and_penalize.py lsa 2>err1
0.629661449331
$ cat semeval_data/sts_trial/sts-en-test-gs-2014/STS.input.headlines.txt | head -292 | python src/align_and_penalize.py lsa 2>err2 | tail -1
0.652330724666
According to the logs, HunPos tags the words differently if the input is 292 lines:
$ tail -10 err1
2014-11-17 17:22:03,928 : align_and_penalize (324) - INFO - penalty for (u'end', 'NN'): wf: 0.102864932964, wp: 1
2014-11-17 17:22:03,928 : align_and_penalize (293) - INFO - preferred pos: NN
2014-11-17 17:22:03,928 : align_and_penalize (293) - INFO - preferred pos: NN
2014-11-17 17:22:03,928 : align_and_penalize (324) - INFO - penalty for (u'season', 'NN'): wf: 0.130057389931, wp: 1
2014-11-17 17:22:03,928 : align_and_penalize (293) - INFO - preferred pos: NNP
2014-11-17 17:22:03,928 : align_and_penalize (293) - INFO - preferred pos: NNP
2014-11-17 17:22:03,928 : align_and_penalize (333) - INFO - penalty for (u'Football', 'NNP'): wf: 0.176370547033, wp: 1
2014-11-17 17:22:03,928 : align_and_penalize (345) - INFO - P1A: 0.0232922322896 P2A: 0.0220463183792 P1B: 0.0 P2B: 0.0
2014-11-17 17:22:03,928 : align_and_penalize (283) - INFO - T=0.675, P=0.0453385506687
2014-11-17 17:22:03,929 : align_and_penalize (812) - WARNING - 0...
When the input is only the 292nd line:
$ tail -10 err2
2014-11-17 17:26:36,237 : align_and_penalize (295) - INFO - not preferred pos: None
2014-11-17 17:26:36,237 : align_and_penalize (324) - INFO - penalty for (u'end', None): wf: 0.102864932964, wp: 0.5
2014-11-17 17:26:36,237 : align_and_penalize (295) - INFO - not preferred pos: JJ
2014-11-17 17:26:36,237 : align_and_penalize (295) - INFO - not preferred pos: JJ
2014-11-17 17:26:36,237 : align_and_penalize (324) - INFO - penalty for (u'season', 'JJ'): wf: 0.130057389931, wp: 0.5
2014-11-17 17:26:36,238 : align_and_penalize (295) - INFO - not preferred pos: None
2014-11-17 17:26:36,238 : align_and_penalize (295) - INFO - not preferred pos: None
2014-11-17 17:26:36,238 : align_and_penalize (333) - INFO - penalty for (u'Football', None): wf: 0.176370547033, wp: 0.5
2014-11-17 17:26:36,238 : align_and_penalize (345) - INFO - P1A: 0.0116461161448 P2A: 0.0110231591896 P1B: 0.0 P2B: 0.0
2014-11-17 17:26:36,238 : align_and_penalize (283) - INFO - T=0.675, P=0.0226692753344
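The numbers in the two logs line up with the two printed scores. The check below is only arithmetic on the values quoted above. It assumes, as an inference from those values and not from the source of align_and_penalize.py, that the printed score equals T minus P. The variable names are made up for this check.

```python
# Values copied from the commands and logs above.
T = 0.675                      # same T in both logs
p_err1 = 0.0453385506687       # P from err1 (wp: 1)
p_err2 = 0.0226692753344       # P from err2 (wp: 0.5)
score_err1 = 0.629661449331    # printed by the command that wrote err1
score_err2 = 0.652330724666    # printed by the command that wrote err2

# Assumption: score = T - P. Both runs agree with this to the printed precision.
diff_err1 = abs((T - p_err1) - score_err1)
diff_err2 = abs((T - p_err2) - score_err2)

# Halving wp from 1 to 0.5 halves the total penalty.
penalty_ratio = p_err1 / p_err2

print(diff_err1 < 1e-9, diff_err2 < 1e-9, round(penalty_ratio, 6))
```

If that reading is right, the whole gap between the two scores comes from the POS-dependent weight wp, which is consistent with the tagging difference visible in the logs.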
I copied the logs in question to /home/judit/projects/semeval/log/hunpos_issue
This is a direct result of #7: when hunpos fails, the lines written to and read from the hunmorph binary become misaligned. How to handle #7 is a separate issue, so I'm closing this one.
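To illustrate the kind of misalignment described here, a minimal sketch follows. It is not the project's code: fake_tagger, pair_naively and pair_checked are hypothetical names, and the failure is simulated by a tagger that emits nothing for one input line. How the real failure in #7 happens is not described in this issue.

```python
def fake_tagger(lines):
    """Hypothetical stand-in for an external tagger.

    Simulates a failure by producing no output line for "bad input".
    """
    return [line.upper() for line in lines if line != "bad input"]


def pair_naively(lines, outputs):
    # zip() stops at the shorter sequence, so a missing output line
    # silently shifts every later output onto the wrong input.
    return list(zip(lines, outputs))


def pair_checked(lines, outputs):
    # Refuse to pair when the counts differ instead of guessing.
    if len(lines) != len(outputs):
        raise ValueError(
            "tagger returned %d lines for %d inputs" % (len(outputs), len(lines))
        )
    return list(zip(lines, outputs))


lines = ["first line", "bad input", "third line"]
outputs = fake_tagger(lines)

naive = pair_naively(lines, outputs)
print(naive[1])  # "bad input" ends up paired with the output for "third line"

try:
    pair_checked(lines, outputs)
    checked_error = None
except ValueError as exc:
    checked_error = str(exc)
print(checked_error)
```

A count check like this only detects the problem. It does not say which line was lost, so it is a guard and not a fix for #7.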