haibow closed this issue 10 years ago.
Continue:
In our meeting on Friday, I mentioned that we might apply a threshold to the candidate sentence relevance scores. Taking only the top min(K_CANDIDATES, size(candSent)) sentences may not be sufficient: a sentence that ranks high but still has a very low relevance score is forced into consideration anyway.
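A minimal sketch of what "top-K plus a threshold" could look like, assuming candidates arrive as (sentence, score) pairs. `K_CANDIDATES` mirrors the constant above; `MIN_RELEVANCE`, `select_candidates`, and the toy scores are hypothetical and not from our pipeline.

```python
# Hypothetical sketch: combine the existing top-K cut with a minimum relevance score.
# K_CANDIDATES mirrors the constant in the pipeline; MIN_RELEVANCE is an assumed, untuned value.
K_CANDIDATES = 3
MIN_RELEVANCE = 0.5


def select_candidates(scored_sentences, k=K_CANDIDATES, min_relevance=MIN_RELEVANCE):
    """Return up to k (sentence, score) pairs, best first, dropping low-scoring ones."""
    ranked = sorted(scored_sentences, key=lambda pair: pair[1], reverse=True)
    # Slicing already yields min(k, len(ranked)) items, like min(K_CANDIDATES, size(candSent)).
    top_k = ranked[:k]
    # The threshold removes sentences that rank high only because nothing better exists.
    return [(sent, score) for sent, score in top_k if score >= min_relevance]


# Toy scores, made up for illustration.
candidates = [("s1", 1.97), ("s2", 0.19), ("s3", 0.05), ("s4", 1.80)]
print(select_candidates(candidates))  # [('s1', 1.97), ('s4', 1.8)]
```

One design consequence: if every candidate falls below the threshold, the list comes back empty, so the caller would need a fallback (for example, keeping the single best sentence).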
I did some simple stats on that. Average relevance scores of the (used) candidate sentences, K=3:
- Answered correctly: 1.9707, 1.7988, 0.1892, 0.3935
- Not answered correctly: 2.0793, 1.0226, 0.5611, 0.6638
Interestingly, in three of the four cases the correctly answered questions even have a lower average candSent relevance score.
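A quick sketch that recomputes the comparison from the numbers posted above (nothing here comes from the pipeline itself). Besides counting how often the correct group scores lower, it shows that the overall means of the two groups are nearly identical (about 1.088 vs. 1.082).

```python
# Average used-sentence relevance scores at K=3, copied from the comment above.
correct = [1.9707, 1.7988, 0.1892, 0.3935]
incorrect = [2.0793, 1.0226, 0.5611, 0.6638]

# Count the positions where the correctly answered group has the lower score.
lower_count = sum(c < i for c, i in zip(correct, incorrect))

mean_correct = sum(correct) / len(correct)
mean_incorrect = sum(incorrect) / len(incorrect)

print(f"correct group lower in {lower_count} of {len(correct)} cases")
print(f"overall mean: correct={mean_correct:.4f}, incorrect={mean_incorrect:.4f}")
```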
So the relevance score by itself may not be reliable enough, and we may need to reconsider how we decide which sentence is likely to contain the answer.
Richer representations, such as coreference (StanfordNLP) or synonyms (WordNet), could help with that.
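To illustrate the synonym idea, here is a minimal sketch of word-overlap scoring with query expansion. The small `SYNONYMS` table is a hypothetical stand-in for WordNet (in practice it might be filled from NLTK's `wordnet.synsets(word)` and each synset's `lemma_names()`), and `overlap_score` is an illustrative helper, not our actual scorer.

```python
# Hypothetical synonym table standing in for WordNet lookups.
SYNONYMS = {
    "buy": {"purchase", "acquire"},
    "car": {"automobile", "auto"},
}


def expand(tokens):
    """Return the token set plus any synonyms listed for each token."""
    expanded = set(tokens)
    for token in tokens:
        expanded |= SYNONYMS.get(token, set())
    return expanded


def overlap_score(question_tokens, sentence_tokens, use_synonyms=True):
    """Count question terms (optionally synonym-expanded) that appear in the sentence."""
    query = expand(question_tokens) if use_synonyms else set(question_tokens)
    return len(query & set(sentence_tokens))


question = ["buy", "car"]
sentence = ["she", "wants", "to", "purchase", "an", "automobile"]
print(overlap_score(question, sentence, use_synonyms=False))  # 0
print(overlap_score(question, sentence))  # 2
```

The sketch only matches exact lowercase tokens; real use would also need lemmatization so that "bought" maps to "buy".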
That makes sense. We need those additional features, which I believe the TAs removed from the baseline they gave us.
Could you briefly explain where you found those 4 scores? I couldn't find them in the console output.
Hey guys,
How's it going? Let me know.
Thanks,
~abhi
(In reply to yueranyuan's comment of Mon, Nov 11, 2013 at 11:50 AM.)
As I understand it, we've mostly been reading code, inspecting the annotation output, and trying basic changes, so nothing concrete yet. Check the wiki for a list of the things we plan to do.
@yueranyuan Search for "c@1" in the console output :) A score is printed after each document is evaluated.
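For reference, a sketch of the c@1 measure, assuming the console's score is the standard c@1 used in the CLEF QA4MRE evaluations: c@1 = (n_R + n_U · n_R / n) / n, where n_R is the number of correct answers, n_U the number of unanswered questions, and n the total. The function name `c_at_1` is hypothetical.

```python
def c_at_1(n_correct, n_unanswered, n_total):
    """c@1 = (n_R + n_U * n_R / n) / n.

    Unanswered questions earn partial credit at the system's observed accuracy,
    so leaving a question blank is never worse than answering it wrongly.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    accuracy = n_correct / n_total
    return (n_correct + n_unanswered * accuracy) / n_total


print(round(c_at_1(3, 0, 9), 2))  # 0.33: 3 of 9 correct, none left unanswered
print(round(c_at_1(3, 3, 9), 2))  # 0.44: same 3 correct, but 3 others left blank
```

This is why abstaining on low-confidence questions can raise the score even when the number of correct answers stays the same.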
c@1 by K (one score per document):

| K | Doc 1 | Doc 2 | Doc 3 | Doc 4 |
|---|---|---|---|---|
| 5 (baseline) | 0.33 | 0.11 | 0.11 | 0.3 |
| 4 | 0.33 | 0.22 | 0.11 | 0.3 |
| 3 | 0.44 | 0.22 | 0.11 | 0.3 |
| 2 | 0.44 | 0.00 | 0.11 | 0.4 |
| 1 | 0.22 | 0.22 | 0.22 | 0.3 |
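So a well-chosen K could give a marginal improvement: K=3 has the highest mean across the four documents (about 0.27, vs. about 0.21 for the baseline K=5).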
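A small sketch that picks K by mean c@1, using the per-document values listed above. Averaging over documents weights each document equally regardless of how many questions it has, and with only four documents (and rounded scores) the differences between K values are small.

```python
# c@1 per document for each K, copied from the values listed above.
C_AT_1_BY_K = {
    5: [0.33, 0.11, 0.11, 0.3],  # baseline
    4: [0.33, 0.22, 0.11, 0.3],
    3: [0.44, 0.22, 0.11, 0.3],
    2: [0.44, 0.00, 0.11, 0.4],
    1: [0.22, 0.22, 0.22, 0.3],
}

# Unweighted mean over documents for each K.
means = {k: sum(scores) / len(scores) for k, scores in C_AT_1_BY_K.items()}
best_k = max(means, key=means.get)

for k in sorted(means, reverse=True):
    print(f"K={k}: mean c@1 = {means[k]:.4f}")
print("best K:", best_k)
```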