percyliang / sempre

Semantic Parser with Execution

possible improvement of accuracy in sempre1.0 #82

Open uwittygit opened 8 years ago

uwittygit commented 8 years ago

Hi Liang, I think there is a bug in sempre1.0. Fixing it may give a significant accuracy improvement.

Location: src/edu/stanford/nlp/sempre/paraphrase/VectorSpaceModel.java in function computeSimilarity, line 133.

This is where you compute the similarity of two sentences as the dot product of their sentence vectors. But each sentence vector is simply the mean of its word vectors (see function computeUtteranceVec()). Your algorithm: sentence vector = mean(word vectors)

I think you forgot to normalize the sentence vector, which should satisfy the condition: ||sentence vector|| == 1

sentence similarity = (sentence1 · sentence2) / (||sentence1|| * ||sentence2||), i.e. cosine similarity. This reduces to the plain dot product only when ||sentence1|| == 1 and ||sentence2|| == 1.

public void computeSimilarity(ParaphraseExample ex, Params params) {
  ex.ensureAnnotated();
  // get source and target representations
  double[] sourceVec, targetVec;
  synchronized (phraseVectorCache) {
    sourceVec = phraseVectorCache.containsKey(ex.source) ? phraseVectorCache.get(ex.source) : computeUtteranceVec(ex.sourceInfo);
    targetVec = phraseVectorCache.containsKey(ex.target) ? phraseVectorCache.get(ex.target) : computeUtteranceVec(ex.targetInfo);
    MapUtils.putIfAbsent(phraseVectorCache, ex.source, sourceVec);
    MapUtils.putIfAbsent(phraseVectorCache, ex.target, targetVec);
  }
  // combine them
  FeatureVector fv;
  if (vsmSimilarityFunc == SimilarityFunc.DIAGNONAL)
    fv = getDiagonalMatrixFeatures(sourceVec, targetVec);
  else if (vsmSimilarityFunc == SimilarityFunc.FULL_MATRIX)
    fv = getFullMatrixFeatures(sourceVec, targetVec);
  else // dot product
    fv = getDotProductFeature(sourceVec, targetVec); // not a good similarity here!!!
  // set stuff
  ex.setVectorSpaceSimilarity(new FeatureSimilarity(fv, ex.source, ex.target, params));
}
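
The normalization suggested above could look roughly like the following. This is only a minimal sketch, assuming the sentence vectors are plain double[] arrays like those returned by computeUtteranceVec. The class NormalizedSimilaritySketch and its methods are hypothetical names used for illustration; they are not part of SEMPRE.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of SEMPRE.
final class NormalizedSimilaritySketch {

  /** Plain dot product of two vectors of equal length. */
  static double dot(double[] a, double[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Returns a copy of v scaled to unit L2 norm; a zero vector is returned unchanged. */
  static double[] l2Normalize(double[] v) {
    double norm = Math.sqrt(dot(v, v));
    double[] out = Arrays.copyOf(v, v.length);
    if (norm == 0.0) {
      return out;
    }
    for (int i = 0; i < out.length; i++) {
      out[i] /= norm;
    }
    return out;
  }

  /** Cosine similarity: the dot product of the two L2-normalized vectors. */
  static double cosine(double[] a, double[] b) {
    return dot(l2Normalize(a), l2Normalize(b));
  }

  public static void main(String[] args) {
    // Two mean-of-word-vectors pointing in the same direction with different lengths.
    double[] source = {0.2, 0.1};
    double[] target = {0.4, 0.2};
    System.out.println("raw dot product = " + dot(source, target));
    System.out.println("cosine          = " + cosine(source, target));
  }
}
```

With this sketch, the two example vectors point in the same direction, so their cosine is close to 1 even though the raw dot product is small. One possible use would be to normalize sourceVec and targetVec before they are passed to getDotProductFeature, though whether that actually improves accuracy would need to be measured.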

RobinCai1993 commented 7 years ago

Do you know where I can download the parasempre file?