Closed: jfc43 closed this issue 1 year ago
Dear Jiefeng,
You are indeed correct that the code uses bi-directional non-contradiction as the criterion for semantic equivalence. I made a few changes to the public release of the code before and after the paper's publication, so I will re-run our experiments to verify our results and check whether these updates introduced any bugs. I am also working on a simplified implementation of semantic entropy, which I hope to release soon; it should make the method easier to use in future experiments.
Best, Lorenz
Hi Lorenz,
Any news on this issue? You closed it, but the reproducibility problem persists for me.
Best, Sebastian
I can roughly reproduce the results for the normalized predictive entropy baseline. However, I fail to reproduce the results for the semantic entropy method; the numbers I get for semantic entropy are actually slightly worse than the normalized predictive entropy baseline.

I also find that the implemented check for whether two answers are equivalent differs from what is described in the paper. The paper says: "The Deberta model then classifies this sequence into one of: entailment, neutral, contradiction. We compute both directions, and the algorithm returns equivalent if and only if both directions were entailment." However, in the code (https://github.com/lorenzkuhn/semantic_uncertainty/blob/main/code/get_semantic_similarities.py#L109-L114), the implemented condition appears to be that two answers are equivalent if neither direction is classified as contradiction. Please check whether this is correct. Thanks!
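For concreteness, here is a minimal sketch of the two criteria under discussion. It assumes an off-the-shelf NLI model such as `microsoft/deberta-large-mnli` (whose label order is contradiction, neutral, entailment); the model choice, function names, and label ids are illustrative assumptions, not the repository's exact code:

```python
# Sketch of the two equivalence criteria discussed above. Assumes an MNLI
# model with label order (contradiction, neutral, entailment); this is an
# illustration, not the repository's actual implementation.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "microsoft/deberta-large-mnli"  # assumed NLI model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

CONTRADICTION, NEUTRAL, ENTAILMENT = 0, 1, 2  # label order for this model


def nli_label(premise: str, hypothesis: str) -> int:
    """Classify the (premise, hypothesis) pair into an MNLI label id."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1))


def equivalent_paper(a: str, b: str) -> bool:
    """Paper's criterion: both directions must be classified as entailment."""
    return nli_label(a, b) == ENTAILMENT and nli_label(b, a) == ENTAILMENT


def equivalent_code(a: str, b: str) -> bool:
    """Criterion in the linked code: neither direction is a contradiction."""
    return nli_label(a, b) != CONTRADICTION and nli_label(b, a) != CONTRADICTION
```

Note that bi-directional non-contradiction is a strictly weaker condition than bi-directional entailment: it also merges answer pairs the model labels as neutral, which can produce larger equivalence clusters and hence different semantic entropy estimates.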