Closed Paulmzr closed 5 months ago
Hi, thanks a lot :)
Yes, unfortunately the SimulEval tool changes very often, with many breaking changes, and I found it difficult to compare results between different versions (and, sometimes, between different commits of the same version). For example, between versions 1.0 and 1.1 the tool changed substantially and the agents had to be refactored to work with the new version. Therefore, I suggest explicitly stating the version and, if possible, also the commit of SimulEval in your work, and using the version reported in this repo if you are interested in replicating the results, as I did for EdAtt (version) and for the latest work, AlignAtt (version and commit).
In summary, it is not possible to compare results between the two versions, but this is due to the SimulEval tool itself, not to the specific agent.
Hope that I helped!
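To make the version pinning above actionable, here is a minimal sketch of how you might record the installed SimulEval version alongside your results. This assumes the package is installed under the name `simuleval`; the helper function name is my own, not part of SimulEval's API.

```python
# Sketch: record the exact SimulEval version alongside your results so that
# numbers produced with different tool versions are never silently compared.
from importlib.metadata import version, PackageNotFoundError

def tool_version(package: str) -> str:
    """Return the installed version of `package`, or 'unknown' if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return "unknown"

# Log this string into your results file or experiment config.
print("SimulEval version:", tool_version("simuleval"))
```

To pin a specific commit rather than a release, you can install directly from the repository with `pip install git+https://github.com/facebookresearch/SimulEval.git@<commit>`, where `<commit>` is the hash reported in the paper or repo.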
Thanks for your kind suggestions :)
I'm still a bit confused. Which version, v1.0.2 or v1.1.4, do you think would be more reasonable for reproducing the EdAtt results and comparing with our work? In this EdAtt repo, v1.0.2 is recommended, but from the results above, it seems that the results reproduced with v1.1.4 are better.
I would say the same version that you are using to evaluate your policy. If you are using version 1.1, then the v1.1 results that you have obtained for EdAtt are the right ones.
Got it. Thanks for your reply.
Hi, thanks for your great work!
When I try to reproduce the EdAtt results, I find that the results are inconsistent between SimulEval v1.0.2 and v1.1.4.
I used the checkpoints provided by BugConformer for MuST-C en-de and the global cmvn file from the EdAtt repo.