zyxxmu / cam

PyTorch implementation of our paper accepted by ICML 2024 -- CaM: Cache Merging for Memory-efficient LLMs Inference

Cannot reproduce results in the paper #2

Open 0-KaiKai-0 opened 2 weeks ago

0-KaiKai-0 commented 2 weeks ago

I ran your instructions on the openbookqa task and got the following results:

full cache / dense:

```
"openbookqa": {
    "acc": 0.414,
    "acc_stderr": 0.02204949796982787,
    "acc_norm": 0.458,
    "acc_norm_stderr": 0.022303966774269938
}
```

streamingllm:

```
"openbookqa": {
    "acc": 0.256,
    "acc_stderr": 0.019536923574747588,
    "acc_norm": 0.342,
    "acc_norm_stderr": 0.02123614719989926
}
```

h2o:

```
"openbookqa": {
    "acc": 0.264,
    "acc_stderr": 0.01973288558592208,
    "acc_norm": 0.348,
    "acc_norm_stderr": 0.0213237286328075
}
```

cam:

```
"openbookqa": {
    "acc": 0.31,
    "acc_stderr": 0.020704041021724795,
    "acc_norm": 0.352,
    "acc_norm_stderr": 0.021380042385946055
}
```

I don't think this is a problem with my experiment environment: I ran the official H2O repo and got almost the same 5-shot evaluation scores as reported in their paper.
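For context, the dense baseline came from a run along these lines (a minimal sketch assuming the evaluation goes through EleutherAI's lm-evaluation-harness, whose output format matches the results above; the checkpoint name is a placeholder, not necessarily the one this repo uses):

```python
import lm_eval

# 0-shot evaluation of an unmodified (full-cache) model on openbookqa.
# "huggyllama/llama-7b" is a placeholder checkpoint for illustration.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=huggyllama/llama-7b",
    tasks=["openbookqa"],
    num_fewshot=0,
)
print(results["results"]["openbookqa"])  # acc, acc_stderr, acc_norm, ...
```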

duyuxuan1486 commented 2 weeks ago

What ratio did you set? The openbookqa dataset provides 4 answer options for the model to choose from, so even with no cache at all, the chance-level accuracy is 25%.

0-KaiKai-0 commented 2 weeks ago

Both the start-ratio and the recent-ratio are 0.1, and this is in the 0-shot setting.
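For anyone following along, here is a rough sketch of what those two ratios imply for the cache budget. The function below is illustrative only, not CaM's actual implementation; it just shows that start-ratio 0.1 plus recent-ratio 0.1 keeps roughly 20% of the KV cache, with the evicted middle being what CaM merges rather than drops:

```python
# Illustrative only: how a start-ratio / recent-ratio budget splits the
# KV cache into attention-sink tokens and a recent window. Names and
# logic are a sketch, not CaM's actual code.
def kept_token_indices(seq_len: int, start_ratio: float = 0.1,
                       recent_ratio: float = 0.1) -> list[int]:
    n_start = int(seq_len * start_ratio)    # oldest ("sink") tokens kept
    n_recent = int(seq_len * recent_ratio)  # newest tokens kept
    first = list(range(n_start))
    last = list(range(max(n_start, seq_len - n_recent), seq_len))
    return first + last

# With both ratios at 0.1, a 1000-token sequence keeps 200 cache
# entries (20%); the middle 80% is evicted (H2O / StreamingLLM) or
# merged into the surviving entries (CaM).
print(len(kept_token_indices(1000)))  # -> 200
```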