nvtransfer / RULER

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
Apache License 2.0

How do you handle the presence of 'and' in the output during evaluation? #23

Closed vkaul11 closed 3 months ago

vkaul11 commented 4 months ago

Sometimes the model output contains "and", for example in the multi-value case. Should the evaluation account for that when doing the string match, or have you never faced this issue?

3728882, 7210606, 7120868, and 8606962

was my model's output, and the expected outputs are ['8606962', '7120868', '3728882', '7210606']

hsiehjackson commented 4 months ago

That is fine because our metric checks whether each answer exists in the prediction string, so extra words such as "and" do not affect the match.
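
For illustration, here is a minimal sketch of that kind of containment check, applied to the example from this issue. It is not the repo's exact code: the function name `substring_match_score` is hypothetical, and the lowercasing and the 0-to-1 scale are assumptions made for the sketch.

```python
from typing import Sequence


def substring_match_score(prediction: str, references: Sequence[str]) -> float:
    """Return the fraction of reference answers found as substrings of the prediction.

    Hypothetical helper for illustration only; case-insensitive by assumption.
    """
    if not references:
        return 0.0
    pred = prediction.lower()
    # A reference counts as a hit if it appears anywhere in the prediction,
    # so separators like commas or the word "and" are simply ignored.
    hits = sum(1 for ref in references if ref.lower() in pred)
    return hits / len(references)


prediction = "3728882, 7210606, 7120868, and 8606962"
references = ["8606962", "7120868", "3728882", "7210606"]

score = substring_match_score(prediction, references)
print(score)  # 1.0: all four answers appear in the prediction
```

Because each answer is looked up independently, neither the order of the values nor the connective "and" changes the result. Only a missing or altered number would lower the score.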