iMeanAI / WebCanvas

Connect agents to live web environments for evaluation.
https://www.imean.ai/web-canvas
MIT License
197 stars, 11 forks

MACRO or MICRO Completion Rate? #27

Closed: boyugou closed this issue 1 month ago

boyugou commented 1 month ago

The paper doesn't state this explicitly, but it seems to me that the Completion Rate scores it reports are MICRO-averaged, not MACRO-averaged. Is that correct?

However, wouldn't it make more sense to use MACRO scores?

boyugou commented 1 month ago

Should follow-up work use the MICRO calculation for Completion Rate? I think most benchmarks report MACRO scores rather than MICRO scores.
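
To make the distinction concrete, here is a minimal sketch of the two averaging schemes. It assumes each task is summarized as a hypothetical `(completed, total)` pair of scored steps; that representation and the function names are illustrative assumptions, not the WebCanvas implementation.

```python
from typing import List, Tuple

# Hypothetical representation: (completed_steps, total_steps) per task.
Task = Tuple[int, int]


def micro_completion_rate(tasks: List[Task]) -> float:
    """Pool all steps across tasks, then divide once.

    Tasks with more scored steps carry more weight.
    """
    total_steps = sum(total for _, total in tasks)
    if total_steps == 0:
        raise ValueError("no scored steps to average over")
    completed_steps = sum(done for done, _ in tasks)
    return completed_steps / total_steps


def macro_completion_rate(tasks: List[Task]) -> float:
    """Compute a rate per task, then average the rates.

    Every task carries equal weight regardless of its length.
    """
    if not tasks or any(total == 0 for _, total in tasks):
        raise ValueError("every task needs at least one scored step")
    return sum(done / total for done, total in tasks) / len(tasks)


# Made-up example: one short task fully completed, one long task mostly failed.
example_tasks = [(1, 1), (2, 8)]
micro = micro_completion_rate(example_tasks)  # 3 / 9
macro = macro_completion_rate(example_tasks)  # (1.0 + 0.25) / 2
```

With these made-up numbers the MICRO rate is 3/9 and the MACRO rate is 0.625, so the two can differ substantially when task lengths vary.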

han032206 commented 1 month ago

This issue is related to the one mentioned in #18. Please refer to the discussion in that issue.