logic-star-ai / swt-bench

[NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test-generation
https://openreview.net/forum?id=9Y8zUO11EQ&noteId=9Y8zUO11EQ
MIT License

Results on Verified #9

Open zyone1991 opened 1 week ago

zyone1991 commented 1 week ago

Describe the issue

Hi,

Are you planning to release evaluation results on SWTbench-Verified? I think adding them would help a lot, given that the Verified dataset is more reliable.
