logic-star-ai / swt-bench

[NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test-generation

https://openreview.net/forum?id=9Y8zUO11EQ&noteId=9Y8zUO11EQ

MIT License

16 stars, 2 forks

Results on Verified #9

Open zyone1991 opened 1 week ago

zyone1991 commented 1 week ago

Describe the issue

Hi,

Are you planning to release evaluation results on SWT-Bench Verified? I think adding them would help a lot, given that the Verified dataset is more reliable.
