nuprl / MultiPL-E

A multi-programming language benchmark for LLMs
https://nuprl.github.io/MultiPL-E/

Add flag to skip failing tests #100

Closed cassanof closed 1 year ago

cassanof commented 1 year ago

For MultiPL-T, we skipped tests that failed translation instead of discarding the whole problem. We should not do this for benchmarks, because dropping tests changes what each problem actually checks. While merging dev into main, we forgot to differentiate between the two cases, so I added a flag that enables skipping failing tests. By default, if a test fails translation, the whole problem is discarded (as it was before the MultiPL-T changes).
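
To illustrate the two behaviors, here is a minimal sketch. It is not the code from this PR: the names `translate_tests`, `skip_failing_tests`, and `toy_translate` are hypothetical, and the sketch assumes a per-test translator that returns `None` when a test cannot be translated.

```python
from typing import Callable, List, Optional


def translate_tests(
    tests: List[str],
    translate_test: Callable[[str], Optional[str]],
    skip_failing_tests: bool = False,  # hypothetical flag name
) -> Optional[List[str]]:
    """Translate every test of one problem.

    Returns the translated tests, or None if the problem is discarded.
    """
    translated = []
    for test in tests:
        result = translate_test(test)
        if result is None:
            if skip_failing_tests:
                # MultiPL-T style: drop only the test that failed.
                continue
            # Benchmark default: one failure discards the whole problem.
            return None
        translated.append(result)
    return translated


def toy_translate(test: str) -> Optional[str]:
    # Hypothetical translator: pretends tests that use a lambda
    # cannot be translated to the target language.
    if "lambda" in test:
        return None
    return test.replace("assert ", "assert(") + ")"


tests = ["assert f(1) == 2", "assert g(lambda x: x) == 3"]

# Default: the untranslatable second test discards the problem.
print(translate_tests(tests, toy_translate))

# With the flag: only the failing test is dropped.
print(translate_tests(tests, toy_translate, skip_failing_tests=True))
```

In this sketch, a problem whose tests all fail translation yields an empty list when the flag is set. Whether such a problem should still be kept is a separate decision that the sketch does not make.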