Open zachblume opened 4 months ago
Adjust the benchmark to score each run as a string diff of the execution, expressed as the % of API calls that are correct. (I assume we still want to award 0 points when the overall pass/fail boolean answer is wrong, though.)
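A rough sketch of what that scoring could look like. It assumes each API call in the execution is serialized to one string (e.g. `"GET /a"`), so the expected and actual executions are just lists of strings. `score_run` and its parameters are hypothetical names, not anything in the repo yet. The pass/fail boolean acts as a hard gate: get it wrong and the run scores 0. Otherwise the score is the number of calls matched by `difflib.SequenceMatcher`, divided by the longer of the two call lists.

```python
from difflib import SequenceMatcher


def score_run(expected_calls: list[str], actual_calls: list[str],
              expected_pass: bool, actual_pass: bool) -> float:
    """Hypothetical scorer: returns a score in [0, 100]."""
    # Wrong overall pass/fail answer -> 0 points, regardless of the calls.
    if actual_pass != expected_pass:
        return 0.0
    # Nothing expected and nothing made: treat as a perfect match.
    if not expected_calls and not actual_calls:
        return 100.0
    # Diff the two call sequences; each element is one serialized API call.
    matcher = SequenceMatcher(a=expected_calls, b=actual_calls, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    # Dividing by the longer list penalizes both missing and extra calls.
    return 100.0 * matched / max(len(expected_calls), len(actual_calls))
```

For example, expecting `["GET /a", "POST /b", "GET /c"]` but getting `["GET /a", "GET /c"]`, with a correct pass/fail answer, scores about 66.7. If extra calls shouldn't cost anything, divide by `len(expected_calls)` instead. `matcher.ratio()` is another option; it gives a symmetric 2·M/T similarity.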