The code generation view currently displays the output code and the expected code.
However, in most code generation datasets, such as HumanEval or Odex, evaluation is performed by running the code and checking whether its output is correct. Based on this:
At the very least, it should be possible to view whether the generated code was judged as correct by showing a "correct/incorrect" label.
Even better would be the functionality to view:
Expected code
Predicted code
Output of expected code
Output of predicted code
Correctness/incorrectness value
This probably requires the values in the code output_column and data_column to be something other than str: a data structure that includes the code, its output, and the correctness value.
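
As a rough illustration of what such a structure could look like, here is a minimal Python sketch. The names CodeCell, is_correct, label, and judge are hypothetical and not part of the existing codebase, and the exact string comparison of outputs is only a stand-in for whatever check a given dataset actually uses (HumanEval, for instance, runs test cases rather than comparing printed output).

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CodeCell:
    """Hypothetical replacement for a plain str cell in a code column."""

    code: str
    output: Optional[str] = None       # None if the code has not been run
    is_correct: Optional[bool] = None  # None if no judgement is available

    @property
    def label(self) -> str:
        """Text for the "correct/incorrect" label shown in the view."""
        if self.is_correct is None:
            return "unknown"
        return "correct" if self.is_correct else "incorrect"


def judge(expected: CodeCell, predicted: CodeCell) -> CodeCell:
    """Return a copy of `predicted` with `is_correct` filled in.

    Assumption: correctness means the two outputs match exactly.
    If either output is missing, the cell is returned unchanged.
    """
    if expected.output is None or predicted.output is None:
        return predicted
    return replace(predicted, is_correct=(predicted.output == expected.output))
```

With something like this, the view could render code and output for both the expected and predicted cells, and show label as the minimal "correct/incorrect" indicator.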