zeno-ml / zeno

AI Data Management & Evaluation Platform
https://zenoml.com
MIT License

Display: For code generation view, add "correct/incorrect" labels (and potentially execution outputs) #813

Open neubig opened 1 year ago

neubig commented 1 year ago

The code generation view currently displays the output code and the expected code.

However, most code generation datasets, such as HumanEval or Odex, are evaluated by running the generated code and checking whether its output is correct. Based on this:

  1. At the very least, it should be possible to view whether the generated code was judged as correct by showing a "correct/incorrect" label.
  2. Even better would be the functionality to view:

    1. Expected code
    2. Predicted code
    3. Output of expected code
    4. Output of predicted code
    5. Correctness/incorrectness value

    This probably requires the values in the code output_column and data_column to be something other than str: a data structure that holds the code, its output, and the correctness value.
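
To make the idea concrete, here is a rough sketch of what such a structure could look like. Everything in it is hypothetical: `CodeCell` and its field names are not part of Zeno's API, and the JSON round-trip is only an assumption about how a structured value might be stored in a column that currently holds a str.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class CodeCell:
    """Hypothetical per-row record for a code column (not Zeno's actual API)."""

    code: str                      # the source code itself
    output: Optional[str] = None   # captured execution output, if any
    correct: Optional[bool] = None # correctness judgment, if evaluated
    error: Optional[str] = None    # error message, if execution failed

    def to_json(self) -> str:
        # Serialize so the record can still live in a string-typed column.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "CodeCell":
        # Rebuild the record from its serialized form.
        return cls(**json.loads(raw))


# Example rows: an expected solution and an incorrect prediction.
expected = CodeCell(code="def add(a, b):\n    return a + b", output="3")
predicted = CodeCell(
    code="def add(a, b):\n    return a - b", output="-1", correct=False
)
```

With something like this, the view could render the code as it does today and show the output, label, and error only when those fields are populated.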

neubig commented 1 year ago

Here is an example of what the outputs look like now:

[Screenshot: Screen Shot 2023-06-21 at 1 00 53 PM]

It would be nice to be able to display the correctness/incorrectness value and the error message as well.
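
For reference, below is a minimal sketch of how the values to display (both outputs, the correctness flag, and the error message) might be produced. It is a simplification under stated assumptions: `run_snippet`, `judge`, and the `call` expression are hypothetical names, correctness is decided by comparing the two outputs rather than by the unit tests that real benchmarks may use, and `exec` is not safe for untrusted code, so a real pipeline would need a sandbox.

```python
from typing import Any, Dict


def run_snippet(code: str, call: str) -> Dict[str, Any]:
    """Run `code`, then evaluate `call`; return the output or the error."""
    namespace: Dict[str, Any] = {}
    try:
        # WARNING: exec/eval are unsafe for untrusted code; sandbox in practice.
        exec(code, namespace)
        return {"output": repr(eval(call, namespace)), "error": None}
    except Exception as exc:
        return {"output": None, "error": f"{type(exc).__name__}: {exc}"}


def judge(expected_code: str, predicted_code: str, call: str) -> Dict[str, Any]:
    """Collect the values the view would display for one example."""
    expected = run_snippet(expected_code, call)
    predicted = run_snippet(predicted_code, call)
    # Correct only if the prediction ran cleanly and the outputs match.
    correct = (
        predicted["error"] is None
        and predicted["output"] == expected["output"]
    )
    return {
        "expected_output": expected["output"],
        "predicted_output": predicted["output"],
        "predicted_error": predicted["error"],
        "correct": correct,
    }


# Hypothetical example: the prediction references an undefined name.
result = judge(
    expected_code="def add(a, b):\n    return a + b",
    predicted_code="def add(a, b):\n    return a + c",
    call="add(1, 2)",
)
```

Here `result["correct"]` is False and `result["predicted_error"]` carries the exception text, which is the kind of message that would be useful next to the label.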