jupyter / nbgrader

A system for assigning and grading notebooks
https://nbgrader.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Set test cell score tag using variable in test cell #1889

Open Adamtaranto opened 5 months ago

Adamtaranto commented 5 months ago

I want to modify the autograder code to check for a specific variable "total_score" in a test cell and use its value to set the score metadata tag for that test cell.

I am running lots of try/except blocks in my test cell and tallying the total score at the end. If any single test fails, I have to manually check the tally for the cell and fill in the score for that cell. I want to automate this process.

Can someone please point me to the autograder code that:

Also, what are the tags that correspond to:

Thanks!

Adamtaranto commented 5 months ago

Do you think this would be a generally useful feature? @brichet @jhamrick

I can develop it myself, just need some help understanding the tags and the existing autograder code.

Adamtaranto commented 4 months ago

Is anyone available to help with this?

shreve commented 4 months ago

This already exists: if the test cell produces a text/plain result that can be parsed as a float, that value will be used as the score for the cell.

https://github.com/jupyter/nbgrader/blob/e0b2e65a56d483560f81a02c8a0b323f4659b10d/nbgrader/utils.py#L76-L81
https://github.com/jupyter/nbgrader/blob/e0b2e65a56d483560f81a02c8a0b323f4659b10d/nbgrader/utils.py#L142-L155
https://github.com/jupyter/nbgrader/blob/e0b2e65a56d483560f81a02c8a0b323f4659b10d/nbgrader/preprocessors/saveautogrades.py#L45
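For reference, here is a simplified sketch (illustration only, not the actual code behind those links) of what that parsing amounts to: scan the cell's outputs for an execute_result with a text/plain body and try to read it as a float.

# Simplified illustration only -- see the linked utils.py for the real logic.
def extract_partial_score(cell):
    for output in cell.get("outputs", []):
        if output.get("output_type") != "execute_result":
            continue
        text = output.get("data", {}).get("text/plain", "")
        if isinstance(text, list):  # nbformat stores multiline text as a list of strings
            text = "".join(text)
        try:
            return float(text)
        except (TypeError, ValueError):
            pass
    return None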

brichet commented 4 months ago

Thanks @shreve for providing a possible answer.

@Adamtaranto I thought I already asked (sorry, it looks like I never pressed the button), but I was wondering if you could provide an example of your use case. That said, if @shreve's answer fixes your issue, it's not necessary.

Adamtaranto commented 4 months ago

@shreve's answer gets me part of the way to a solution, thanks.

I will need a way to tag the value I want used as the partial score in cases where there is a lot of other text output.

Here is an example of how I am running multiple tests in an autograder test cell. Most are within try/except blocks so that a failed test doesn't stop subsequent tests from running. I'm printing comments for the feedback reports to help students figure out why they lost marks.

At the end I print the total marks and run an assert statement to check whether the total equals the maximum possible marks.

If the final assert fails, the grader has to check the total marks value and enter it as a manual grade.


# Test student function "translate_orfs()"
# Requires Biopython for the Seq/SeqRecord test inputs
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

### --- Test case 1: Seq < 3 bases
# Should return an empty list if the sequence has fewer than 3 bases
try:
    test_1_output = "NO_OUTPUT"  # Default value in case the function dies with an error
    test_1b_output = "NO_OUTPUT"
    test_1_output = translate_orfs(SeqRecord(Seq(''), 'empty_seq_object'), 2, 1)
    test_1b_output = translate_orfs(SeqRecord(Seq('AA'), 'short_seq_object'), 2, 1)
    # If the function runs, check for correct output
    assert test_1_output == [] and test_1b_output == []
    # If the output is correct, assign marks for this test case
    test_case_1 = 1
except Exception:
    # If the function fails or gives an incorrect result, set the test case marks to 0
    test_case_1 = 0
    # Error summary
    print(f"Failed test 1: Seq with len < 3, student output was {test_1_output} and {test_1b_output}, not an empty list.")

### --- Test case 2: No ORFs present
# Should return an empty list
try:
    test_2_output = "NO_OUTPUT"  # Default value in case the function dies with an error
    test_2_output = translate_orfs(SeqRecord(Seq('TTTTTTTTTT'), 'no_orfs'), 2, 1)
    # If the function runs, check for correct output
    assert test_2_output == []
    # If the output is correct, assign marks for this test case
    test_case_2 = 1
except Exception:
    # If the function fails or gives an incorrect result, set the test case marks to 0
    test_case_2 = 0
    # Error summary
    print(f"Failed test 2: Test for no ORFs, student output was {test_2_output}, not an empty list.")

### --- Test case 3: No input
# Should return None if there is no input
try:
    test_3_output = "NO_OUTPUT"  # Default value in case the function dies with an error
    test_3_output = translate_orfs('', 2, 1)
    # If the function runs, check for correct output
    assert test_3_output is None
    # If the output is correct, assign marks for this test case
    test_case_3 = 1
except Exception:
    # If the function fails or gives an incorrect result, set the test case marks to 0
    test_case_3 = 0
    # Error summary
    print(f"Failed test 3: Return None if no input seq, student output was {test_3_output}, not None.")

#############

# Tally the score for a quick visual check.
# Use this to fill in the final score for the test cell.
print(f"Total score: {test_case_1 + test_case_2 + test_case_3}")

# The final assert causes the test cell to log 0 marks if any test fails
assert test_case_1 + test_case_2 + test_case_3 == 3, 'Some tests failed.'

I was thinking of adding a decorator function to nbgrader that checks the value of a specific variable and uses it to set the partial grade:

@set_metadata_score(set_from_var='score')
def evaluate_student_answer():
    student_answer = 42
    correct_answer = 42

    # Evaluate the student's answer
    if student_answer == correct_answer:
        score = 10  # Full marks
    else:
        score = 0  # No marks

    return score

# Execute the function to set the metadata
evaluate_student_answer()

Or maybe have the decorator function set a partial-grade metadata tag on the cell, and then have saveautogrades.py check for that tag later.

shreve commented 4 months ago

A couple things here.

I need to clarify: the existing partial-credit behavior applies only when both output_type == "execute_result" and the mimetype is "text/plain". This is specifically the case for an implicit return at the end of a cell; nothing print()ed will be considered for the score. In the following notebook example, 1 would be parsed as the score:

# Input cell
[1]: print("0")
     1
# Stream output from print() -- not considered for the score
0
# Execute result -- parsed as the score
[1]: 1

So your simple example would actually work without the decorator:

# Input cell
[1]: def evaluate_student_answer():
         student_answer = 42
         correct_answer = 42

         print("Noisy computing")

         # Evaluate the student's answer
         if student_answer == correct_answer:
             score = 10  # Full marks
         else:
             score = 0  # No marks

         return score

     evaluate_student_answer()
# Stream output from print() -- not considered for the score
Noisy computing
# Execute result -- parsed as the score
[1]: 10

However, there is a path forward if you're set on the metadata option.

Nbgrader operates at the Jupyter client level, not inside the kernel, so it doesn't know anything about the variables inside your notebook. It sends the notebook to the kernel and determines a grade based on the outputs that come back. When your grading code is running inside the kernel, it can send custom metadata back to the client via the display_pub, which is meant for displaying dynamic responses in the Jupyter notebook context.

For example, if your grading code looked like this:

from IPython.core.interactiveshell import InteractiveShell

def report_score(score):
    InteractiveShell.instance().display_pub.publish(
        # This first data parameter could be an empty dict if you don't want to display anything
        {
            "text/plain": f"Your score is {score}",
            "text/html": f"Your score is <b>{score}</b>"
        },
        metadata={"score": score}
    )

def evaluate_student_answer():
    student_answer = 42
    correct_answer = 42

    # Evaluate the student's answer
    if student_answer == correct_answer:
        score = 10  # Full marks
    else:
        score = 0  # No marks

    report_score(score)

# Execute the function to set the metadata
evaluate_student_answer()

After the ExecutePreprocessor runs, the student notebook will have a cell with some outputs that look like the following:

  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "893ff932-3c10-46ce-89d9-8c390f8b1aab",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "Your score is <b>10</b>"
      ],
      "text/plain": [
       "Your score is 10"
      ]
     },
     "metadata": {
      "score": 10
     },
     "output_type": "display_data"
    }
   ],
   "source": [...]
  }

The determine_grade function already looks at cell outputs; you could update it to also look for that metadata.score, or something similar. To be clear, this is basically what already happens with the execute_result.
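As a rough sketch of that extension (hypothetical code, not anything that exists in nbgrader today), the scan would look for display_data outputs carrying a "score" key in their metadata, analogous to the execute_result parsing:

# Hypothetical extension -- not part of nbgrader today.
def extract_metadata_score(cell):
    for output in cell.get("outputs", []):
        if output.get("output_type") != "display_data":
            continue
        score = output.get("metadata", {}).get("score")
        if isinstance(score, (int, float)):
            return float(score)
    return None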

This is really rough and would need some shaping before pushing upstream into nbgrader, but this is enough to build a proof of concept for yourself.