ls1intum / Artemis

Artemis - Interactive Learning with Automated Feedback
https://docs.artemis.cit.tum.de
MIT License

Text Exercise Assessment: Warn/Hint to Corrector When Structured Assessment Instructions Are Used More Often Than Limit #7355

Open just-max opened 1 year ago

just-max commented 1 year ago

Is your feature request related to a problem?

When marking text exercises, instructors can add structured assessment instructions. Structured assessment instructions can set a limit. When marking the exercise, the structured assessment can be used any number of times, but stops giving points once applied more often than the limit.

The problem is that it is not clear to the corrector that the instruction has been used too often. Either they end up giving too few points on the exercise, or they try to add up the points they have assigned and can't figure out why they don't add up to the total.
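To make the mismatch concrete, here is a minimal sketch of the behaviour described above. It is not Artemis code: the `Instruction` class, its `credits` and `limit` fields, and both helper functions are hypothetical names chosen for illustration, and it assumes that uses beyond the limit simply contribute no points.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instruction:
    """Hypothetical structured assessment instruction (not the Artemis model)."""
    name: str
    credits: float
    limit: Optional[int] = None  # None means the instruction has no limit


def naive_total(applied):
    """What a corrector gets by adding up every feedback item by hand."""
    return sum(instruction.credits for instruction in applied)


def counted_total(applied):
    """Total when uses beyond an instruction's limit give no points (assumed rule)."""
    total = 0.0
    uses = Counter()
    for instruction in applied:
        uses[instruction.name] += 1
        if instruction.limit is None or uses[instruction.name] <= instruction.limit:
            total += instruction.credits
    return total


correct = Instruction("correct approach", credits=2.0)
deduction = Instruction("missing unit", credits=-0.5, limit=2)
applied = [correct, deduction, deduction, deduction]

print(naive_total(applied))    # 0.5
print(counted_total(applied))  # 1.0
```

The third deduction is silently ignored, so the hand-computed sum and the displayed total differ by 0.5 points without any hint as to why.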

Describe the solution you'd like

In the UI, the corrector should be shown a warning when a structured assessment is used more often than its limit.

My suggestion is to warn inline when an instruction has been used too often. This warning should appear on all uses of the structured assessment instruction, not just those after the limit, because the limit applies to the whole "group" and not just those "after the limit". Additionally, a warning could appear next to the final score that some assessment instructions were not counted towards the total score.

Structured assessment instructions should still be usable more often than the limit, since there are legitimate uses for that (e.g. a deduction that may be applied at most 3 times). The warning should also be clear that the corrector is free to use the structured assessment more often than the limit, but it will only be counted up to the limit (i.e. the warning should not be too harsh, and more informational).
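The proposed check could be sketched roughly as follows. This is only an illustration under assumptions: `over_limit_warnings`, its arguments, and the message wording are hypothetical and do not correspond to anything in the Artemis code base. It marks every use of an over-limit instruction, not just the uses after the limit, and does not block anything.

```python
from collections import defaultdict


def over_limit_warnings(applied, limits):
    """Return {feedback index: message} for all uses of over-limit instructions.

    applied: instruction names in the order they were used, one per feedback item.
    limits:  hypothetical mapping from instruction name to its usage limit;
             instructions missing from the mapping are treated as unlimited.
    """
    positions = defaultdict(list)
    for index, name in enumerate(applied):
        positions[name].append(index)

    warnings = {}
    for name, indexes in positions.items():
        limit = limits.get(name)
        if limit is not None and len(indexes) > limit:
            # Informational only: the corrector may keep using the instruction.
            message = (
                f"'{name}' is used {len(indexes)} times, "
                f"but only {limit} uses count towards the score."
            )
            for index in indexes:
                warnings[index] = message
    return warnings


applied = ["missing unit", "correct approach", "missing unit", "missing unit"]
for index, message in sorted(over_limit_warnings(applied, {"missing unit": 2}).items()):
    print(index, message)
```

A non-empty result could also drive the additional note next to the final score.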

Describe alternatives you've considered

None.

Additional context

None.

Strohgelaender commented 1 year ago

In the student view, subsequent feedback is shown with a grey background and a warning symbol (see PR #4500): [screenshot]

IMO, a similar grey background should be used in the assessment view.

just-max commented 1 year ago

My concern with the grey background is that it's easy to overlook when next to other feedback items that give 0 points. But together with the warning icon I think it's visible enough.

krusche commented 1 year ago

Is that actually really an issue? I think a reviewer should not really care whether the application of a grading criterion gives points or not. If it is really the same mistake and the instructor decides to define a limit, a reviewer should not use a different criterion for the same "subsequent" mistake, just to add or subtract points.

This would actually mean the limit is kind of useless. Theoretically, we could even argue that a reviewer does not need to know the limit nor how many points an assessment purely based on structured grading criteria would lead to.

In addition, it could theoretically be the case that the instructor changes a limit or the points of one specific grading criterion, then it would be problematic if another one was used.

If at all, a reviewer should discuss with the instructor whether the grading criteria are appropriate or not.

just-max commented 1 year ago

Is that actually really an issue?

This is not a theoretical gripe; it's an issue that has bitten our team several times during exam correction. We have discussed it before, but yesterday a few of our reviewers suggested that it was enough of a frustration to be worth opening an issue.

A few examples of where this has happened:

In all cases, it would be a huge help for reviewers to receive a nudge that they might have made a mistake.

Theoretically, we could even argue that a reviewer does not need to know the limit nor how many points an assessment purely based on structured grading criteria would lead to.

Our correctors are competent and already have access to the full grading scheme. Very few submissions are so simple that they can be graded blindly like this, so our tutors use the structured assessments as a starting point together with their own judgement to mark an exercise.

Also, the reviewer can just count the marks they have given, compare them to the total, and wonder why they don't match. That's what's frustrating about this issue in the first place.

In addition, it could theoretically be the case that the instructor changes a limit or the points of one specific grading criterion, then it would be problematic if another one was used.

Once again, there are so many varied submissions that in practice this will certainly just end up being someone's well-thought-through correction, at least for the kind of exercises we are grading.

just-max commented 1 year ago

The "obvious" solution is to add a flag for whether limits on the assessment instructions of an exercise should be "blind" or whether they should warn. The former case would require hiding the point total, as mentioned. But for us this blind style of correction would not be useful.

krusche commented 1 year ago

So the main argument is that the limit is used intentionally, but competent reviewers are confused about its usage if they cannot explicitly recognize it in the user interface because they might have made a mistake (e.g. chosen the wrong option in a drop-down menu). Did I understand you correctly?

just-max commented 1 year ago

Yep, that's right 👍

I hadn't considered the other use case before.

krusche commented 1 year ago

Would you argue that preventing the mistake when choosing the grading instruction would be possible by somehow improving the user interface?

Because the mistake could happen in scenarios where the limit is not hit or not used at all and would still lead to a non-optimal assessment...

Any ideas?

just-max commented 1 year ago

Do you mean in the sense of placing more "automatic" restrictions on grading criteria, that are checked by the system? I think the only further thing I can think of in that direction is to have a criterion be a sub-criterion of another, i.e. you're only allowed to use the sub-criterion if you use the main criterion. This would be used e.g. for deductions that are only meant to be used once the main points have been given ("almost correct, but... (2P)" and "...a small mistake here (-0.5)").
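As a rough illustration of the sub-criterion idea, a check along these lines could flag sub-criteria that are used without their main criterion. The function name `unmet_sub_criteria` and the `parent_of` mapping are hypothetical, and nothing like this is known to exist in Artemis.

```python
def unmet_sub_criteria(applied, parent_of):
    """Return the sub-criteria that were used without their main criterion.

    applied:   names of the criteria used in one assessment.
    parent_of: hypothetical mapping from a sub-criterion to its main criterion.
    """
    used = set(applied)
    return sorted(
        sub for sub in used
        if sub in parent_of and parent_of[sub] not in used
    )


parent_of = {"a small mistake here": "almost correct, but..."}

print(unmet_sub_criteria(["a small mistake here"], parent_of))
# ['a small mistake here']
print(unmet_sub_criteria(["almost correct, but...", "a small mistake here"], parent_of))
# []
```

Like the limit warning, the result would be best surfaced as a hint rather than a hard restriction, so that correctors can still apply their own judgement.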

Otherwise I'm not sure how much can be checked automatically. The current limits are not a catch-all, but they prevent a number of silly mistakes.