openwebwork / pg

Problem rendering engine for WeBWorK
http://webwork.maa.org/wiki/Category:Authors

Discussion: Should enableReducedScoring work for graders other than avg and std? #648

Open somiaj opened 2 years ago

somiaj commented 2 years ago

I recently discovered that not all problems were giving reduced scores after the Reduced Scoring date. Here are the grades on an assignment completed during the reduced scoring period. All of the problems with 100% use the weightedGrader, while all the ones with 75% use the standard grader.

[Screenshot: myscrot]

After seeing this in multiple problems the only thing common between them was the weightedGrader, which seems to be the most reasonable culprit.

somiaj commented 2 years ago

Looking a little deeper it looks like this is by design and has to be enabled in each custom grader manually. From the ReducedScoring description:

This works with the avg_problem_grader (which is the default grader) and the std_problem_grader
(the all or nothing grader). It will work with custom graders if they are written appropriately.

This issue is now a discussion of whether ReducedScoring should work for all graders. I think having reduced scoring work for all graders by default would be more appropriate (once I enabled it, I was surprised to find it only worked sometimes, though that is my fault for not fully understanding the description given above).

I also see that certain problem authors may want to bypass the reduced scoring grader, so a configuration variable could be added that lets a problem skip the reduced scoring penalty.

dlglin commented 2 years ago

From a design perspective I think having pg code be able to override settings done at the WeBWorK level is generally a bad idea. This leads to unexpected behaviour, like what @somiaj experienced: an instructor adds a problem to a set, but it doesn't respect the reduced credit settings. The only way to know that this is going to happen is to read the code of any assigned problem, assuming that the instructor even knows what to look for.

I personally don't see a scenario where it makes sense for some problems in a set to have a reduced credit period, while others don't. How would this information be communicated to the student? If an instructor wants different problems to have different reduced credit conditions, then they should put them in separate assignments. This will be much clearer to the students.

My vote is for reduced scoring to be implemented consistently for all graders (if possible).

Alex-Jordan commented 2 years ago

The way reduced credit is implemented has this problem, and at least two other problems which I can describe. Maybe all of this can be thought about and there could eventually be a redesign of reduced credit.

Suppose a set has a reduced credit date, and a close date a few days later. Student X does some exercises after the reduced scoring date, but let's say not all of them. It turns out student X has been dealing with something in their personal life and the instructor says, OK, you have an extra week for this assignment. The instructor can change the reduced scoring date and the close date for this one student. However for the exercises they already completed, the database still has their reduced scores. Something more needs to be done (either by the student or the instructor) for those scores to return to 100%.

Here is another thing. The screenshot below has the progress indicator for a student who has answered every question correctly. It's just that they did this in the reduced scoring period. Because they do not actually have 100%, the progress indicator does not have a checkmark, and suggests there is more for them to do. But it's bad to suggest this. They are in a state where there is nothing more they can do. They have the maximum score possible at this point in time.

[Screenshot: Screen Shot 2022-02-03 at 11 22 26 AM]

Overall, my sense is that the database should store the actual assessed score according to the problem's grader. Each answer stored in the database could carry a timestamp (maybe it already does). Then, for display or for overall set grade calculation, a score reduction would be applied at run time as a function of the "actual" score, the timestamp, the reduced scoring date, and the reduced scoring factor. This would solve @somiaj's reported issue immediately, and likewise the first issue I described. For the second issue, making the progress indicator depend on the "actual" score instead of the reduced score would address it.
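
A minimal sketch of the run-time reduction described above, written in Python purely for illustration (WeBWorK itself is Perl). The function name `effective_score` and its parameters are hypothetical and do not correspond to any existing WeBWorK or PG API; it also assumes a single stored attempt per problem.

```python
from datetime import datetime


def effective_score(
    raw_score: float,
    answered_at: datetime,
    reduced_scoring_date: datetime,
    reduced_scoring_factor: float,
) -> float:
    """Compute the displayed score from the stored raw score.

    The raw score in the database is never modified; the reduction is
    applied only when the score is displayed or totalled.
    """
    if answered_at > reduced_scoring_date:
        # Answered during the reduced scoring period: scale the raw score.
        return raw_score * reduced_scoring_factor
    return raw_score


# A correct answer submitted on Feb 3, with reduced scoring starting Feb 1.
answered = datetime(2022, 2, 3)
late = effective_score(1.0, answered, datetime(2022, 2, 1), 0.75)

# The instructor later extends the reduced scoring date to Feb 10.
# Nothing stored changes; only the computed value does.
extended = effective_score(1.0, answered, datetime(2022, 2, 10), 0.75)
```

Because the reduction is computed rather than stored, extending a student's dates changes the result without anyone having to resubmit.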

Alex-Jordan commented 2 years ago

One more issue I can imagine, but have not experienced. What if you use conditional release and need a student to get 90% on an assignment A to get into assignment B? What if they've entered the reduced credit period on A, and 90% isn't even possible anymore?

Shouldn't conditional release depend on the student's actual scores, not the reduced scores? But again, we are not storing the actual scores at this time, only the reduced scores.
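
To make the arithmetic concrete, here is a small Python sketch (illustration only). The 90% threshold comes from the scenario above; the 75% reduced scoring factor is an assumed value, and `conditional_release_open` is a hypothetical helper, not an existing WeBWorK function.

```python
def conditional_release_open(set_score: float, required_score: float) -> bool:
    """Return True when the given set score meets the release threshold."""
    return set_score >= required_score


required = 0.90          # threshold to unlock assignment B (from the scenario)
factor = 0.75            # assumed reduced scoring factor
raw = 1.0                # student answered everything on A correctly
reduced = raw * factor   # best score reachable once the reduced period starts

gate_on_raw = conditional_release_open(raw, required)
gate_on_reduced = conditional_release_open(reduced, required)
```

Gating on the raw score lets the student through, while gating on the reduced score locks them out no matter how well they do.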

somiaj commented 2 years ago

Given the goal of a cleaner split between PG and WeBWorK, reduced scoring does seem like something that belongs on the WeBWorK side. So a good start could be to remove reduced scoring from the PG graders, so PG will report the raw score. Then leave it to WeBWorK to decide what to do with the score.

drgrice1 commented 2 years ago

It seems to me that this is the way to go. Reduced scoring is certainly something that should not be handled by PG at all. I may try to look into it. It doesn't seem like it would take too much effort. Although if someone else has time to do it, that would be great. I will post here if I start working on it to prevent doubling of effort.

pstaabp commented 2 years ago

I agree with the role of PG versus webwork in this respect. @drgrice1 Hold off a bit on this. I have started a branch to further split the PG-Webwork code (and will ask many questions today in our meeting), which I think will be beneficial before tackling this.

dlglin commented 2 years ago

The way reduced credit is implemented has this problem, and at least two other problems which I can describe. Maybe all of this can be thought about and there could eventually be a redesign of reduced credit.

We can improve some things in WW2, but the DB structure limits how far we can go. We should definitely think about this carefully before implementing in WW3.

Suppose a set has a reduced credit date, and a close date a few days later. Student X does some exercises after the reduced scoring date, but let's say not all of them. It turns out student X has been dealing with something in their personal life and the instructor says, OK, you have an extra week for this assignment. The instructor can change the reduced scoring date and the close date for this one student. However for the exercises they already completed, the database still has their reduced scores. Something more needs to be done (either by the student or the instructor) for those scores to return to 100%.

Retroactive changes are always going to be challenging. Right now changing the dates doesn't have any effect on scores. For example, say a student tried a problem after the due date, and then the instructor extends the due date, those attempts are not recorded anywhere, so they wouldn't be scored. Changing the reduced credit behaviour would be inconsistent with this.

It's worth noting that this is already inconsistent with other things. For example, changing a problem value immediately updates all of the scores, since this aspect is calculated at runtime.

Here is another thing. The screenshot below has the progress indicator for a student who has answered every question correctly. It's just that they did this in the reduced scoring period. Because they do not actually have 100%, the progress indicator does not have a checkmark, and suggests there is more for them to do. But it's bad to suggest this. They are in a state where there is nothing more they can do. They have the maximum score possible at this point in time.

You're right that it's bad to suggest that there is more for them to do, but is it right to give them a checkmark when they've only received 75% credit? Maybe we need a different way to indicate that the student has not received full credit, but can't do any better?

Overall, my sense is that the database should store the actual assessed score according to the problem's grader. Each answer stored in the database could carry a timestamp (maybe it already does). Then, for display or for overall set grade calculation, a score reduction would be applied at run time as a function of the "actual" score, the timestamp, the reduced scoring date, and the reduced scoring factor. This would solve @somiaj's reported issue immediately, and likewise the first issue I described. For the second issue, making the progress indicator depend on the "actual" score instead of the reduced score would address it.

For one thing this would require saving the score and time for every attempt. Consider the following scenario: A student gets 80% on a problem before the reduced credit date. They then get 100% on the problem during a 75% reduced credit period. By moving the reduced scoring date the instructor could change which of these is the "best attempt". The only way I see of making sure that the logic is correct would be to iterate over all of the attempts every time the score was generated. This might be doable using the past answer table, but right now the past answer table isn't used for any grade calculations, so we'd have to make sure that the data in that table is reliable enough.
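
The scenario above can be sketched as follows, in Python for illustration only. The `Attempt` record and `best_effective_score` are hypothetical names, and the sketch assumes each attempt's raw score and timestamp are reliably available (which, as noted, the past answer table may not currently guarantee).

```python
from datetime import datetime
from typing import Iterable, NamedTuple


class Attempt(NamedTuple):
    raw_score: float       # score from the problem's grader, unreduced
    answered_at: datetime  # when the attempt was submitted


def best_effective_score(
    attempts: Iterable[Attempt],
    reduced_scoring_date: datetime,
    reduced_scoring_factor: float,
) -> float:
    """Iterate over every attempt and return the best score after reduction."""

    def effective(attempt: Attempt) -> float:
        if attempt.answered_at > reduced_scoring_date:
            return attempt.raw_score * reduced_scoring_factor
        return attempt.raw_score

    return max((effective(a) for a in attempts), default=0.0)


# 80% before the reduced scoring date, then 100% during a 75% period.
attempts = [
    Attempt(0.8, datetime(2022, 2, 1)),
    Attempt(1.0, datetime(2022, 2, 4)),
]

# Reduced scoring starts Feb 2: the earlier 80% attempt is the best one.
original = best_effective_score(attempts, datetime(2022, 2, 2), 0.75)

# Instructor moves the date to Feb 5: the later 100% attempt now wins.
moved = best_effective_score(attempts, datetime(2022, 2, 5), 0.75)
```

The "best attempt" switches when the date moves, which is why a single stored score and timestamp would not be enough.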

For WW3 I have a conception of a "regrade answers" feature, which runs all of the student's past answers through the renderer again, and rescores them. My initial motivation for this would be for correcting errors in problems (in cases where the randomization and the answer blanks don't change), but it could also be used for changing the reduced credit conditions.

Alex-Jordan commented 2 years ago

For example, say a student tried a problem after the due date, and then the instructor extends the due date, those attempts are not recorded anywhere, so they wouldn't be scored. Changing the reduced credit behaviour would be inconsistent with this.

I disagree that these are comparable situations. In one situation, the student used the Submit button, and in the other they used the Check button. Yes, the student may be oblivious that Check is not Submit, but in the situation where they answer after a close date, they weren't really "answering".

If we do start storing the full history of submitted answers (or at least their scores and timestamps) then storing Checked answers in the same way might not be a big deal. And it could alleviate some student frustration when they are unaware that the due date has passed and have "completed" a set using Check. At least then the instructor could see what they did.

Maybe we could reach a place where we no longer have a need for two buttons.

For example, changing a problem value immediately updates all of the scores, since this aspect is calculated at runtime.

By problem "value", do you mean the weight?

is it right to give them a checkmark when they've only received 75% credit?

I don't see a problem with that, personally. But what matters is to show completion or not. If it's complete, but has a reduced score applied, that's not relevant for what this panel should be telling the student (which is something like "you should go back to continue working on this or that problem"). So I would be fine with a check mark or something that indicates "this one is complete".
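
One way to picture an indicator that separates completion from credit is the Python sketch below (illustration only). The three status labels and the function `progress_status` are hypothetical. The sketch assumes both a raw and a reduced score are available, and treats a raw score of 1.0 as "complete".

```python
def progress_status(raw_score: float, effective_score: float) -> str:
    """Classify a problem for the progress panel.

    raw_score:       score from the grader, before any reduction
    effective_score: score after the reduced scoring factor is applied
    """
    if raw_score < 1.0:
        # Not fully correct yet: the student still has work to do.
        return "incomplete"
    if effective_score < raw_score:
        # Fully correct, but credit was reduced; nothing more can be gained.
        return "complete (reduced credit)"
    return "complete"


late_but_correct = progress_status(1.0, 0.75)
on_time_correct = progress_status(1.0, 1.0)
partially_correct = progress_status(0.5, 0.5)
```

The middle state covers the case raised earlier in the thread: the student has not received full credit but cannot do any better.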

For one thing this would require saving the score and time for every attempt.

Point taken. Scenarios where students give answers both before and during the reduced scoring period complicate matters.

Not that you suggested otherwise, but storing the scores and timestamps should be enough. Storing the actual answers they tried to enter feels misguided to me. The underlying problem code could change, displaying a totally new problem version and their actual old attempts then become irrelevant.

But conversely, the problem code could change to fix a bug in an error checker, and then having the old answer attempts would be valuable to "rescore" old attempts. (That is a MyOpenMath feature, by the way.) So maybe it is worth keeping them after all. It could be reasonable to cap it at only the most recent 30 attempts, or something like that?

Now if only there were version control on every single problem's code....

Alex-Jordan commented 2 years ago

Another issue that came up with a student yesterday. When using the manual grader to assign a score (say because of a syntax issue or a buggy problem) the current scheme expects the instructor to be conscious of when the student attempts happened and when the reduced scoring starts/started. Not thinking about this, I gave a student 100% on something that should have been a reduced score.

drgrice1 commented 2 years ago

@Alex-Jordan: I think this is a simple one to fix. We should just add a message to the problem grader that the set is in the reduced scoring period and to account for that when assigning the grade. I do not think that the problem grader should apply the reduced scoring value to the grade the instructor chooses. An instructor could mean to be actually giving the grade they enter and intentionally mean to override the reduced scoring grade.

drgrice1 commented 2 years ago

In other words, expect the instructor to be aware of the reduced scoring, but make the instructor aware.