jupyter / nbgrader

A system for assigning and grading notebooks
https://nbgrader.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
1.28k stars 316 forks source link

Duplicating cells results in autograding failure because of duplicate IDs, and confusing error message #1083

Open rkdarst opened 5 years ago

rkdarst commented 5 years ago

This is similar to #981, as in that may solve the problem, but I'll document core problem here and a workaround.

Students have submitted notebooks in which they have copied cells. This duplicates the metadata: of importance here is "nbgrader": {"grade_id": ...}. When this happens, the following error message is given:

[AutogradeApp | WARNING] Cell with id 'Task_3_1_test' exists multiple times!
...
[AutogradeApp | WARNING] Removing failed assignment: /m/jhnas/jupyter/course/mlpython2019/files/autograded/staafv1/R1_Introduction
[AutogradeApp | ERROR] One or more notebooks in the assignment use an old version of the nbgrader metadata format. Please **back up your class files directory** and then update the metadata using: nbgrader update .

The first warning is the key to the problem. However, the error message shows that nbgrader metadata should be updated, which not the real problem here (though I haven't tested it if actually works). At least, the error message could be improved or the failure could be noticed earlier.

It isn't a perfect solution, but it would be nice if nbgrader could also continue on in this case. Clearly the student has done something weird (and notebook/jupyterlab could solve it by not duplicating metadata). But could nbgrader also survive this and make some choice one way or the other itself, considering that duplicating metadata is probably a type of problem that will appear again? That way the common case won't cause a failure stopping all autograding... and we leave it to instructors to tell of the problem later.

Who knows what the right nbgrader-only solution would be - simple would be to pick the last cell with the metadata. But many corner cases could break this.

Workaround for anyone else who gets this problem: My tests showed that you have to delete the whole nbgrader metadata dict from the notebook cell - changing the grade_id is not enough because it can't find that grade_id in the database, nor is removing just the grade_id cell.

Operating system

Linux

nbgrader --version

Modified version, based on upstream d56f75b (close to current master).

jhamrick commented 5 years ago

Hmm, I am pretty sure it was the case that in the notebook when you copy a cell it does not copy the metadata. Are your students by chance either using JupyterLab (which I think may not implement this feature), or a very old version of the notebook? It's also possible there's been a regression.

In any case, I agree it would be nice if nbgrader handled this more gracefully. Certainly the error message could be better at least. I'll look into options for how to handle this better and report back.

rkdarst commented 5 years ago

In our provided hub, courses use Notebook by default (5.7.2 it seems we are using now). But Lab is available, and the way to switch between them is documented but not. One previous course also instructed their students to download notebooks and move them to another service for actually doing the assignments - who knows what errors could be inserted in this process!

I'm not an instructor and don't interact with the students, so don't know that much about what students are actually doing and can't ask them questions.

Random thought: are the old values of the cells re-inserted during the fetch stage or the autograde stage? Because in both of the cases where I saw this, a cell was duplicated but both of the sources were the same. I wonder if this is a coincidence (they dupilacated cells and didn't edit them) or by design (same cell IDs are updated with the same contents, meaning it really doesn't matter if a duplicate is dropped).

jhamrick commented 5 years ago

Hmm, I would expect your version of the notebook is new enough (it's a really old version that doesn't support it). Maybe they're using JupyterLab or maybe there's been a regression 😕

Regarding your last question, the answer depends on the cell type I think... for read-only cells (including cells with tests in them) this is the intended behavior in case students manage to edit them somehow, and so I wouldn't be surprised if it just worked with the duplicated cells too. But for other types of cells that have nbgrader ids (e.g. duplicate answer cells) it shouldn't happen that they have the same source, unless the student copied the source.

elesiuta commented 5 years ago

I experienced this issue as well in the class I recently TA’d (nbgrader 0.5.4 on jupyterhub 0.9.4). My solution was a script that merges cells when it detects duplicate grade_ids by concatenating the cell’s source to the first then removing the duplicate.

I uploaded the script as part of a small collection I wrote for that course in case it’s of use to anyone https://github.com/elesiuta/jupyter-nbgrader-helper (run with --fix AssignName NbName.ipynb). As an aside, if there’s any feature in there you think might be useful to implement properly into nbgrader I’d be happy to open a pull request and work on it.

akskuchi commented 4 years ago

This also happened when I used the Jupyter notebook split cell editing (CTRL + SHIFT + -). The new cell created as a result of the split has the same cell-ID (as the original cell). Having the same cell-ID for two cells prevents me from saving the notebook. Annoying!

Likely something to do with this - #758?

fredcallaway commented 4 years ago

I'm having the same problem. You can accidentally copy a read-only cell and then you can't delete it. This causes nbgrader validate to crash down the road.

jupyPython version 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0] nbgrader version 0.6.0

jhamrick commented 4 years ago

I can confirm this is an issue, and it sounds like it is a regression in the notebook as this was not a problem previously.

Sefriol commented 4 years ago

This also causes problems with the feedback system, since Instructor needs to manually fix the notebook, i.e. it's hash changes which is used to distribute the feedback form.

jhamrick commented 4 years ago

It sounds like this is potentially when the regression happened, at least in the context of splitting cells: https://github.com/jupyter/notebook/pull/2349

I am not really sure what the best way to deal with this is. It sounds like there are other extensions that rely on the metadata being copied, but we need it not to be copied because otherwise we can't tell which cell is the "right" one, and there's no way for us to modify information like the cell id when it is copied. I will have to think about what the right way to deal with this is...

trevorcampbell commented 4 years ago

Just adding a +1 to this -- I ran in to this exact issue in my course just now!

nthiery commented 3 years ago

Same here; our students can work on their assignments in two setups, so I can't guarantee the exact software versions they used, but it should be pretty similar to:

jupyter --version
jupyter core     : 4.7.1
jupyter-notebook : 6.3.0
qtconsole        : 5.0.3
ipython          : 7.21.0
ipykernel        : 5.5.0
jupyter client   : 6.1.12
jupyter lab      : not installed
nbconvert        : 5.6.1
ipywidgets       : 7.6.3
nbformat         : 5.1.2
traitlets        : 4.3.3

Luckily our setup let me access the student's assignments, so I could remove the duplicated cell.

nthiery commented 1 year ago

Just as a data point: the issue is still there with JupyterLab 3.6.0 and nbgrader 0.8.1.

vahtras commented 7 months ago

I have encountered this issue before and see it again in early 2024. Students use an installation based on the latest miniconda

As in the original post, it arises when students duplicate a solution cell for experimentation and leave it in the submitted version. I would favor a solution where duplicate cells are ignored and the first is used for auto-grading. That would reduce a lot of manual work.

Regards Olav

nthiery commented 7 months ago

After being itched by this problem for a long time, I believe that the proper solution would be to guarantee that, when a cell with nbgrader metadata is copied by the student, the metadata of that cell is not copied with it. Achieving this would require just a small Jupyter extension that:

And a small change in nbgrader to set the attribute copy_metadata on all cells with cell id's.

What do you think?

perllaghu commented 7 months ago

Are you thinking that the flag is a global toggle, or settable at notebook-level [eg, in the notebook metadata, for example]

As a general idea, yes.... in fact - what does get copied? Are the outputs copied too?

rkdarst commented 7 months ago

1753 was a fix by someone working with me last year. @tuncbkose did a lot of testing, and in the end I don't remember exactly what was found... but I think it turned out that most types of duplications didn't matter that much and could basically be ignored. (my reasoning: if it wouldn't be replaced with the original contents, it doesn't matter. If it would, things like autograder tests would hopefully be idempotent so it's OK if they run twice, and manually graded stuff can be dealt with by a human). At least maybe in our experience?

perllaghu commented 7 months ago

... but I think it turned out that most types of duplications didn't matter that much and could basically be ignored.

My experience is that there are occasionally times where students copy grader cells, and those cells do break the autograder.

It happens sufficiently often that I wrote some notes on fixing it: https://noteable.edina.ac.uk/documentation/notebook-recovery/

tuncbkose commented 7 months ago

I think what @rkdarst may have meant is that beyond beyond breaking the autograder (which just tracks all the encountered grade_ids and stops if it sees one again), duplicate cells don't cause much harm: if you could skip such cells/notebooks, most things would be fine (at least in our use of nbgrader). So that is what #1753 does: alert the user if something seems to be duplicated but otherwise skip it and move on with the rest of the autograding.

nthiery commented 7 months ago

About setting copy_metadata at cell-level versus notebook-level: a priori I had in mind to follow the path of the existing 'deletable' and 'editable` flags: e.g. cell-level only. But practice may later suggest to also enable global metadata.

nthiery commented 7 months ago

About #1753 : I am really glad someone worked on that issue, and I have been meaning to test it for a long time. That being said, even if it works well in practice, it feels like a workaround for fixing a posteriori what should have never happened in the first place: that nbgrader cells are duplicated with there metadata when the end users really just meant to copy their content.

vahtras commented 7 months ago

I totally agree on @nthiery 's suggestion. Remove all metadata for copied cells. That would solve everything.

rkdarst commented 7 months ago

The thing with duplicating cells: how do we know which should have the original content? first or second? what if content was split among two cells? What if duplicated and the "original" was later deleted, maybe because it was used as a test or something? Or original was modified to be come the "copy"

... but actually, according to my theory, it shouldn't matter! If you can ignore duplicate metadata, hopefully you can ignore these problems too. I would really be interested in an analysis of when it actually matters.

Maybe docs should be updated to warn to add metadata in the least number of possible cases, and try to make it obvious to not duplicate them - even if we do other things?

lahwaacz commented 7 months ago

Remove all metadata for copied cells. That would solve everything.

You may not want to remove all metadata e.g. when creating an assignment, then you would have to set all attributes like the nbgrader's cell type and points from scratch. It is safe to only remove metadata that are causing problems, i.e. the IDs.

Thinking about this more: you can actually create and release an assignment that has duplicate IDs from the start.

It would be nice if nbgrader showed warnings on certain actions like releasing an assignment or submitting a solution, and asked the user to either confirm or abort the action. Along with suggesting steps to manually resolve the issue (e.g. create an empty cell next to the bad cell, copy-pasting the content, and deleting the bad cell), this would help a lot without the need to invent a convoluted solution to prevent/workaround the issue automatically.

nthiery commented 7 months ago

You may not want to remove all metadata e.g. when creating an assignment, then you would have to set all attributes like the nbgrader's cell type and points from scratch. It is safe to only remove metadata that are causing problems, i.e. the IDs.

Yes, the idea is not to change how cells behave when editing the assignment sources. Like the currently used read_only flag the remove_cell_metadata flag would be set by nbgrader on the student version.

vahtras commented 7 months ago

I don't know if it belongs here, but a related issue is when you split a cell then they will also get the same metadata. At least one gets a warning immediately

Duplicate nbgrader cell ID
The nbgrader ID "cell-6ba63f0e4a4f7961" has been used for more than one cell. Please make sure all grade cells have unique ids.
nthiery commented 7 months ago

@vahtras: yeah, that's an annoyance too; presumably harder to treat as for copied cell. Luckily, most of our students don't know how to split cells, so that's only an annoyance for instructors when editing assignments.

mark-naylor commented 6 months ago

Hey. This issue with students copying grading cells is the most common fail for auto grading I have.

I think that making these cells not copyable would be the right solution to encourage the correct behaviour.

If they want new cell they should insert them.