jupyter / nbgrader

A system for assigning and grading notebooks
https://nbgrader.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
1.28k stars 316 forks source link

Not importing emails when Importing students from .csv #890

Closed ncclementi closed 6 years ago

ncclementi commented 7 years ago

Operating system

Ubuntu 16.04

nbgrader --version

nbgrader version 0.5.2

Expected behavior

I'm creating the students database from a csv file. My csv files has: id,last_name,first_name,email so I expect when I do nbgrader db student list to have all the fields for each student.

Actual behavior

For some reason the field corresponding to email it's not being import. For example in the csv file I have: id,last_name,first_name,email stud1, smith, blabla, bla@mail.edu

and when creating the database I got:

Creating/updating student with ID 'stud1': {'last_name': 'smith', 'first_name': 'blabla'}

when listing students I see:

stud1 (smith, blabla) -- None

Any clue what is going on?

lgpage commented 7 years ago

@ncclementi that is very strange. I also can't recreate this behaviour.

$ cat students.csv
id,last_name,first_name,email
stud1, smith, blabla, bla@mail.edu

$ nbgrader db student import students.csv
[DbStudentImportApp | INFO] Importing students from: 'students.csv'
[DbStudentImportApp | INFO] Creating/updating student with ID 'stud1': {'first_name': ' blabla', 'last_name': ' smith', 'email': ' bla@mail.edu'}

$ nbgrader db student list
There are 1 students in the database:
stud1 ( smith,  blabla) --  bla@mail.edu

The only thing I can think that would cause this behaviour is if the email heading is misspelled or if there are spaces in the heading row, for example (note the space before email):

$ cat students.csv
id,last_name,first_name, email
stud1, smith, blabla, bla@mail.edu

$ nbgrader db student import students.csv 
[DbStudentImportApp | INFO] Importing students from: 'students.csv'
[DbStudentImportApp | INFO] Creating/updating student with ID 'stud1': {'first_name': ' blabla', 'last_name': ' smith'}

I think spaces in general (before or after the comma) could lead to unexpected behaviour.

lgpage commented 7 years ago

We should in any case display a warning if the heading key is wrong to give better feedback to the user (https://github.com/jupyter/nbgrader/blob/master/nbgrader/apps/dbapp.py#L152)

ncclementi commented 7 years ago

@lgpage Double checking my .csv I don't have spaces before but there is on afterwards. Not sure if that is the problem. I can't test it right know since that would implied messing up with the data base I'm already using, but I'll try in a different computer and report what happens.

ixjlyons commented 7 years ago

Can confirm that whitespace after a header item in the CSV will cause the silent failure (check by adding a space after first_name on this line). @ncclementi is the trailing space you're referring to in the email header cell? If so, it looks like that's what's going on.

Issuing a warning any time a key isn't in allowed_keys is simple but it might be a bit noisy if you're importing from a generated file from a LMS or something (potentially contains lots of extra columns). Could instead strip leading/trailing whitespace when reading the CSV, but I'm not sure that's always desired.

ncclementi commented 7 years ago

@ixjlyons Yes, I was referring to that trailing space, well at least we know why!. Thanks

lgpage commented 6 years ago

Issuing a warning any time a key isn't in allowed_keys is simple but it might be a bit noisy if you're importing from a generated file from a LMS or something (potentially contains lots of extra columns).

Good point. It should still be possible to do a single warning (or debug) log of incorrect/extra keys that were ignored at the end of importing the data.

jhamrick commented 6 years ago

[Catching up on issues and PRs as I've been away for the last two months]

Yeah, I think it should be not too bad to detect all the extra keys and give one log message. We can definitely also strip whitespace; that seems like a reasonable thing to do.

mpacer commented 6 years ago

Cleanest way to solve this for both importing students and importing assignments was to create a generic superclass.

That superclass requires you to specify the sqlalchemy class from the api module that it updates as one of its attributes. For DbStudentImportApp, that is Student.
For DbAssignmentImportApp, that is Assignment.

This also involves specifying the method used for the update on the Gradebook update_or_create_student for Student, update_or_create_assignment for Assignment) and the key used in that method to identify the particular instance in the table ('id' for Student, 'name' for Assignment).