rcpch / national-paediatric-diabetes-audit

A django application to audit the care of children and young people with diabetes in England and Wales.
0 stars 1 forks source link

Incorrect CSV cell format errors crash entire upload #361

Closed mbarton closed 2 weeks ago

mbarton commented 2 weeks ago

Edit dummy_sheet.csv to have two rows of data. Make the following edits:

Upload. You should get a validation error but instead you get a crash:

django-1   |     | Traceback (most recent call last):
django-1   |     |   File "/usr/local/lib/python3.12/site-packages/django/db/models/fields/__init__.py", line 2140, in to_python
django-1   |     |     return int(value)
django-1   |     |            ^^^^^^^^^^
django-1   |     | ValueError: invalid literal for int() with base 10: 'TRUE'
django-1   |     | 
django-1   |     | During handling of the above exception, another exception occurred:
django-1   |     | 
django-1   |     | Traceback (most recent call last):
django-1   |     |   File "/app/project/npda/general_functions/csv_upload.py", line 252, in validate_rows
django-1   |     |     visits = rows.apply(
django-1   |     |              ^^^^^^^^^^^
django-1   |     |   File "/usr/local/lib/python3.12/site-packages/pandas/core/frame.py", line 10374, in apply
django-1   |     |     return op.apply().__finalize__(self, method="apply")
django-1   |     |            ^^^^^^^^^^
django-1   |     |   File "/usr/local/lib/python3.12/site-packages/pandas/core/apply.py", line 916, in apply
django-1   |     |     return self.apply_standard()
django-1   |     |            ^^^^^^^^^^^^^^^^^^^^^
django-1   |     |   File "/usr/local/lib/python3.12/site-packages/pandas/core/apply.py", line 1063, in apply_standard
django-1   |     |     results, res_index = self.apply_series_generator()
django-1   |     |                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
django-1   |     |   File "/usr/local/lib/python3.12/site-packages/pandas/core/apply.py", line 1081, in apply_series_generator
django-1   |     |     results[i] = self.func(v, *self.args, **self.kwargs)
django-1   |     |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
django-1   |     |   File "/app/project/npda/general_functions/csv_upload.py", line 253, in <lambda>
django-1   |     |     lambda row: (validate_visit_using_form(patient_form.instance, row), row["row_index"]),
django-1   |     |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
django-1   |     |   File "/app/project/npda/general_functions/csv_upload.py", line 196, in validate_visit_using_form
django-1   |     |     fields = row_to_dict(
django-1   |     |              ^^^^^^^^^^^^
django-1   |     |   File "/app/project/npda/general_functions/csv_upload.py", line 150, in row_to_dict
django-1   |     |     model_field: csv_value_to_model_value(
django-1   |     |                  ^^^^^^^^^^^^^^^^^^^^^^^^^
django-1   |     |   File "/app/project/npda/general_functions/csv_upload.py", line 144, in csv_value_to_model_value
django-1   |     |     return model_field.to_python(value)
django-1   |     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
django-1   |     |   File "/usr/local/lib/python3.12/site-packages/django/db/models/fields/__init__.py", line 2142, in to_python
django-1   |     |     raise exceptions.ValidationError(
django-1   |     | django.core.exceptions.ValidationError: ['“TRUE” value must be an integer.']
eatyourpeas commented 2 weeks ago

In the PR above dtype is now set at the time the dataframe is created. I did not do this as wanted to discuss here first but it would be possible at this point to raise an exception at this early stage if this dtype allocation process fails, such as in this situation. It was occurring because the new csv generation function was randomising booleans for the closed loop measure, rather than choices.

If the csv contains the wrong types (eg True/False in this case, rather than the expect int associated with the choice), should we just reject the whole csv maybe with a useful message to say please fix and try again?