ustaxcourt / ef-cms

An Electronic Filing / Case Management System.
https://dawson.ustaxcourt.gov/
Other
85 stars 46 forks source link

10001 (and 10002 and 10003) Design Debt: More Granular File Validation Errors #5365

Closed Mwindo closed 1 week ago

Mwindo commented 2 weeks ago

TL;DR: When something goes wrong with a (typically PDF) file upload, the user doesn't know until we try to parse it on the backend, at which point we only display a generic error. This means 1) the error is often not specific enough to be helpful and 2) the error might come at the end of a long form rather than as soon as the user selects the file. This PR addresses the issue.

This should free up the court's time addressing frequent help tickets (multiple per day) regarding failed file uploads. It will also prevent corrupted/invalid files from being uploaded to S3.


N.B.: The PR diff is not as intimidating as it looks. A lot of the diff is refactoring.

Tickets: 10001 (password-protected PDFs), 10002 (invalid files types), 10003 (invalid/corrupted PDFs). (But note that this PR is best summarized as "implement client-side file validation." )

Figma here.

The Approach in This PR

In order to provide the user immediate feedback (rather than waiting until they try to submit), we use pdf-js client-side on file selection to check for issues. If issues are discovered--e.g., the PDF is password-protected or the wrong file type--we show a modal with an error message and clear the file selection. (While validating--which is usually very quick--we display a loading overlay.) This sidesteps difficulties with joi document validation.

In addition, we send an error message to the backend for logging. This error message is slightly decoupled from what is shown to the user in order for us to have finer-grained control over what is logged. (E.g., while we may display the same generic error message for multiple types of file parsing errors, we log them more granularly for debugging/prod-support purposes.)

While the front-end messaging has been overhauled, this PR does not change our back-end PDF validation implementation at all. In other words, the client-side validation is there for convenience to filter out bad documents, but we still validate on the back end.

The validation has been added to (almost) everywhere we upload files. (I was told to ignore PractitionerAddEditDocumentation, which might be changed soon, and PetitionQcScanBatchPreview, which, despite some code for uploading, doesn't seem to actually allow uploads.)

Work left to do

File selection components can be refactored. I think this is probably worth attacking in a separate unit of work.

Examples of New Error Messages

https://github.com/user-attachments/assets/9eb7cdc7-67b5-4da5-87a0-08ad36ccd7b7