TL;DR: When something goes wrong with a (typically PDF) file upload, the user doesn't know until we try to parse it on the backend, at which point we only display a generic error. This means 1) the error is often not specific enough to be helpful and 2) the error might come at the end of a long form rather than as soon as the user selects the file. This PR addresses the issue.
This should free up the court's time addressing frequent help tickets (multiple per day) regarding failed file uploads. It will also prevent corrupted/invalid files from being uploaded to S3.
N.B.: The PR diff is not as intimidating as it looks. A lot of the diff is refactoring.
Tickets: 10001 (password-protected PDFs), 10002 (invalid files types), 10003 (invalid/corrupted PDFs). (But note that this PR is best summarized as "implement client-side file validation." )
In order to provide the user immediate feedback (rather than waiting until they try to submit), we use pdf-js client-side on file selection to check for issues. If issues are discovered--e.g., the PDF is password-protected or the wrong file type--we show a modal with an error message and clear the file selection. (While validating--which is usually very quick--we display a loading overlay.) This sidesteps difficulties with joi document validation.
In addition, we send an error message to the backend for logging. This error message is slightly decoupled from what is shown to the user in order for us to have finer-grained control over what is logged. (E.g., while we may display the same generic error message for multiple types of file parsing errors, we log them more granularly for debugging/prod-support purposes.)
While the front-end messaging has been overhauled, this PR does not change our back-end PDF validation implementation at all. In other words, the client-side validation is there for convenience to filter out bad documents, but we still validate on the back end.
The validation has been added to (almost) everywhere we upload files. (I was told to ignore PractitionerAddEditDocumentation, which might be changed soon, and PetitionQcScanBatchPreview, which, despite some code for uploading, doesn't seem to actually allow uploads.)
Work left to do
File selection components can be refactored. I think this is probably worth attacking in a separate unit of work.
TL;DR: When something goes wrong with a (typically PDF) file upload, the user doesn't know until we try to parse it on the backend, at which point we only display a generic error. This means 1) the error is often not specific enough to be helpful and 2) the error might come at the end of a long form rather than as soon as the user selects the file. This PR addresses the issue.
This should free up the court's time addressing frequent help tickets (multiple per day) regarding failed file uploads. It will also prevent corrupted/invalid files from being uploaded to S3.
N.B.: The PR diff is not as intimidating as it looks. A lot of the diff is refactoring.
Tickets: 10001 (password-protected PDFs), 10002 (invalid files types), 10003 (invalid/corrupted PDFs). (But note that this PR is best summarized as "implement client-side file validation." )
Figma here.
The Approach in This PR
In order to provide the user immediate feedback (rather than waiting until they try to submit), we use pdf-js client-side on file selection to check for issues. If issues are discovered--e.g., the PDF is password-protected or the wrong file type--we show a modal with an error message and clear the file selection. (While validating--which is usually very quick--we display a loading overlay.) This sidesteps difficulties with joi document validation.
In addition, we send an error message to the backend for logging. This error message is slightly decoupled from what is shown to the user in order for us to have finer-grained control over what is logged. (E.g., while we may display the same generic error message for multiple types of file parsing errors, we log them more granularly for debugging/prod-support purposes.)
While the front-end messaging has been overhauled, this PR does not change our back-end PDF validation implementation at all. In other words, the client-side validation is there for convenience to filter out bad documents, but we still validate on the back end.
The validation has been added to (almost) everywhere we upload files. (I was told to ignore
PractitionerAddEditDocumentation
, which might be changed soon, andPetitionQcScanBatchPreview
, which, despite some code for uploading, doesn't seem to actually allow uploads.)Work left to do
File selection components can be refactored. I think this is probably worth attacking in a separate unit of work.
Examples of New Error Messages
https://github.com/user-attachments/assets/9eb7cdc7-67b5-4da5-87a0-08ad36ccd7b7