inveniosoftware / invenio-app-rdm

Turn-key research data management platform.
https://inveniordm.docs.cern.ch
MIT License

Ability to show file upload related backend errors to the user during submission. #2909

Open hbayindir opened 1 week ago

hbayindir commented 1 week ago

Is your feature request related to a problem? Please describe.

We have implemented a file validation procedure for uploaded files in our fork of the invenio-files-rest package to enhance user security and improve the sustainability of the uploaded data in our local instance, Aperta. In short, we make sure that the files are in the formats we like, that there are no nasty surprises inside, and that the files are intact and openable.

However, when a user's file hits an error or is rejected by one of these filters, we can't notify them about what happened.

This feature is a carryover from our old Aperta installation, which was a fork of Zenodo, and we were able to show these notifications on that version.

Describe the solution you'd like

We want to be able to notify the user about problems in their file(s) and about which check failed, so they can fix the problem and upload a corrected version or an alternative file instead.

Describe alternatives you've considered

Since there is no way to surface these errors to the UI and the user, we could not find any alternatives.

Additional context

The things we do are as follows:

  1. The process starts with file extension verification, which is performed through magic byte inspection, utilizing the open-source filetype library to infer file types.
  2. We also perform health checks on tar (tar, tar.gz, tar.bz2, tar.xz) and zip compressed files to detect any issues during extraction. Rar and 7z formats are not supported; rar is a proprietary format, and reliable health checks for both formats would require third-party libraries, which we aim to avoid.
  3. After the health check, we verify that the total uncompressed size does not exceed 10 GiB. This limit is in place to protect against unusually large zip bombs, further ensuring end-user security. Password-protected compressed files are rejected as well, for security and policy reasons.
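
A minimal sketch of steps 2 and 3 using only the Python standard library is shown here for illustration. It is not the code from the Aperta fork: `ArchiveCheckFailed`, `check_zip`, and `check_tar` are hypothetical names, and the set of exceptions caught for tar files is an assumption. The magic-byte check from step 1 is left out because it relies on the third-party filetype library.

```python
import tarfile
import zipfile

MAX_UNCOMPRESSED_BYTES = 10 * 1024**3  # 10 GiB, the limit named in step 3


class ArchiveCheckFailed(Exception):
    """Hypothetical error type used only in this sketch."""


def check_zip(fp, limit=MAX_UNCOMPRESSED_BYTES):
    """Reject encrypted, oversized, or corrupt zip archives."""
    try:
        with zipfile.ZipFile(fp) as zf:
            infos = zf.infolist()
            # Bit 0 of the general-purpose flags marks an encrypted member.
            if any(info.flag_bits & 0x1 for info in infos):
                raise ArchiveCheckFailed("password-protected archive")
            # Sizes are read from the archive headers; nothing is extracted.
            if sum(info.file_size for info in infos) > limit:
                raise ArchiveCheckFailed("uncompressed size over limit")
            # testzip() reads every member and returns the first bad name.
            bad = zf.testzip()
            if bad is not None:
                raise ArchiveCheckFailed(f"corrupt member: {bad}")
    except zipfile.BadZipFile as exc:
        raise ArchiveCheckFailed("not a readable zip file") from exc


def check_tar(fp, limit=MAX_UNCOMPRESSED_BYTES):
    """Walk all tar headers, rejecting unreadable or oversized archives."""
    try:
        # "r:*" lets tarfile detect gzip, bzip2, or xz compression itself.
        with tarfile.open(fileobj=fp, mode="r:*") as tf:
            total = 0
            for member in tf:
                total += member.size
                if total > limit:
                    raise ArchiveCheckFailed("uncompressed size over limit")
    except (tarfile.TarError, EOFError, OSError) as exc:
        raise ArchiveCheckFailed("not a readable tar file") from exc
```

In this sketch the declared size is checked before `testzip()`, so an oversized archive is rejected before any member is decompressed. This ordering differs from the list above, and the sizes stored in zip headers can be forged, so it should be read as a first line of defence rather than a guarantee.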

If files pass all these checks, they're accepted as a valid download; otherwise, we reject them with an error. For a detailed explanation of what we do, please see the flowchart added below.

The file validation checks are executed once the file has been fully uploaded, which allows us to perform integrity checks on compressed formats. If any validation fails, the file is immediately deleted from storage, and a FileCheckError (a custom exception inheriting from StorageError) is raised to indicate the failure.
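
The delete-then-raise behaviour described above could look roughly like the following sketch. `StorageError` here is a local stand-in for the base class named in the text, and `validate_or_delete`, the `check` callable, and the `filename`/`reason` attributes are hypothetical names chosen for illustration; the actual fork may be structured differently.

```python
import os


class StorageError(Exception):
    """Local stand-in for the storage error base class named above."""


class FileCheckError(StorageError):
    """Raised when an uploaded file fails validation."""

    def __init__(self, filename, reason):
        super().__init__(f"{filename}: {reason}")
        # Kept as attributes so a caller could build a user-facing message.
        self.filename = filename
        self.reason = reason


def validate_or_delete(path, check):
    """Run check(path); on failure remove the file and raise FileCheckError.

    `check` is a hypothetical callable that raises ValueError with a
    human-readable reason when the file is not acceptable.
    """
    try:
        check(path)
    except ValueError as exc:
        # The rejected file must not stay in storage.
        os.remove(path)
        raise FileCheckError(os.path.basename(path), str(exc)) from exc
```

Carrying the file name and the reason on the exception is one way to keep enough detail available for whichever layer eventually reports the failure to the user.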

This is where the problem surfaces. Unfortunately, InvenioRDM has no support for showing this kind of custom upload error in the UI. To inform the user about the problem in detail, we want these errors to be visible in the UI, and we want to introduce this feature into InvenioRDM's file uploader(s). There is talk of a new file uploader; if it is going to become the default, we'd love to add the support to it. Otherwise, we want to introduce the feature to the old one and hopefully port it to the newer one in due course.

Any help and guidance is greatly appreciated.

Flowchart:

flowchart TD
        flow_start@{ shape: circle, label: "Start" }
        flow_end@{ shape: circle, label: "End" }
        flow_end_end@{ shape: circle, label: "End" }
        upload@{ shape: lean-r, label: "Upload file" }
        is_not_allowed@{ shape: diamond, label: "if file_ext in [\"7z\", \"rar\"]" }
        file_ext@{ shape: rectangle, label: "file_ext = filename.split('.')[-1]" }
        file_check_error@{ shape: rectangle, label: "raise FileCheckError" }
        file_ext_guess@{ shape: rectangle, label: "guessed_file_ext = file_ext_guess(fp)" }
        is_file_ext_matched@{ shape: diamond, label: "if guessed_file_ext == file_ext" }
        is_zip@{ shape: diamond, label: "file_ext == \"zip\"" }
        is_tar@{ shape: diamond, label: "if file_ext in [\"tar\", \"tar.gz\", \"tar.bz2\", \"tar.xz\"]" }
        zip_integrity_test@{ shape: rectangle, label: "test_zip_file(fp)" }
        tar_integrity_test@{ shape: rectangle, label: "test_tar_file(fp)" }
        get_zip_uncompressed_size@{ shape: rectangle, label: "uncompressed_size = get_zip_uncompressed_size(fp)" }
        get_tar_uncompressed_size@{ shape: rectangle, label: "uncompressed_size = get_tar_uncompressed_size(fp)" }
        is_large_zip@{ shape: diamond, label: "uncompressed_size > 10 GB" }
        is_large_tar@{ shape: diamond, label: "uncompressed_size > 10 GB" }

        flow_start --> upload
        upload --> file_ext
        subgraph check_file["check_file(fp, filename)"]
        file_ext --> is_not_allowed
        is_not_allowed --True--> file_check_error
        is_not_allowed --False--> file_ext_guess
        file_ext_guess --> is_file_ext_matched
        is_file_ext_matched --True-->is_zip
        is_file_ext_matched --False-->file_check_error
        is_zip --True--> zip_integrity_test
        is_zip --False-->is_tar
        is_tar--True--> tar_integrity_test
        is_tar--False--> flow_end
        subgraph check_zip_file["check_zip_file(fp)"]
        zip_integrity_test --> get_zip_uncompressed_size
        get_zip_uncompressed_size --> is_large_zip
        end
        is_large_zip --True--> file_check_error
        is_large_zip --False--> flow_end
        subgraph check_tar_file["check_tar_file(fp)"]
        tar_integrity_test --> get_tar_uncompressed_size
        get_tar_uncompressed_size --> is_large_tar
        end
        is_large_tar --True--> file_check_error
        is_large_tar --False--> flow_end
        end
        check_file --> flow_end_end
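
For readers who prefer code to the diagram, the top-level check_file(fp, filename) flow can be sketched in Python as follows. This is an illustrative reading of the flowchart, not the fork's code: `guess_ext`, `check_zip`, and `check_tar` are passed in as hypothetical callables, and `get_extension` is an added helper. The helper exists because `filename.split('.')[-1]` yields "gz" for "data.tar.gz", which would never match the "tar.gz" entry in the tar branch; how the real implementation handles compound extensions is not stated in the issue.

```python
DISALLOWED = {"7z", "rar"}
# Longest suffixes first so "tar.gz" wins over a bare "gz".
TAR_EXTS = ("tar.gz", "tar.bz2", "tar.xz", "tar")


class FileCheckError(Exception):
    """Stand-in for the custom exception named in the flowchart."""


def get_extension(filename):
    """Return the lowercased extension, keeping compound tar suffixes."""
    lowered = filename.lower()
    for ext in TAR_EXTS:
        if lowered.endswith("." + ext):
            return ext
    return lowered.rsplit(".", 1)[-1] if "." in lowered else ""


def check_file(fp, filename, guess_ext, check_zip, check_tar):
    """Follow the flowchart: extension gate, type match, archive checks."""
    ext = get_extension(filename)
    if ext in DISALLOWED:
        raise FileCheckError(f"'{ext}' archives are not supported")
    # guess_ext is assumed to return the extension inferred from content.
    guessed = guess_ext(fp)
    if guessed != ext:
        raise FileCheckError(
            f"content looks like '{guessed}', but the name says '{ext}'"
        )
    if ext == "zip":
        check_zip(fp)
    elif ext in TAR_EXTS:
        check_tar(fp)
    # Any other matching type is accepted without further checks.
```

Passing the guesser and the archive checks as parameters keeps the sketch independent of any particular library; in practice they would be bound to the magic-byte lookup and the zip/tar checks described in the numbered list.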

P.S.: You can ping me and @geekdinazor about this issue. As mentioned above, we want to implement and contribute this feature if it is accepted. We may need some help from you.

ntarocco commented 1 day ago

> Unfortunately there is no support in InvenioRDM for showing this kind custom upload errors in UI

What is the exact issue here? Are you running this validation asynchronously? Are you expecting to respond with an error to the upload request?