inveniosoftware / invenio-previewer

Invenio module for previewing files.
https://invenio-previewer.readthedocs.io
MIT License
5 stars 60 forks source link

Add CSV max filesize check (#190) #191

Open daglueck opened 1 year ago

daglueck commented 1 year ago

:heart: Thank you for your contribution!

Description

Fixes #190 by means of adding a file size check to invenio_previewer.extensions.csv_papaparsejs.can_preview.

Checklist

Ticks in all boxes and 🟢 on all GitHub actions status checks are required to merge:

Third-party code

If you've added third-party code (copy/pasted or new dependencies), please reach out to an architect.

Reminder

By using GitHub, you have already agreed to the GitHub’s Terms of Service including that:

  1. You license your contribution under the same terms as the current repository’s license.
  2. You agree that you have the right to license your contribution under the current repository’s license.
daglueck commented 1 year ago

For the sake of simplicity I've added the file size check to can_preview rather than adding a separate function like validate_csv.

I've also used 1 * 1024 * 1024 as fallback size (rather than 10 * 1024 * 1024 like in the old CSV previewer) since that is the default value of PREVIEWER_MAX_FILE_SIZE_BYTES

lnielsen commented 11 months ago

@psaiz Question: The new CSV file previewer is supposed to stream the CSV file so it's not loading the full file right?

@daglueck Given the answer from @psaiz I would say that the limit should be significantly higher to allow for larger CSV files to be previewed - say e.g. 100MB.

daglueck commented 11 months ago

Based on your comments, I've added a separate validation function (@alejandromumo ) and increased the max file size to 100 * 1024 * 1024 (@lnielsen ). I've added a new config variable PREVIEWER_CSV_MAX_BYTES to differentiate CSV from JSON/XML. Let me know if you had imagined this in a different way.

@lnielsen I didn't know that a new csv previewer was added. How does it work with the max file size, will it simply not load the file if it's too big (as with XML/JSON I guess) or will it truncate the file (like with TXT)? I was assuming here it won't load the file since otherwise there might be parsing issues. If on the other hand it does truncate the file, the wording of PREVIEWER_CSV_MAX_BYTES might need to be adjusted.