Closed: dfdan closed this issue 1 year ago.
We've experienced the same issue with TXT files (https://github.com/inveniosoftware/invenio-app-rdm/issues/1864), and I second the suggestion to default to UTF-8 instead of ASCII. The previewers themselves already try to default to UTF-8, so detect_encoding should respect that default.
Package version (if known): (current)
Describe the bug
JSON files encoded as UTF-8 that contain only sparse non-ASCII characters are not reliably detected as UTF-8. This happens when the first non-ASCII character does not appear until more than PREVIEWER_CHARDET_BYTES bytes into the file (only 1k by default):
https://github.com/inveniosoftware/invenio-previewer/blob/fc3e5d2656d7f503ee6d393567d6b1396fbf37db/invenio_previewer/utils.py#L27
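A minimal standard-library sketch of the failure mode follows. It assumes the detector only ever sees the first 1024 bytes (the 1k default mentioned above). It does not call invenio-previewer or any detection library. Because the sampled prefix is pure ASCII, a typical detector will report ASCII, yet decoding the full file as ASCII fails.

```python
import json

# Assumption: 1024 stands in for the default PREVIEWER_CHARDET_BYTES ("1k").
CHARDET_BYTES = 1024

# UTF-8 JSON whose only non-ASCII character ("é") sits far past the sample window.
payload = {"padding": "x" * 2000, "name": "café"}
data = json.dumps(payload, ensure_ascii=False).encode("utf-8")

# Only this prefix would be handed to the encoding detector.
sample = data[:CHARDET_BYTES]

print(sample.isascii())  # True: the detector only ever sees ASCII
print(data.isascii())    # False: the full file contains a multi-byte character

try:
    data.decode("ascii")  # what happens if an "ascii" guess is trusted
except UnicodeDecodeError as exc:
    print("decoding as ASCII fails:", exc.reason)

print(json.loads(data.decode("utf-8"))["name"])  # decoding as UTF-8 works
```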
Steps to Reproduce
Expected behavior
Files containing only sparse Unicode (non-ASCII) characters should preview correctly.
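I suggest fixing utils.py so that an ASCII detection result is overridden with UTF-8. This should be safe, since ASCII is a strict subset of UTF-8: any ASCII-only byte sequence decodes identically as UTF-8.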
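The suggested override could look roughly like the sketch below. It is hypothetical. This `detect_encoding` is not the actual function in invenio_previewer/utils.py, and `toy_detector` is a stand-in for whatever detection library that module really uses. The point is only the override logic. If the detector reports ASCII (compared case-insensitively, since detectors may differ in capitalization), the function returns UTF-8. If the detector reports nothing, it falls back to the UTF-8 default that the previewers already use.

```python
import io
from typing import BinaryIO, Callable, Optional

# Hypothetical detector signature: takes a byte sample, returns an encoding name or None.
Detector = Callable[[bytes], Optional[str]]


def toy_detector(sample: bytes) -> Optional[str]:
    """Hypothetical stand-in for a real encoding detector."""
    if sample.isascii():
        return "ascii"
    try:
        sample.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"


def detect_encoding(fp: BinaryIO, detector: Detector,
                    sample_bytes: int = 1024, default: str = "utf-8") -> str:
    """Guess the encoding from a prefix of fp, treating ASCII as UTF-8."""
    pos = fp.tell()
    try:
        sample = fp.read(sample_bytes)
    finally:
        fp.seek(pos)  # leave the stream where the caller had it
    encoding = detector(sample)
    if not encoding:
        return default
    # ASCII is a strict subset of UTF-8, and an ASCII-only sample says nothing
    # about the bytes after it, so UTF-8 is the safer answer.
    if encoding.lower() in ("ascii", "us-ascii"):
        return "utf-8"
    return encoding


if __name__ == "__main__":
    doc = ('{"padding": "' + "x" * 2000 + '", "name": "café"}').encode("utf-8")
    print(detect_encoding(io.BytesIO(doc), toy_detector))  # utf-8
```

Mapping ASCII to UTF-8 never makes a decode fail that would have succeeded as ASCII. It only helps when non-ASCII UTF-8 bytes appear beyond the sampled window.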