Open mjordan opened 2 months ago
This also applies to media track files.
It would be good to validate these files as UTF-8, both with `--check` and during a regular (non-check) run. Maybe provide a config setting so users can decide which files are validated (based on media use TID?).
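A rough sketch of what such a check might look like is below. It assumes a config setting that lists the media use TIDs whose files should be validated. The key `validate_utf8_media_use_tids` and all function names are hypothetical and are not existing Workbench settings or functions.

```python
import logging

# Hypothetical config: media use term IDs whose files should be checked as UTF-8.
# The key name is illustrative only; it is not an existing Workbench setting.
EXAMPLE_CONFIG = {"validate_utf8_media_use_tids": [1234]}


def decodes_as_utf8(data):
    """Return True if the given bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def file_is_utf8(path):
    """Read a file in binary mode and report whether it is valid UTF-8."""
    with open(path, "rb") as f:
        ok = decodes_as_utf8(f.read())
    if not ok:
        logging.warning("%s is not valid UTF-8.", path)
    return ok


def should_validate(config, media_use_tid):
    """Validate only files whose media use TID is listed in the config.

    TIDs are compared as strings so 1234 and "1234" are treated the same.
    """
    tids = {str(tid) for tid in config.get("validate_utf8_media_use_tids", [])}
    return str(media_use_tid) in tids
```

With `--check`, a failed `file_is_utf8()` could simply be reported; during a regular run, the same result could be used to skip the file rather than abort.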
Within the `create_media()` function, extracted text data must be encoded as UTF-8. If it is not, line 5143 produces exceptions like `UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 922: invalid start byte`. The short-term fix is to catch this error and not load the text.
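A minimal sketch of the short-term fix, assuming the extracted text is available as raw bytes before it is decoded. The function name `load_extracted_text` is hypothetical and is not the actual code inside `create_media()`.

```python
import logging


def load_extracted_text(raw):
    """Decode extracted-text bytes as UTF-8, or return None if they are not valid UTF-8.

    Catching UnicodeDecodeError means the text is simply not loaded
    instead of the exception propagating out of media creation.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as e:
        logging.warning("Not loading extracted text: %s", e)
        return None
```

Dropping the text is the conservative choice. An alternative would be `raw.decode("utf-8", errors="replace")`, which loads the text with U+FFFD in place of undecodable bytes. For reference, byte 0x92 in the error above is the right single quotation mark in Windows-1252, which suggests that at least some of the failing files come from Windows tools.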