hark130 / living_manual

LIVING MANUAL (LIMA) is a "dirty word" search utility written in Python3.
GNU General Public License v3.0

LIMA-5: Non-UTF-8 encoded file BUG #7

Closed: hark130 closed this issue 2 years ago

hark130 commented 2 years ago

When LIMA tries to read a file that isn't UTF-8 encoded, lima.lima_search.search_file() raises a UnicodeDecodeError exception. For example: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 10: invalid continuation byte in some.pdf
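A minimal reproduction of this failure mode, under assumptions: the sample bytes and the `reproduce()` helper below are hypothetical, and LIMA's real search_file() is not shown. The file holds ten ASCII bytes, then 0xE2 (the lead byte of a three-byte UTF-8 sequence) followed by "(" (0x28), which is not a valid continuation byte. Reading it as UTF-8 text fails the same way the report describes.

```python
import tempfile
from pathlib import Path

# Hypothetical content: 10 ASCII bytes, then 0xE2 followed by "(" (0x28).
# 0xE2 announces a three-byte UTF-8 sequence, but 0x28 is not a continuation
# byte (those are 0x80-0xBF), so decoding fails at position 10.
SAMPLE = b"0123456789\xe2(abc"


def reproduce() -> UnicodeDecodeError:
    """Write SAMPLE to a temp file, read it as UTF-8 text, return the error."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "some.pdf"
        path.write_bytes(SAMPLE)
        try:
            path.read_text(encoding="utf-8")
        except UnicodeDecodeError as err:
            return err
    raise AssertionError("expected a UnicodeDecodeError")


if __name__ == "__main__":
    print(reproduce())
```

Real PDFs (and most other binary formats) routinely contain byte sequences like this, so any text-mode read of them should be expected to fail.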

For the time being, I'll have LIMA print an error to stderr and continue operating. Longer term, search_file() should probably fall back to a .read_bytes() call inside the except UnicodeDecodeError block and then "try harder", perhaps by converting the dirty word strings to bytes objects.
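One possible shape for that fallback, as a sketch only: `search_file_sketch()` and its signature (a path plus an iterable of dirty words, returning the words found) are hypothetical and not LIMA's actual API. It tries a UTF-8 text read first; on UnicodeDecodeError it reports to stderr, re-reads the file with `Path.read_bytes()`, and searches for each dirty word encoded as UTF-8 bytes.

```python
import sys
import tempfile
from pathlib import Path
from typing import Iterable, List


def search_file_sketch(path: Path, dirty_words: Iterable[str]) -> List[str]:
    """Return the dirty words found in path (hypothetical signature)."""
    words = list(dirty_words)
    try:
        text = path.read_text(encoding="utf-8")
    except UnicodeDecodeError as err:
        # Interim behavior from the issue: report to stderr, then keep going.
        print(f"{path}: not valid UTF-8 ({err}); searching raw bytes",
              file=sys.stderr)
        data = path.read_bytes()
        # "Try harder": compare each dirty word as UTF-8 bytes against the raw data.
        return [word for word in words if word.encode("utf-8") in data]
    return [word for word in words if word in text]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pdf = Path(tmp) / "some.pdf"
        pdf.write_bytes(b"%PDF\xe2(secret\xff")
        print(search_file_sketch(pdf, ["secret", "plan"]))
```

A byte-level match only succeeds when the word is stored literally as its UTF-8 (or ASCII) byte sequence. Text inside compressed PDF streams, or in UTF-16 encoded files, would still be missed without format-aware extraction.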