talkpython / mastering-pycharm-course

Course demos and handouts for Talk Python's Effective PyCharm course
https://training.talkpython.fm/courses/explore_pycharm/mastering-pycharm-ide
GNU General Public License v2.0

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 304: invalid start byte #30

Closed mikeckennedy closed 4 years ago

mikeckennedy commented 4 years ago

In the file searcher app, users have run into this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 304: invalid start byte

The problem is that a binary file is being opened in text mode (open(file, 'r')), so decoding its bytes as UTF-8 fails. Here is the fix:

def search_file(filename, search_text):
    # NOTE: We haven't discussed error handling yet, but we
    # cover it shortly. However, some folks have been running
    # into errors where this is passed a binary file and crashes.
    # This try/except block catches the error and stops searching that file.
    try:
        with open(filename, 'r', encoding='utf-8') as fin:
            # Line numbers are 1-based, matching what editors display.
            for line_num, line in enumerate(fin, start=1):
                # search_text is expected to be lowercase already.
                if search_text in line.lower():
                    yield SearchResult(line=line_num, file=filename, text=line)
    except UnicodeDecodeError:
        print("NOTICE: Binary file {} skipped.".format(filename))
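
As an alternative to catching the exception, a caller could check a file before searching it. The following is only a sketch of that idea, not part of the course code: looks_binary and its sample_size parameter are hypothetical names, and the NUL-byte test is a common heuristic rather than a guarantee.

```python
import codecs


def looks_binary(filename, sample_size=4096):
    """Heuristic check (hypothetical helper, not from the course code).

    Returns True if the first sample_size bytes contain a NUL byte
    or cannot be decoded as UTF-8.
    """
    with open(filename, 'rb') as fin:
        chunk = fin.read(sample_size)

    # Text files almost never contain NUL bytes; many binary formats do.
    if b'\x00' in chunk:
        return True

    # An incremental decoder with final=False tolerates a multi-byte
    # character that is cut off at the end of the sample, but still
    # raises on bytes that can never be valid UTF-8 (such as 0xff).
    decoder = codecs.getincrementaldecoder('utf-8')()
    try:
        decoder.decode(chunk, final=False)
    except UnicodeDecodeError:
        return True
    return False
```

Because only the start of the file is sampled, a file with invalid bytes further in would still pass this check, so the try/except in search_file remains useful as a backstop.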

mikeckennedy commented 4 years ago

Whoops, wrong course repo :)