mikeckennedy / python-jumpstart-course-demos

Contains all the "handout" materials for my Python Jumpstart by Building 10 Apps course. This includes try it yourself and finished versions of the 10 apps.
https://talkpython.fm/course
MIT License

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 304: invalid start byte #47

Closed mikeckennedy closed 4 years ago

mikeckennedy commented 4 years ago

In the file searcher app, users have run into this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 304: invalid start byte

The problem is that a binary file is being opened in text mode (open(file, 'r')), so Python tries to decode its bytes as UTF-8 and fails on the first byte that is not valid UTF-8 (here, 0xff). Here is the fix:

def search_file(filename, search_text):

    # NOTE: We haven't discussed error handling yet, but we
    # cover it shortly. However, some folks have been running
    # into errors where this is passed a binary file and crashes.
    # This try/except block catches the error and returns no matches.
    try:

        # matches = []
        with open(filename, 'r', encoding='utf-8') as fin:

            line_num = 0
            for line in fin:
                line_num += 1
                if line.lower().find(search_text) >= 0:
                    m = SearchResult(line=line_num, file=filename, text=line)
                    # matches.append(m)
                    yield m

            # return matches
    except UnicodeDecodeError:
        print("NOTICE: Binary file {} skipped.".format(filename))
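
To try the fix in isolation, here is a minimal self-contained sketch. It assumes SearchResult is a namedtuple with file, line, and text fields (the definition is not shown in this issue), and the file names notes.txt and image.bin are made up for the demo. The loop uses enumerate instead of a manual counter, which should behave the same way.

```python
import collections
import os
import tempfile

# Assumption: SearchResult is a simple namedtuple; the real app may define it differently.
SearchResult = collections.namedtuple('SearchResult', 'file, line, text')


def search_file(filename, search_text):
    # Same idea as the fix above: a decode failure means "not a text file", so skip it.
    try:
        with open(filename, 'r', encoding='utf-8') as fin:
            for line_num, line in enumerate(fin, start=1):
                if search_text in line.lower():
                    yield SearchResult(line=line_num, file=filename, text=line)
    except UnicodeDecodeError:
        print("NOTICE: Binary file {} skipped.".format(filename))


with tempfile.TemporaryDirectory() as folder:
    # Hypothetical demo files: one plain text file, one file starting with byte 0xff.
    text_path = os.path.join(folder, 'notes.txt')
    binary_path = os.path.join(folder, 'image.bin')

    with open(text_path, 'w', encoding='utf-8') as fout:
        fout.write('first line\nPython is fun\n')
    with open(binary_path, 'wb') as fout:
        fout.write(b'\xff\xfe\x00\x01')

    # search_file is a generator, so the file is only read once we iterate over it.
    text_matches = list(search_file(text_path, 'python'))
    binary_matches = list(search_file(binary_path, 'python'))

    print(text_matches)
    print(binary_matches)
```

Because the function is a generator, the error is raised while the results are being consumed rather than when search_file is called. Any matches found before the undecodable byte will already have been yielded.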

mikeckennedy commented 4 years ago

Please note that you may be passing binary files without realizing it. For example, macOS creates a hidden .DS_Store file, which is binary, in many folders.
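
If you would rather not open hidden files such as .DS_Store at all, one option is to filter them out while walking the folder. The sketch below is only an illustration: iter_visible_files is a hypothetical helper, not part of the course code, and it does not replace the try/except fix, since ordinary non-hidden files can be binary too.

```python
import os
import tempfile


def iter_visible_files(folder):
    # Hypothetical helper: yield full paths of files whose names do not start with a dot.
    for root, dirs, files in os.walk(folder):
        for name in sorted(files):
            if name.startswith('.'):
                continue  # skips hidden files such as .DS_Store
            yield os.path.join(root, name)


with tempfile.TemporaryDirectory() as folder:
    # Made-up stand-in for a macOS .DS_Store file, plus one normal text file.
    with open(os.path.join(folder, '.DS_Store'), 'wb') as fout:
        fout.write(b'\x00\x00\x00\x01Bud1')
    with open(os.path.join(folder, 'notes.txt'), 'w', encoding='utf-8') as fout:
        fout.write('hello\n')

    visible = [os.path.basename(path) for path in iter_visible_files(folder)]
    print(visible)
```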