Closed by jgstew 1 year ago
It might be best to check whether the UTF-8 BOM is present:
import codecs

encoding = "Unknown"
required_bom = codecs.BOM_UTF8  # b"\xef\xbb\xbf"
# Read only the first few bytes, in binary mode, so nothing is decoded.
with open(file_path, "rb") as file:
    header = file.read(len(required_bom))
    if header.startswith(required_bom):
        encoding = "utf-8-bom"
Most of the files I care about don't have a BOM, so this check doesn't actually work for them: https://github.com/jgstew/tools/blob/master/Python/file_check_bom.py
See here: https://stackoverflow.com/a/3269323/861745
The idea is to ensure that files that should be UTF-8 actually are UTF-8, even if those files should generally contain only ASCII characters. Checking for ASCII-only content is a separate check.
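One way to cover files without a BOM might be to attempt a strict UTF-8 decode instead of relying on the BOM alone. The following is only a sketch under that assumption: `classify_encoding` and `check_file` are hypothetical names, and the returned labels (`"ascii"`, `"utf-8"`, `"utf-8-bom"`, `"not-utf-8"`) are illustrative, not taken from the linked script.

```python
import codecs


def classify_encoding(data: bytes) -> str:
    """Classify raw bytes (hypothetical helper, labels are illustrative)."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    # Skip the BOM, if any, before validating the rest of the content.
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        body.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return "not-utf-8"
    if has_bom:
        return "utf-8-bom"
    # ASCII is a subset of UTF-8, so this is the separate, narrower check.
    return "ascii" if body.isascii() else "utf-8"


def check_file(file_path: str) -> str:
    """Read the whole file in binary mode and classify its contents."""
    with open(file_path, "rb") as file:
        return classify_encoding(file.read())
```

Reading the whole file into memory is fine for small text files; for very large files an incremental decoder would probably be a better fit. Note that `bytes.isascii()` requires Python 3.7 or later.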