According to the README:

> Requires that text files have ascii-encoding, including the extended ascii set.
> This is useful to detect files that have unicode characters.
require-ascii will fail on files that are encoded in extended ASCII if:

- the file contains bytes in the 128–255 range, and
- those bytes don't happen to combine with the bytes around them into a valid UTF-8 sequence (see this table).
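Both conditions can be seen with nothing but Python's built-in codecs. The sketch below assumes cp437 as the extended-ASCII encoding (the same choice the script below makes); `is_valid_utf8` is a hypothetical helper name. A lone byte in the 128–255 range is always valid cp437 but never valid UTF-8, while the pair 0xC3 0xA9, which cp437 reads as two ordinary extended-ASCII characters, happens to decode as UTF-8 "é".

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# cp437 maps every byte 0-255 to exactly one character, so each lone byte
# in the 128-255 range is valid extended ASCII...
lone_high_bytes = [bytes([b]) for b in range(128, 256)]
print(all(len(b.decode("cp437")) == 1 for b in lone_high_bytes))  # True

# ...but none of them is valid UTF-8 on its own: 0x80-0xBF are continuation
# bytes, and 0xC0-0xFF either start a multi-byte sequence that is cut short
# or are never allowed in UTF-8 at all.
print(any(is_valid_utf8(b) for b in lone_high_bytes))  # False

# Two cp437 characters whose bytes coincidentally form a valid UTF-8 sequence.
pair = b"\xc3\xa9"
print(pair.decode("cp437"))  # decodes to two cp437 characters
print(is_valid_utf8(pair), pair.decode("utf-8"))  # True é
```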
The following script generates 128 files, one for each byte value from 128 to 255. Each file contains valid extended ASCII but fails when tested by require-ascii:
```python
# The README links to <https://theasciicode.com.ar/>. There are many different
# ways you could extend ASCII, but that site in particular says "In 1981,
# IBM developed an extension of 8-bit ASCII code, called 'code page 437'..."
extended_ascii = "cp437"

for code_point in range(128, 256):
    # Create a file that should pass require-ascii, but won't.
    with open(f"{code_point}.cp437.txt", mode='wb') as file:
        file.write(code_point.to_bytes(1, 'little'))

    # Make sure that that file really does contain valid extended ASCII.
    with open(f"{code_point}.cp437.txt", mode='rt', encoding=extended_ascii) as file:
        # This should cause a UnicodeDecodeError if file contains
        # invalid extended ASCII.
        file.read()
```
A more accurate description of require-ascii would be:
require-ascii
What it does
Requires that text files use UTF-8 and only use code points ≤ 255.
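To make the proposed description concrete, here is a minimal sketch of a check that behaves exactly as described: decode the bytes as UTF-8, then reject any code point above 255. This is an assumption for illustration, not the hook's actual source, and `passes_described_check` is a hypothetical name.

```python
def passes_described_check(data: bytes) -> bool:
    """Hypothetical check: the bytes must be valid UTF-8, and every decoded
    code point must be <= 255. Not the hook's real implementation."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # not UTF-8 at all, e.g. a lone cp437 byte
    return all(ord(ch) <= 255 for ch in text)


print(passes_described_check("café".encode("utf-8")))  # True: é is U+00E9
print(passes_described_check("€".encode("utf-8")))  # False: U+20AC > 255
print(passes_described_check(bytes([0x82])))  # False: cp437 'é', invalid UTF-8
print(passes_described_check(b"\xc3\xa9"))  # True: cp437 pair that is UTF-8 'é'
```

Under this reading, a cp437 file passes only when its high bytes coincidentally line up into UTF-8 sequences for code points up to 255, which matches the behavior reported above.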