20.1 Background - Githubissues

All of these encoding schemes provided a one-to-one mapping of bytes to a character. In order to support Chinese, Korean, and Japanese scripts, many more than 128 symbols would be needed. Using four bytes would provide support for over 4 billion characters. But this encoding would come at a cost. For the majority of people using only ASCII centric characters, requiring them to be encoded in four times as much data seemed like a colossal waste of memory.

Mixed verb tense. Keep historical events in the past tense:

"would be needed" --> "were needed"
"would provide" --> "provided"
"would come at a cost" -- > "came at a cost"

Add hyphen: "ASCII centric" --> "ASCII-centric"

As it's written, the antecedent of "them" are the "people," when you really meant the characters: "requiring them" --> "requiring those characters"

All of these encoding schemes provided a one-to-one mapping of bytes to a character. In order to support Chinese, Korean, and Japanese scripts, many more than 128 symbols were needed. Using four bytes provided support for over 4 billion characters. But this encoding came at a cost. For the majority of people using only ASCII-centric characters, requiring those characters to be encoded in four times as much data seemed like a colossal waste of memory.

A compromise that provided both the ability to support all characters but not waste memory, was to stop encoding characters to a sequence of bits. Rather, the characters would be abstracted. Each character would instead map to a unique code point (that has a hex value and a unique name). Various encodings would then map these code points to bit encodings.

More verb tense:

"Rather, the characters would be abstracted" --> "Rather, the characters were abstracted."
"Each character would instead map" --> "Each character instead mapped"
"encodings would then map these" --> "encodings then mapped these"

A compromise that provided both the ability to support all characters but not waste memory, was to stop encoding characters to a sequence of bits. Rather, the characters were abstracted. Each character instead mapped to a unique code point (that has a hex value and a unique name). Various encodings then mapped these code points to bit encodings.

For different contexts, an alternate encoding might provide better characteristics. Unicode is this mapping from character to a code point, it is not the encoding.

"is this mapping" --> "maps"
These two sentences seem out of order. I recommend swapping them.
Add em dash: "it is not the encoding" --> "—it is not the encoding

Unicode maps from character to a code point—it is not the encoding. For different contexts, an alternate encoding might provide better characteristics.

The notion of variable width encodings also came out to help alleviate memory waste.

"also came out to" --> "also helped"

The notion of variable width encodings also helped alleviate memory waste.

It can use between one and four bytes to represent a character.

"can use" --> "uses"

It uses between one and four bytes to represent a character.

In addition, UTF-8 has this nice feature that it is backward compatible with ASCII.

"has this nice feature that it is" --> "is"

In addition, UTF-8 is backward compatible with ASCII.

mattharrison / IllustratedPy3

20.1 Background #272