mattharrison / IllustratedPy3

Notes and issues for Illustrated Guide to Python 3

20.1 Background #272

Closed mclaughlin closed 7 years ago

mclaughlin commented 7 years ago

All of these encoding schemes provided a one-to-one mapping of bytes to a character. In order to support Chinese, Korean, and Japanese scripts, many more than 128 symbols would be needed. Using four bytes would provide support for over 4 billion characters. But this encoding would come at a cost. For the majority of people using only ASCII centric characters, requiring them to be encoded in four times as much data seemed like a colossal waste of memory.

Mixed verb tense. Keep historical events in the past tense:

Add hyphen: "ASCII centric" --> "ASCII-centric"

As written, the antecedent of "them" is "people," when you really meant the characters: "requiring them" --> "requiring those characters"

All of these encoding schemes provided a one-to-one mapping of bytes to a character. In order to support Chinese, Korean, and Japanese scripts, many more than 128 symbols were needed. Using four bytes provided support for over 4 billion characters. But this encoding came at a cost. For the majority of people using only ASCII-centric characters, requiring those characters to be encoded in four times as much data seemed like a colossal waste of memory.
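For what it's worth, the "four times as much data" claim is easy to check in Python 3. This is just a quick sketch, not text from the book; the sample string is made up, and `utf-32-be` is used here as a stand-in for a fixed four-byte encoding (it skips the byte-order mark so the counts are exact).

```python
# Illustrative only: compare a one-byte-per-character encoding with a
# fixed four-byte encoding for ASCII-only text.
text = "hello world"  # hypothetical sample string

ascii_bytes = text.encode("ascii")      # one byte per character
utf32_bytes = text.encode("utf-32-be")  # four bytes per character, no BOM

print(len(text))         # number of characters
print(len(ascii_bytes))  # number of bytes in ASCII
print(len(utf32_bytes))  # number of bytes in UTF-32
print(2 ** 32)           # distinct values four bytes can hold
```

On ASCII-only text the four-byte form is exactly four times larger, and 2 ** 32 is where the "over 4 billion" figure comes from.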

A compromise that provided both the ability to support all characters but not waste memory, was to stop encoding characters to a sequence of bits. Rather, the characters would be abstracted. Each character would instead map to a unique code point (that has a hex value and a unique name). Various encodings would then map these code points to bit encodings.

More verb tense:

A compromise that supported all characters without wasting memory was to stop encoding characters directly to a sequence of bits. Rather, the characters were abstracted. Each character instead mapped to a unique code point (which has a hex value and a unique name). Various encodings then mapped these code points to bit encodings.

For different contexts, an alternate encoding might provide better characteristics. Unicode is this mapping from character to a code point, it is not the encoding.

Unicode maps each character to a code point; it is not the encoding. For different contexts, an alternate encoding might provide better characteristics.
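A short example might help readers see the difference between a code point and an encoding. This is only a suggestion, using the standard `ord` built-in and the `unicodedata` module; the choice of character and of the three encodings is mine, not the book's.

```python
import unicodedata

# One character, one code point, several possible byte encodings.
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

code_point = ord(ch)         # the abstract number Unicode assigns
name = unicodedata.name(ch)  # the unique name Unicode assigns

# The same code point mapped to bytes by three different encodings.
as_utf8 = ch.encode("utf-8")
as_utf16 = ch.encode("utf-16-be")
as_latin1 = ch.encode("latin-1")

print(hex(code_point), name)
print(as_utf8, as_utf16, as_latin1)
```

The code point and name stay fixed while the bytes differ per encoding, which is the distinction the sentence is making.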

The notion of variable width encodings also came out to help alleviate memory waste.

"also came out to" --> "also helped"

The notion of variable-width encodings also helped alleviate memory waste.

It can use between one and four bytes to represent a character.

"can use" --> "uses"

It uses between one and four bytes to represent a character.
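If the chapter wants to show the one-to-four-byte range concretely, something like the following would do it. The four sample characters are my own picks, chosen so that each lands in a different width bucket.

```python
# Illustrative only: UTF-8 width grows with the code point value.
samples = [
    "A",           # U+0041, in the ASCII range
    "\u00e9",      # U+00E9, e with acute accent
    "\u20ac",      # U+20AC, euro sign
    "\U0001F600",  # U+1F600, an emoji outside the Basic Multilingual Plane
]

widths = [len(s.encode("utf-8")) for s in samples]

for s, width in zip(samples, widths):
    print(f"U+{ord(s):04X} -> {width} byte(s)")
```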

In addition, UTF-8 has this nice feature that it is backward compatible with ASCII.

"has this nice feature that it is" --> "is"

In addition, UTF-8 is backward compatible with ASCII.
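Backward compatibility here means that ASCII-only text produces identical bytes under both encodings, so existing ASCII data is already valid UTF-8. A minimal check, with a made-up sample string:

```python
# Illustrative only: ASCII text encodes to the same bytes in UTF-8.
text = "Python 3"  # hypothetical ASCII-only sample

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

same_bytes = ascii_bytes == utf8_bytes     # identical byte sequences
round_trip = ascii_bytes.decode("utf-8")   # ASCII bytes read back as UTF-8

print(same_bytes, round_trip)
```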

mattharrison commented 7 years ago

Oct 2