disunification and naming suggestions from Karl Pentzlin

google / emoji4unicode

Automatically exported from code.google.com/p/emoji4unicode

Apache License 2.0


Quick comments on some Emoji symbols
 - Karl Pentzlin 2009-01-06

Reference:
http://www.unicode.org/~scherer/emoji4unicode/snapshot/full.html
as of 2009-01-06

The basis for some of the comments is:
- Symbols which are not merely glyph variants of each other should not
 be unified; if someone can attribute different semantics to two
 symbols they are different symbols, even when they are used
 interchangeably in the Japanese Telco context. When encoded in
 Unicode, the context is no longer limited in this way.
- Symbols should be named as they appear as emoji, not according
 to the black-and-white fallback glyph which is associated with
 them for printing the Unicode charts. This means:
 · Symbols with an inherent color shall bear this color in their
   name unless the entity denoted by the name already identifies the color
   anyway (e.g., a BANANA is uniquely yellow and therefore does
   not need to be called YELLOW BANANA, while a RED APPLE must be
   named so as there are also green apples).
 · Symbols whose semantics include animation shall have ANIMATED
   as part of their name (this does not apply to symbols where
   animation is a feature of glyph variance only).

All symbol names are relative to any generic prefixes which are applied
to the set of emoji symbols or subsets of it during the ongoing
discussion.

Any comment starting with "KDDI is", "DoCoMo is", or "SoftBank is"
is a request to not unify this with the other symbols of the same row.

e-004 KDDI is THUNDERSTORM WITH RAIN
e-008 SoftBank is NIGHT WITH FALLING STAR
e-014 should be named otherwise e.g. MOON-LIKE CRESCENT,
     as a crescent moon must have its tips strictly opposite on
     the enclosing circle. Naming this CRESCENT MOON is an offence
     to anybody who knows the astronomical mechanisms.
e-02b...e-037: General comments sent by a previous mail.
e-036 The KDDI symbol shows one fish, while PISCES is plural.
     Therefore, to complete the pictorial Zodiac set a picture
     of two fishes is needed, while the KDDI symbol is "fish".
e-038 SoftBank is TSUNAMI (??)
e-03A should be named ERUPTING VOLCANO (in contrast to the Mount
     Fuji symbol which may be required to be named VOLCANO to
     avoid geographical preferences).
e-040 DoCoMo and SoftBank are PINK CHERRY BLOSSOM or JAPANESE CHERRY BLOSSOM
     (Some European cherry trees blossom in white)
     KDDI is PINK BLOSSOMING CHERRY TREE or JAPANESE BLOSSOMING CHERRY
     TREE
e-044 should not be listed under "nature"; the symbol seems unequivocally
     to be the newly licensed driver plate.
     The name JAPANESE NEW LICENSED DRIVER SIGN seems preferable.
e-051 is RED APPLE
e-057 is WATER MELON - most melons sold in Europe are yellow and oval
e-05B is GREEN APPLE
e-190 is COMIC EYES or EYEBALLS
e-193 seems to be RED LIPS rather than generic MOUTH
e-197 is ANIMATED FACE MESSAGE
e-198 SoftBank is HAIRCUT (??)
e-19F is MAN WOMAN PAIR
e-1A1 is POLICEMANS HEAD WITH FLAT CAP
     (in other countries, police caps may look definitively different)
     if there is a police cap by SoftBank, this is a different FLAT POLICE CAP
e-1A2 KDDI is WOMANS HEAD WITH BUNNY EARS
     SoftBank is TWO DANCING WOMEN WITH BUNNY EARS
e-1A3 is BRIDES HEAD WITH VEIL
e-1A4 at first glance: KDDI is BLOND WOMAN, SoftBank is BLOND MAN
     It seems appropriate to recategorize:
     e-19D            DARK-HAIRED MANS HEAD
     e-19E            DARK-HAIRED WOMANS HEAD
     e-1A4 (KDDI)     BRIGHT-HAIRED WOMANS HEAD
     e-1A4 (SoftBank) BRIGHT-HAIRED MANS HEAD
     e-19B            DARK-HAIRED BOYS HEAD
     e-19B (variant)  BRIGHT-HAIRED BOYS HEAD
     e-19C (SoftBank) DARK-HAIRED GIRLS HEAD
     e-19C (KDDI)     BRIGHT-HAIRED GIRLS HEAD
e-1A5 is MANS HEAD WITH LONG MOUSTACHE
     For reasons of political correctness, there must be two characters:
     DARK-HAIRED MANS HEAD WITH LONG MOUSTACHE
     BRIGHT-HAIRED MANS HEAD WITH LONG MOUSTACHE
     Otherwise, some traditional Bavarians who customarily wear long blonde
     moustaches may be offended.
e-1A6 is MANS HEAD WITH TURBAN
     *** I *STRONGLY* object to showing this icon with a different skin color
         than the others ***
     Alternatively, it has to be scrutinized whether ALL person and head
     symbols have to be differentiated by BRIGHT SKINNED, BROWN SKINNED
     and DARK SKINNED versions in a politically correct way which is acceptable
     to all people in the world.
e-1A7 is OLDER MANS HEAD
e-1A8 is OLDER WOMANS HEAD
e-1A9 is BABYS HEAD
e-1AA is CONSTRUCTION WORKERS HEAD WITH HELMET
e-1AB is YOUNG BRIGHT-HAIRED PRINCESS HEAD or BRIGHT-HAIRED GIRLS HEAD WITH
     CROWN
e-1AC is RED FACED OGRES HEAD
e-1AD is LONG-NOSED GOBLINS HEAD
e-1AF is PUTTO ANGEL (simply ANGEL may be offensive to some religious people)
e-1B0 KDDI is ALIEN SPACESHIP, SoftBank is BIG-EYED ALIEN FACE
e-1B2 is FACE WITH DEVILS HORNS (simply DEVIL may be offensive to some
     religious or superstitious people)
e-1B6 KDDI is ANIMATED MALE DISCO DANCER,
     SoftBank is ANIMATED FEMALE FLAMENCO DANCER
e-1B7 is DOG FACE, SoftBank is PUPPY FACE, similarly
     e-1B8,1BF,1C0,1C1,1C2,1CA,1D1,1D2,1D7,154 add " FACE" like it is done for
     e-1C4 MONKEY FACE
e-1BD see comment for e-036
e-1C8 is SITTING WHITE BIRD
e-1D0 is FOX HEAD
e-353 is ANIMATED BOWING FACE
e-357 is ANIMATED PERSON RAISING ONE HAND, SoftBank is PALM OF HAND
e-358 is ANIMATED PERSON RAISING BOTH HANDS, SoftBank is
     ANIMATED PAIR OF HANDS OPENING AND CLOSING
e-359 is ANIMATED PERSON FROWNING
e-35A is ANIMATED PERSON MAKING POUTING FACE
e-35B SoftBank is PAIR OF RAISED FOLDED HANDS
e-4B0 is SMALL HOUSE
e-4B4 is HOSPITAL DENOTED BY CROSS SYMBOL
e-4B5 is ALPHABETIC BANK SYMBOL
e-4B6 is AUTOMATIC TELLER MACHINE SYMBOL
e-4B7 DoCoMo is LATIN LETTER H ENCLOSED IN A HOUSE SYMBOL
e-4C2 is RED LANTERN DENOTING JAPANESE IZAKAYA RESTAURANT
     ("red lantern" is a symbol for two totally different concepts in European
      culture: a. brothel, b. being the last one in a sports competition)
e-4CA is WORKERS MALLET (as it looks different from the common household
     hammer)
e-4CC is MENS LOW SHOE
e-4D2 is TRIDENT (the listing under Clothing/Wearables is wrong)
e-4D5 is LADIES FORMAL DRESS
e-4D7 is MULE SHOE or similar, to denote it is not the animal called mule
e-4DD should be encoded as an enclosing combining mark MONEY BAG, which can
     be applied to any currency symbol
e-4DE is DOLLAR YEN CURRENCY EXCHANGE
e-4DF is CHART WITH RISING CURVE AND YEN SYMBOL
e-4EF is SINGLE-LENS REFLEX STILL PICTURE CAMERA
e-4F4 is FAECES or PICTORIAL EXPRESSION OF DISDAIN
e-4F7 is CRYSTAL BALL ON RACK
e-4FA is MEAT CLEAVER
e-4FB is TORCH
e-4FD this is a nonstandard symbol for window scrolling and must be named in
     a way that it is not mistaken for any ISO 7000 or similar symbol;
     thus it must get a prefix like JAPANESE TELCO SYMBOL if it gets
     no generic name prefix for the emoji set or a subset
e-4FE is ELECTRIC PLUG WITH CABLE
e-4FF is GREEN CLOSED BOOK LYING WITH BACK TO THE RIGHT
     in this way applicable to books to be read from right to left
e-500 is BLUE CLOSED BOOK LYING WITH BACK TO THE RIGHT
e-501 is ORANGE CLOSED BOOK LYING WITH BACK TO THE RIGHT
e-502 is FRONT OF GREEN BOOK WITH LABEL or FRONT OF GREEN NOTEBOOK WITH LABEL
e-503 is STACK OF BOOKS LYING WITH BACK TO THE LEFT
e-505 KDDI is WOMANS HEAD WITH BATHING CAP, SoftBank is PERSON TAKING A BATH
e-506 is LADIES AND GENTS RESTROOMS SIGN
e-509 is SYRINGE WITH DROP OF BLOOD
e-50B/C/D/E: depends on how the coloring of those emoji which remain unique
     when color is disregarded is eventually treated. If the black-and-white
     equivalents are to be encoded, these are:
     KDDI: new      U+1F130 SQUARED LATIN LETTER A
           existing U+1F131 SQUARED LATIN LETTER B
           new      U+1F1xx SQUARED DIGIT 0
           new      U+1F1xx SQUARED AB
     SoftBank:  new U+1F150 WHITE ON BLACK CIRCLED LATIN CAPITAL LETTER A
                new U+1F151 WHITE ON BLACK CIRCLED LATIN CAPITAL LETTER B
                new U+1F1xx WHITE ON BLACK CIRCLED DIGIT 0
                    (probably to be unified with U+24FF NEGATIVE CIRCLED DIGIT
                     ZERO)
                new U+1F1xx WHITE ON BLACK CIRCLED AB
e-513 is SANTA CLAUS FACE
e-515 is ANIMATED NIGHT SKY WITH FIREWORKS
e-517 is ANIMATED PARTY POPPER
e-51D is ANIMATED NIGHT SKY WITH JAPANESE SPARKLER
e-520 is ANIMATED OPENING CONFETTI BALL
----------- Comments for emoji symbols starting from e-522 may follow later.

Original issue reported on code.google.com by markus.icu on 6 Jan 2009 at 8:25

My initial reply on the emoji4unicode list:

On colors: We considered symbol colors for disunification but rarely for character names. Instead, with UTC guidance, we unified a number of symbols with existing characters which have black/white/striped... glyphs and names. For newly proposed symbols, we followed the precedent and chose similar character names, matching the glyphs in the font that is being worked on.

On disunifications: At a glance, it looks like many of the suggested disunifications assume more specific and precise meanings and shapes than are intended by the cell phone carriers. For example,
- If a symbol generally looks like a crescent moon (e-014) and is described or named by the carriers to represent one, it makes little sense to give it a different meaning based on an imprecise symbol shape. (What we can do is design a better glyph.)
- If a carrier clearly intends a certain meaning, and shows that in name, shape, context of surrounding symbols and maybe other available information, we should follow that meaning and not artificially invent a separate symbol and meaning. (e-036 pisces vs. KDDI single fish)
- The carriers' understanding of "glyph variants", as expressed in symbol names and cross-mapping tables, is clearly broader than your sense of "glyph variants". For interoperability, we usually try to follow the carriers' cross-mappings, except when they are way off (as in e-7E0 subway vs. e-7E1 metro sign, which has been discussed by the UTC before).

Ken Whistler's reply on the emoji4unicode list:

> Quick comments on some Emoji symbols
> - Karl Pentzlin 2009-01-06
>
> Reference:
> http://www.unicode.org/~scherer/emoji4unicode/snapshot/full.html
> as of 2009-01-06
>
> The basis for some of the comments is:
> - Symbols which are not merely glyph variants of each other should not
>  be unified; if someone can attribute different semantics to two
>  symbols they are different symbols, even when they are used
>  interchangeably in the Japanese Telco context. When encoded in
>  Unicode, the context is no longer limited in this way.

I disagree. It is true that encoding a character for a symbol in Unicode puts it in a context where it might not always be limited to transcoding for the Japanese wireless sets, so that due consideration must be given to how this is done. However, when what we are encoding is a compatibility character for an emoji which is *already* unified by de facto mappings between the various carrier sets, it is not helpful -- in fact is disruptive -- to disunify glyph variants simply because the telcos use different glyphs to display the cross-mapped character in question.

In such cases, as for the zodiac symbols which you wrote a separate note on (and which Markus responded to), the correct encoding solution here is to treat the cross-mapped emoji as a *single* character for encoding, and then to either encode a new single Unicode character (if no existing Unicode character is appropriate) or to map to a single Unicode character if one already exists -- as for the zodiac signs.

If a separate need occurs in the future to distinguish animal-pictorial representations of zodiac signs, for example, from traditional astrological symbolic representations of zodiac signs, that needs to be done in a separate context and be separately argued from the current emoji set -- because separately encoding them on the basis merely of the distinct glyphs used by the wireless carriers would *not* be a helpful or useful solution to the emoji cross-mapping to Unicode problem.

> - Symbols should be named as they appear as emoji, not according
>  to the black-and-white fallback glyph which is associated with
>  them for printing the Unicode charts. This means:
>  · Symbols with an inherent color shall bear this color in their
>    name unless the entity denoted by the name already identifies the color
>    anyway (e.g., a BANANA is uniquely yellow and therefore does
>    not need to be called YELLOW BANANA, while a RED APPLE must be
>    named so as there are also green apples).

I disagree. This principle is simply not helpful. It perpetuates the notion that colors are *inherently* a part of the character identity here. And that does not serve the purpose of providing a cross-mapping set for interoperability with the emoji characters. It would be far, far better to simply have some abstracted compatibility characters identified as EMOJI SYMBOL FOR BOOK-1, EMOJI SYMBOL FOR BOOK-2, EMOJI SYMBOL FOR BOOK-3, etc., rather than to insist on encoding RED BOOK SYMBOL, BLUE BOOK SYMBOL, ORANGE BOOK SYMBOL, and then jump off the deep end in insisting that the associated glyphs actually need to support color distinctions.

>  · Symbols whose semantics include animation shall have ANIMATED
>    as part of their name (this does not apply to symbols where
>    animation is a feature of glyph variance only).

I disagree. This is the same issue as for the colored glyphs, only more so.

It is simply not helpful to insist that "ANIMATED" be part of the character name, when that is a description of the animated glyphs used on phones, rather than a useful identifying label for the *character* we are going to encode to represent the symbol in question.

> All symbol names are relative to any generic prefixes which are applied
> to the set of emoji symbols or subsets of it during the ongoing
> discussion.
>
> Any comment starting with "KDDI is", "DoCoMo is", or "SoftBank is"
> is a request to not unify this with the other symbols of the same row.

And I will simply put my comment in as opposing *all* such disunifications across the board, without objecting to each individual suggestion one-by-one below. I think this whole approach is a very deep semiotic trap that completely misconstrues both the problem and the nature of the solution required for cross-mapping the emoji sets in Unicode.

> e-1A5 is MANS HEAD WITH LONG MOUSTACHE
>      For reasons of political correctness, there must be two characters:
>      DARK-HAIRED MANS HEAD WITH LONG MOUSTACHE
>      BRIGHT-HAIRED MANS HEAD WITH LONG MOUSTACHE
>      Otherwise, some traditional Bavarians who customarily wear long blonde
>      moustaches may be offended.

This is an example of the kind of dead end that this approach results in. The problem here is to create a standard mapping code point in Unicode for the emoji symbol listed at e-1A5. The problem is *not* to solve some generic issue of how to represent all races, skin colors, and masculine facial hair styles politically correctly via character codes.

> e-1B6 KDDI is ANIMATED MALE DISCO DANCER,
>      SoftBank is ANIMATED FEMALE FLAMENCO DANCER

That is another example of a completely unhelpful disunification, as well as an example of the inappropriate application of "ANIMATED" to a character name. The symbolic concept being represented here is of a dancer. The glyphs chosen on the phones to display that concept are animated and designed differently. But encoding distinct characters and making them overly specific to glyph designs is simply not a useful direction to take for the character encoding for the purpose intended here.

I could make similar comments one-by-one, but it should be clear that I object to the complete set of comments in principle, rather than just here and there on its details.

On reviewing these suggestions more closely, I agree with Ken that the suggestions are based on an overly pedantic interpretation of the carrier images, disregarding both the common practice of naming Unicode symbols and the carriers' cross-mappings.
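
For illustration only, the interoperability argument made in the replies can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the row ID e-1B6 and the two long names come from the thread, while the name DANCER, the table layout, and the helper round_trips are hypothetical and are not part of the emoji4unicode data or tools.

```python
# Hypothetical sketch: keys are (carrier, emoji4unicode row ID), values are
# placeholder character names, NOT assigned or proposed Unicode names.

# Unified approach: both carrier glyphs in row e-1B6 map to one character.
unified = {
    ("KDDI", "e-1B6"): "DANCER",
    ("SoftBank", "e-1B6"): "DANCER",
}

# Disunified approach: each carrier glyph gets its own character,
# using the names suggested in the original comment.
disunified = {
    ("KDDI", "e-1B6"): "ANIMATED MALE DISCO DANCER",
    ("SoftBank", "e-1B6"): "ANIMATED FEMALE FLAMENCO DANCER",
}


def round_trips(mapping, src, dst, row):
    """Return True if carriers src and dst map the given row to the same
    character, i.e. a symbol sent by src has a counterpart on dst."""
    return mapping[(src, row)] == mapping[(dst, row)]


if __name__ == "__main__":
    print(round_trips(unified, "KDDI", "SoftBank", "e-1B6"))
    print(round_trips(disunified, "KDDI", "SoftBank", "e-1B6"))
```

Under these assumptions the unified table keeps the carriers' existing cross-mapping intact, while the disunified table leaves each carrier with a character the other cannot display.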

disunification and naming suggestions from Karl Pentzlin #64