swiftlang / swift

The Swift Programming Language
https://swift.org
Apache License 2.0

[SR-5448] Creating a Character via ExpressibleByExtendedGraphemeClusterLiteral fails with complex emoji #48020

Closed: ole closed this issue 7 years ago

ole commented 7 years ago
Previous ID SR-5448
Radar None
Original Reporter @ole
Type Bug
Status Resolved
Resolution Duplicate
Environment Apple Swift version 4.0 (swiftlang-900.0.49.1 clang-900.0.29) (Xcode 9 beta 3).
Additional Detail from JIRA

| Field | Value |
|---|---|
| Votes | 0 |
| Component/s | Standard Library |
| Labels | Bug |
| Assignee | None |
| Priority | Medium |

md5: bc2370cb250cd6672de7818c125debc8

duplicates:

Issue Description:

Tested on Xcode 9 beta 3.

Creating a Character from a grapheme cluster literal fails when the literal contains a complex emoji, e.g. one with a skin tone. I tested this with:

"\u{1F64D}\u{1F3FD}"

(U+1F64D PERSON FROWNING, which the REPL comments below call "woman", plus the U+1F3FD medium skin tone modifier). Creating the value with:

Character("\u{1F64D}\u{1F3FD}")

works fine.

> xcrun swift
Welcome to Apple Swift version 4.0 (swiftlang-900.0.49.1 clang-900.0.29). Type :help for assistance.
  1> let c: Character = "\u{1F64D}" // woman
c: Character = {
  _representation = smallUTF16 {
    smallUTF16 = 3729643581
  }
}
  2> let d: Character = "\u{1F64D}\u{1F3FD}" // woman + skin tone
error: repl.swift:2:20: error: cannot convert value of type 'String' to specified type 'Character'
let d: Character = "\u{1F64D}\u{1F3FD}" // woman + skin tone
                   ^~~~~~~~~~~~~~~~~~~~

  2> let e = Character("\u{1F64D}\u{1F3FD}") // woman + skin tone
e: Character = {
  _representation = large {
    large = {
      _nativeBuffer = 0x8000000100205330
    }
  }
}
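Here is a minimal sketch of the working path, assuming Swift 4 or later. The String-based Character initializer does its grapheme breaking at run time, so it accepts the same two-scalar sequence that the literal syntax rejects in the transcript. The constant names are illustrative only.

```swift
// The skin-tone sequence from the report: U+1F64D followed by the
// U+1F3FD skin tone modifier (two Unicode scalars).
let frowning = "\u{1F64D}\u{1F3FD}"

// Run-time grapheme breaking treats the pair as one extended grapheme cluster.
print(frowning.count)                // 1
print(frowning.unicodeScalars.count) // 2

// Character.init(_: String) traps unless its argument is exactly one
// grapheme cluster, so it succeeds for the sequence above.
let viaInitializer = Character(frowning)
print(String(viaInitializer).unicodeScalars.count) // 2
```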

It's not just skin tones. The same error occurs when the emoji is a profession without a skin tone, e.g.:

let f: Character = "\u{1F468}\u{200D}\u{2708}\u{FE0F}"

(man pilot: MAN + ZERO WIDTH JOINER + AIRPLANE + VARIATION SELECTOR-16) also fails with the same error.
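The pilot emoji is a zero-width-joiner (ZWJ) sequence of four Unicode scalars. The following minimal sketch assumes a recent Swift 5 toolchain whose grapheme breaking follows the current Unicode emoji ZWJ rules. It shows that the run-time view treats the sequence as one Character, so the initializer path works for it too:

```swift
// MAN (U+1F468) + ZERO WIDTH JOINER (U+200D) + AIRPLANE (U+2708)
// + VARIATION SELECTOR-16 (U+FE0F): the "man pilot" ZWJ sequence.
let pilot = "\u{1F468}\u{200D}\u{2708}\u{FE0F}"

print(pilot.unicodeScalars.count) // 4
print(pilot.count)                // 1 with current grapheme-breaking rules

// Because the run-time count is 1, the String-based initializer accepts it.
let pilotCharacter = Character(pilot)
print(pilotCharacter == pilot.first!) // true
```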

ole commented 7 years ago

@airspeedswift This one is probably for you?

airspeedswift commented 7 years ago

cc @milseman

Yeah, the compiler is still using the old grapheme-breaking implementation. The workaround is to write:

"\u{1F64D}\u{1F3FD}".first!

There's no good answer here. The compiler could use the same ICU-based grapheme breaking as the runtime, but then code could fail to compile on machines with an older copy of the library whose breaking rules differ. We could instead trap at run time for literals of two or more characters, or silently accept multi-character Characters (either dropping the excess parts or letting a Character essentially be a string). Probably the best option is to emit a warning instead of an error in this case.
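This minimal sketch shows why the .first! workaround succeeds and one way to imitate the "trap at run time" option in user code. It assumes Swift 4 or later. CheckedCharacter is a hypothetical wrapper invented for illustration, not a standard library type. Because it conforms to ExpressibleByStringLiteral, the compiler accepts any string literal for it and defers the single-grapheme check to run time:

```swift
// Workaround from the comment above: String.first returns a Character built by
// the runtime's grapheme breaking, so the two-scalar sequence stays together.
let workaround = "\u{1F64D}\u{1F3FD}".first! // inferred as Character

/// Hypothetical wrapper (not a standard library type) imitating the
/// "trap at run time" option: any string literal compiles, and the
/// exactly-one-grapheme-cluster check runs when the value is created.
struct CheckedCharacter: ExpressibleByStringLiteral {
    let value: Character

    init(stringLiteral literal: String) {
        // Uses the grapheme-breaking rules of the library present at run time.
        precondition(literal.count == 1,
                     "literal must contain exactly one extended grapheme cluster")
        value = literal.first!
    }
}

let checked: CheckedCharacter = "\u{1F64D}\u{1F3FD}" // passes the check
print(checked.value == workaround)                   // true
// let bad: CheckedCharacter = "ab"  // would compile, then trap at run time
```

The trade-off matches the one in the comment. An invalid literal is caught only when the code runs, and the outcome depends on the grapheme-breaking rules of whatever library version is installed.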

ole commented 7 years ago

Oh yes, I hadn't considered the split between the compiler and ICU. That makes sense. Thanks for the explanation.

177d8476-2756-4152-91d7-984f74d3896c commented 7 years ago

Yes, this is the same as https://bugs.swift.org/browse/SR-4546