swiftlang / swift

The Swift Programming Language
https://swift.org
Apache License 2.0

[SR-6076] [String] `var count: String.CharacterView.IndexDistance { get }` returns a wrong value on Linux when "Regional Indicator Symbols" are contained. #48631

Closed YOCKOW closed 5 years ago

YOCKOW commented 6 years ago
Previous ID SR-6076
Radar None
Original Reporter @YOCKOW
Type Bug
Status Resolved
Resolution Done
Environment
- Swift 4.0
- OS
  - macOS: High Sierra
  - Linux: Ubuntu 16.04
Additional Detail from JIRA

| | |
|------------------|-----------------|
|Votes | 0 |
|Component/s | Standard Library |
|Labels | Bug |
|Assignee | @milseman |
|Priority | Medium |

md5: 3e4be3a5ea4f496c18a2f04121ef3483

relates to:

Issue Description:

[Sample Code]

let jp: Character = "\u{1F1EF}\u{1F1F5}" // Flag of Japan
let de: Character = "\u{1F1E9}\u{1F1EA}" // Flag of Germany
print("\(jp)".count) // Prints "1", of course
print("\(de)".count) // Prints "1", of course
print("\(jp)\(de)".count) // Prints "2" on macOS, but prints "1" on Linux
print("\(jp)\(de)\(jp)\(de)".count) // Prints "4" on macOS, but prints "1" on Linux

[Note]

belkadan commented 6 years ago

This is a difference between Unicode 9 and Unicode 10 (or possibly Unicode 10 and Unicode 11, I'm not sure) and is dependent on the version of ICU used to build Swift. cc @airspeedswift

airspeedswift commented 6 years ago

I think the flags issue was resolved in Unicode 9. Unfortunately, AFAICT Ubuntu 16 is still on ICU 55 which is Unicode 7.

We're considering switching to bundling ICU with the toolchain in future releases which would allow Linux to have a modern ICU in a similar fashion to Darwin. Not sure if we have a JIRA for this already, if not we can probably repurpose this for that.
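
To make the version dependence concrete, here is a minimal sketch of the pairing that newer Unicode grapheme-breaking rules (GB12/GB13 in UAX #29) apply to regional indicators. The helper `regionalIndicatorPairCount` is hypothetical and not part of the standard library; it only handles strings made up entirely of regional indicator scalars. The scalar and UTF-16 counts, by contrast, should not depend on the ICU version at all.

```swift
// Hypothetical helper (not a standard library API): counts the grapheme
// clusters in a string consisting only of regional indicator scalars by
// pairing them up, as rules GB12/GB13 of UAX #29 do.
// Returns nil if any other scalar is present.
func regionalIndicatorPairCount(_ s: String) -> Int? {
    let regionalIndicators: ClosedRange<UInt32> = 0x1F1E6...0x1F1FF
    let scalars = Array(s.unicodeScalars)
    guard scalars.allSatisfy({ regionalIndicators.contains($0.value) }) else {
        return nil
    }
    // Each pair forms one flag; an unpaired trailing scalar stands alone.
    return (scalars.count + 1) / 2
}

let flags = "\u{1F1EF}\u{1F1F5}\u{1F1E9}\u{1F1EA}" // Japan, Germany

// These views do not involve grapheme breaking.
print(flags.unicodeScalars.count) // 4
print(flags.utf16.count)          // 8

if let pairs = regionalIndicatorPairCount(flags) {
    print(pairs)                  // 2
}
```

Comparing this result with `flags.count` on a given platform is one way to check whether the grapheme breaker in use pairs regional indicators.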

177d8476-2756-4152-91d7-984f74d3896c commented 6 years ago

This is due to the user having an old version of ICU (such as the one shipped with Ubuntu LTS). As Ben mentioned, we're hoping to ship a modern ICU for Linux, as we do on Darwin. That will also bring performance improvements, greater behavior parity, and build system simplification.

Should I put this as a dup on that task? Is there a JIRA for that task?

YOCKOW commented 6 years ago

I'm very sorry to have bothered you all.
I now understand that this is not a bug in Swift, but a "feature" of Unicode (one that depends on the Unicode version).

177d8476-2756-4152-91d7-984f74d3896c commented 6 years ago

It’s no bother at all! It’s useful to know people hit this and that unifying ICU versions would help our users.

spevans commented 6 years ago

FYI, I built a version of Swift with Darwin's ICU from https://opensource.apple.com/tarballs/ICU/ICU-59152.0.1.tar.gz on Ubuntu 16.04, and it fixes this issue and also fixes SR-5591. I'm happy to sort out a patch to build Swift with this version, but someone would need to add Apple's ICU to a repository on GitHub if you think this is worth pursuing.

177d8476-2756-4152-91d7-984f74d3896c commented 6 years ago

I think that's totally worth pursuing.

@airspeedswift how can we proceed?

allevato commented 6 years ago

The pull request implementing SE-0211 recently hit a similar issue, because Ubuntu 16.04 comes with a version of ICU too old to support the emoji properties that we exposed. I linked to this bug in the FIXME comment (we're just conditionally hiding the declarations on non-Darwin for the time being). The treatment of grapheme clusters above is reason enough, but if we're also going to be shipping APIs like Unicode.Scalar.Properties that wrap modern ICU calls, we really need to be shipping a consistent known version.
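
As a rough illustration of the kind of API in question, the sketch below queries `Unicode.Scalar.Properties` for the two scalars that make up the Japanese flag. It assumes a toolchain where these properties are available (Swift 5 or later; on Apple platforms `isEmoji` also carries an OS availability requirement), and the values returned depend on the Unicode data the toolchain ships with.

```swift
// The two regional indicator scalars that form the flag of Japan.
let scalars: [Unicode.Scalar] = ["\u{1F1EF}", "\u{1F1F5}"]

for scalar in scalars {
    let props = scalar.properties
    let hex = String(scalar.value, radix: 16, uppercase: true)
    // `name` is optional because not every scalar has a Unicode name.
    print("U+\(hex)", props.name ?? "(no name)", "isEmoji:", props.isEmoji)
}
```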

spevans commented 5 years ago

Swift on Linux now ships with ICU 61.1 due to SR-8876, so this is fixed in swift-DEVELOPMENT-SNAPSHOT-2018-11-13.

$ cat sr_6076.swift 
let jp: Character = "\u{1F1EF}\u{1F1F5}" // Flag of Japan
let de: Character = "\u{1F1E9}\u{1F1EA}" // Flag of Germany
print("\(jp)".count) // Prints "1", of course
print("\(de)".count) // Prints "1", of course
print("\(jp)\(de)".count) // Prints "2" on macOS, but prints "1" on Linux
print("\(jp)\(de)\(jp)\(de)".count) // Prints "4" on macOS, but prints "1" on Linux

$ ~/swift-DEVELOPMENT-SNAPSHOT-2018-11-13-a-ubuntu14.04/usr/bin/swift sr_6076.swift 
1
1
2
4