swiftlang / swift

The Swift Programming Language
https://swift.org
Apache License 2.0

[SR-6544] Runtime crash on integer overflow while grapheme breaking huge Strings #49094

Closed lorentey closed 6 years ago

lorentey commented 6 years ago
Previous ID SR-6544
Radar rdar://problem/35881735
Original Reporter @lorentey
Type Bug
Status Closed
Resolution Done
Environment Apple Swift version 4.0.1 (swiftlang-902.0.18 clang-902.0.15.1)
Additional Detail from JIRA

| | |
|------------------|-----------------|
|Votes | 0 |
|Component/s | Standard Library |
|Labels | Bug |
|Assignee | @lorentey |
|Priority | Medium |

md5: 9399e318128c81acd7eaf2b5e849882b

Issue Description:

The following little program creates a huge string and tries to print its first extended grapheme cluster.

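// "o" followed by U+030B COMBINING DOUBLE ACUTE ACCENT: one Character, two UTF-16 code units.
var s = "o\u{030B}"
// Double the string 32 times, for a total of 2^33 code units.
for _ in 1 ... 32 {
  s = s + s
}
print(s[s.startIndex])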

I expect the program to print "ő"; instead, it crashes on an integer overflow:

$ swift ~/Swift/hugestring.swift
Fatal error: Not enough bits to represent a signed value
0  swift                    0x000000010494e7da PrintStackTraceSignalHandler(void*) + 42
1  swift                    0x000000010494dc96 SignalHandler(int) + 598
2  libsystem_platform.dylib 0x00007fff796fcf5a _sigtramp + 26
3  libsystem_platform.dylib 000000000000000000 _sigtramp + 2257596608
4  libswiftCore.dylib       0x0000000107e9d6a2 _T0SS13CharacterViewV42_measureExtendedGraphemeClusterForwardSlowS2i14relativeOffset_SS5IndexV5startAF3endSi0lK5UTF16tFTfq4nxxnn_n + 402
5  libswiftCore.dylib       0x0000000107ef718e _T0SS9subscripts9CharacterVSS5IndexVcfgTfq4xx_n + 734
6  libswiftCore.dylib       0x0000000107dbdaf9 _T0SS9subscripts9CharacterVSS5IndexVcfg + 9
7  libswiftCore.dylib       0x0000000107b552e2 _T0SS9subscripts9CharacterVSS5IndexVcfg + 4292442098
8  swift                    0x0000000101d3eb0b llvm::MCJIT::runFunction(llvm::Function*, llvm::ArrayRef<llvm::GenericValue>) + 635
9  swift                    0x0000000101d45024 llvm::ExecutionEngine::runFunctionAsMain(llvm::Function*, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, char const* const*) + 708
10 swift                    0x00000001010fa0b1 performCompile(swift::CompilerInstance&, swift::CompilerInvocation&, llvm::ArrayRef<char const*>, int&, swift::FrontendObserver*, swift::UnifiedStatsReporter*) + 22353
11 swift                    0x00000001010f30d9 swift::performFrontend(llvm::ArrayRef<char const*>, char const*, void*, swift::FrontendObserver*) + 8697
12 swift                    0x00000001010a419e main + 13918
13 libdyld.dylib            0x00007fff7947b115 start + 1
Stack dump:
0. Program arguments: /Applications/Edge/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/swift -frontend -interpret /Users/lorentey/Swift/hugestring.swift -enable-objc-interop -sdk /Applications/Edge/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk -color-diagnostics -module-name hugestring 
Illegal instruction: 4

Grapheme breaking is delegated to ICU in this case, and ICU's break iterators use 32-bit offsets.
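
To make the arithmetic concrete, here is a minimal sketch of why the length of this string cannot be expressed as a signed 32-bit offset. It does not build the huge string or call ICU. The helper `codeUnitCount(afterDoublings:base:)` is a hypothetical name introduced for illustration, and the sketch only assumes that the offset handed to the break iterator has to fit in an `Int32`.

```swift
// Length of the seed string in UTF-16 code units: "o" plus U+030B.
let baseCodeUnits = Int64("o\u{030B}".utf16.count)
print(baseCodeUnits)              // 2

// Hypothetical helper (not a standard library API): the length of a string
// after it has been concatenated with itself `doublings` times.
func codeUnitCount(afterDoublings doublings: Int64, base: Int64) -> Int64 {
    return base << doublings
}

let total = codeUnitCount(afterDoublings: 32, base: baseCodeUnits)
print(total)                      // 8589934592, i.e. 2^33
print(Int64(Int32.max))           // 2147483647, the largest signed 32-bit offset

// A failable conversion reports the problem as nil instead of trapping.
let asInt32 = Int32(exactly: total)
print(asInt32 == nil)             // true
```

A trapping conversion of the same value to `Int32` would be consistent with the "Not enough bits to represent a signed value" message in the log above, although the exact conversion site inside the standard library is not shown in this report.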

lorentey commented 6 years ago

@swift-ci create

177d8476-2756-4152-91d7-984f74d3896c commented 6 years ago

Regardless of whether we support graphemes longer than 2^32, we should give users a better trap message.

lorentey commented 6 years ago

The grapheme clusters here are quite boring – they're just two scalars wide; however, we used to pass the entire string to ICU, and that failed if the string itself was longer than 2^32 code units.

We fixed this as part of the String overhaul, which has long since landed on master and is included in 4.2. (We now pass a slice of the string that's no longer than what ICU can process. We still won't process ridiculously long grapheme clusters correctly, but at least we won't crash.)
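
The idea of passing only a bounded slice can be sketched as follows. This is an illustration under stated assumptions, not the standard library's actual code: `clampedWindow(start:totalCount:limit:)` is a hypothetical helper, offsets are modeled as plain `Int64` code unit counts, and the limit is assumed to be `Int32.max`.

```swift
// Hypothetical helper (not the standard library's implementation): pick the
// range of code unit offsets, starting at `start`, that may be handed to a
// break iterator limited to signed 32-bit offsets.
func clampedWindow(start: Int64,
                   totalCount: Int64,
                   limit: Int64 = Int64(Int32.max)) -> Range<Int64> {
    precondition(start >= 0 && start <= totalCount,
                 "start must lie inside the string")
    // Never hand over more code units than the iterator can index.
    let length = min(totalCount - start, limit)
    return start ..< (start + length)
}

let hugeCount: Int64 = 1 << 33    // code units in the string from the report

// At the start of the string, the window is capped at the 32-bit limit.
let window = clampedWindow(start: 0, totalCount: hugeCount)
let windowLength = window.upperBound - window.lowerBound
print(windowLength)                           // 2147483647
print(Int32(exactly: windowLength) != nil)    // true

// Near the end of the string, the window shrinks to what is left.
let tail = clampedWindow(start: hugeCount - 10, totalCount: hugeCount)
print(tail.upperBound - tail.lowerBound)      // 10
```

A grapheme cluster longer than the window would be cut off at the window boundary, which matches the caveat above: such clusters are not measured correctly, but the conversion to a 32-bit offset can no longer overflow.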