swiftlang / swift

The Swift Programming Language
https://swift.org
Apache License 2.0

[SR-660] Unicode identifier mangling is invalid when first ASCII character in identifier is digit #43275

Closed: swift-ci closed this issue 7 years ago

swift-ci commented 8 years ago
| | |
|------------------|-----------------|
|Previous ID | SR-660 |
|Radar | rdar://problem/25821287 |
|Original Reporter | donald-pinckney (JIRA User) |
|Type | Bug |
|Status | Closed |
|Resolution | Done |

Attachment: Download

Environment:

Running El Capitan, with swiftc -v giving:

Apple Swift version 2.2-dev (LLVM f90171f6b9, Clang fe39b0b18f, Swift a476c2828a)
Target: x86_64-apple-macosx10.9

This should be the January 25 development snapshot.
Additional Detail from JIRA

| | |
|------------------|-----------------|
|Votes | 0 |
|Component/s | Compiler |
|Labels | Bug, AffectsABI, Runtime |
|Assignee | @jckarter |
|Priority | Medium |

md5: 3e8195014e753da557e682db8672516f

Issue Description:

When a struct is named with a Unicode math symbol (or at least the one symbol noted below), the compiled code crashes when instances of the struct are put into an array. See attached code for an example.

swiftc compiles the attached code and produces no warnings or errors. Executing the binary gives:

Segmentation fault: 11

Running the attached code directly with swift gives:

0  swift                    0x0000000102f5b90b llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 43
1  swift                    0x0000000102f5abc6 llvm::sys::RunSignalHandlers() + 70
2  swift                    0x0000000102f5bfb2 SignalHandler(int) + 322
3  libsystem_platform.dylib 0x00007fff8fe8552a _sigtramp + 26
4  libsystem_platform.dylib 0x00007fff5f3ed9c8 _sigtramp + 3478553784
5  libswiftCore.dylib       0x000000010582029a (anonymous namespace)::Remangler::mangle(swift::Demangle::Node*) + 7258
6  libswiftCore.dylib       0x000000010581e9fa (anonymous namespace)::Remangler::mangle(swift::Demangle::Node*) + 954
7  libswiftCore.dylib       0x000000010581f471 (anonymous namespace)::Remangler::mangle(swift::Demangle::Node*) + 3633
8  libswiftCore.dylib       0x000000010581e568 swift::Demangle::mangleNode(std::__1::shared_ptr<swift::Demangle::Node> const&) + 168
9  libswiftCore.dylib       0x0000000105813b4a _swift_initializeSuperclass(swift::ClassMetadata*, bool) + 570
10 libswiftCore.dylib       0x0000000105814c82 swift_initializeSuperclass + 18
11 libswiftCore.dylib       0x00000001057ba03b _TMaCs24_ContiguousArrayStorage1 + 267
12 libswiftCore.dylib       0x0000000105815198 (anonymous namespace)::GenericCacheEntry* llvm::function_ref<(anonymous namespace)::GenericCacheEntry* ()>::callback_fn<swift::swift_getGenericMetadata::$_1>(long) + 24
13 libswiftCore.dylib       0x0000000105815040 swift::MetadataCache<(anonymous namespace)::GenericCacheEntry>::addMetadataEntry(swift::EntryRef<(anonymous namespace)::GenericCacheEntry>, ConcurrentList<swift::MetadataCache<(anonymous namespace)::GenericCacheEntry>::EntryPair>&, llvm::function_ref<(anonymous namespace)::GenericCacheEntry* ()>) + 128
14 libswiftCore.dylib       0x0000000105811155 swift_getGenericMetadata1 + 85
15 libswiftCore.dylib       0x0000000105616c04 _TFs27_allocateUninitializedArrayurFBwTGSax_Bp_ + 36
16 libswiftCore.dylib       0x00000001042f90ee _TFs27_allocateUninitializedArrayurFBwTGSax_Bp_ + 4274922766
17 swift                    0x0000000100e4ac31 llvm::MCJIT::runFunction(llvm::Function*, llvm::ArrayRef<llvm::GenericValue>) + 1345
18 swift                    0x0000000100e4df9f llvm::ExecutionEngine::runFunctionAsMain(llvm::Function*, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, char const* const*) + 1231
19 swift                    0x0000000100d18380 swift::RunImmediately(swift::CompilerInstance&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, swift::IRGenOptions&, swift::SILOptions const&) + 2752
20 swift                    0x000000010081a6be frontend_main(llvm::ArrayRef<char const*>, char const*, void*) + 11550
21 swift                    0x0000000100813a49 main + 2905
22 libdyld.dylib            0x00007fff831985ad start + 1
23 libdyld.dylib            0x000000000000000c start + 2095479392
Stack dump:
0.  Program arguments: /Library/Developer/Toolchains/swift-2.2-SNAPSHOT-2016-01-25-a.xctoolchain/usr/bin/swift -frontend -interpret main.swift -target x86_64-apple-macosx10.9 -enable-objc-interop -sdk /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk -color-diagnostics -module-name main 
Segmentation fault: 11

Note that the character used is:
MATHEMATICAL DOUBLE-STRUCK CAPITAL F
Unicode: U+1D53D, UTF-8: F0 9D 94 BD

Presumably the compiler should either emit an error if this is an unacceptable character, or generate working code.

belkadan commented 8 years ago

It's crashing in the remangler? Huh. @jckarter, something wrong with our Punycode implementation?

jckarter commented 8 years ago

It looks like we fail to demangle the name somewhere too, since the first print doesn't show the correct type name either:

V4mainX63_IFJq()

jckarter commented 8 years ago

Looks like we might be mangling incorrectly in the compiler; that should be X6C_IFJq, not X63_IFJq. The latter parses as a 63-byte identifier mangling, which is probably what's causing the crash.
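
To illustrate the misparse described in this comment, here is a minimal Python sketch of a length-prefixed identifier reader. It assumes the leading `X` has already been stripped and that the length is read as a greedy run of decimal digits. The function name `read_length_prefixed` is made up for this example and is not the actual demangler code.

```python
def read_length_prefixed(mangled: str):
    """Read `<decimal length><that many characters>` from the front of `mangled`.

    Returns (declared_length, identifier_or_None, remaining_text).
    The identifier is None when fewer characters are left than declared.
    """
    i = 0
    # Greedily consume decimal digits; this is where "6" + "3_..." turns into "63".
    while i < len(mangled) and mangled[i] in "0123456789":
        i += 1
    if i == 0:
        raise ValueError("no length prefix")
    length = int(mangled[:i])
    body = mangled[i:i + length]
    if len(body) < length:
        return length, None, mangled[i:]
    return length, body, mangled[i + length:]


# The two manglings quoted in the comment, with the assumed 'X' marker removed.
print(read_length_prefixed("63_IFJq"))  # (63, None, '_IFJq')
print(read_length_prefixed("6C_IFJq"))  # (6, 'C_IFJq', '')
```

With `63_IFJq` the reader expects 63 characters but only 5 remain, so a demangler that does not check bounds could read past the end of the name.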

jckarter commented 8 years ago

Ah, the problem is that the first ASCII character in the identifier is a digit, so the ASCII prefix of the Punycode encoding butts up against the identifier length and produces the invalid mangling.
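
The collision can be reproduced with a small Python sketch of a Punycode encoder. This is an illustration under assumptions, not the compiler's code: it uses the RFC 3492 parameters with `_` as the delimiter and an assumed digit alphabet of a-z followed by A-J, and it assumes the struct in the attachment is named `𝔽3` (U+1D53D followed by the digit 3), which is not shown in the thread but yields the mangled name quoted above.

```python
# RFC 3492 parameters; the delimiter and digit alphabet are assumptions
# about the customized variant discussed in this thread.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 0x80
DELIMITER = "_"
DIGITS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJ"  # 36 symbols, no decimal digits


def adapt(delta, num_points, first_time):
    """Bias adaptation from RFC 3492, section 6.1."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def encode(identifier):
    code_points = [ord(c) for c in identifier]
    basic = [cp for cp in code_points if cp < 0x80]
    out = [chr(cp) for cp in basic]  # the ASCII characters are copied first
    h = b = len(basic)
    if b > 0:
        out.append(DELIMITER)
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < len(code_points):
        m = min(cp for cp in code_points if cp >= n)
        delta += (m - n) * (h + 1)
        n = m
        for cp in code_points:
            if cp < n:
                delta += 1
            elif cp == n:
                # Emit delta as a variable-length integer.
                q, k = delta, BASE
                while True:
                    t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
                    if q < t:
                        break
                    out.append(DIGITS[t + (q - t) % (BASE - t)])
                    q = (q - t) // (BASE - t)
                    k += BASE
                out.append(DIGITS[q])
                bias = adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(out)


punycode = encode("\U0001D53D" + "3")  # assumed identifier: 𝔽3
mangled = "X" + str(len(punycode)) + punycode
print(punycode)  # 3_IFJq
print(mangled)   # X63_IFJq
```

The encoded name is 6 characters long and begins with the digit 3, so the length `6` and the ASCII prefix `3` read together as `63`.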

jckarter commented 8 years ago

We customize the encoding so that the suffix alphabet doesn't contain digits, so we could flip the encoding so that the non-ASCII suffix goes first.
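
A rough Python sketch of this proposal, starting from the encoded name `3_IFJq` implied by the mangling quoted earlier. The helper names are hypothetical, and this only illustrates the idea in the comment; it is not necessarily how the issue was eventually fixed.

```python
def mangle_current(punycode: str) -> str:
    # Current layout: X, length, ASCII prefix, '_', non-ASCII suffix.
    return "X" + str(len(punycode)) + punycode


def mangle_flipped(punycode: str) -> str:
    # Proposed layout: X, length, non-ASCII suffix, '_', ASCII prefix.
    # The suffix alphabet contains no '_', so the last '_' is the delimiter.
    prefix, sep, suffix = punycode.rpartition("_")
    flipped = suffix + sep + prefix
    return "X" + str(len(flipped)) + flipped


def declared_length(mangled: str) -> int:
    # Read the greedy run of decimal digits that follows the 'X' marker.
    digits = ""
    for ch in mangled[1:]:
        if ch not in "0123456789":
            break
        digits += ch
    return int(digits)


current = mangle_current("3_IFJq")
flipped = mangle_flipped("3_IFJq")
print(current, declared_length(current))  # X63_IFJq 63
print(flipped, declared_length(flipped))  # X6IFJq_3 6
```

Because the suffix alphabet has no digits, the character after the length is always a letter in the flipped layout, so the length can no longer absorb part of the name.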

jckarter commented 8 years ago

…or just give up on Punycode and encode UTF-8 directly; ELF and Mach-O both allow symbols to be arbitrary byte strings (though tools may or may not be UTF-8-clean).
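
For comparison, a short Python sketch of what a length-prefixed raw UTF-8 name could look like. The byte-count prefix layout is a hypothetical illustration, not an actual Swift mangling, and the identifier `𝔽3` is assumed rather than taken from the attachment.

```python
identifier = "\U0001D53D" + "3"  # assumed identifier: 𝔽3
utf8 = identifier.encode("utf-8")
print(utf8.hex(" "))  # f0 9d 94 bd 33

# Hypothetical layout: decimal byte count followed by the raw UTF-8 bytes.
symbol = str(len(utf8)).encode("ascii") + utf8
print(symbol)  # b'5\xf0\x9d\x94\xbd3'

# The byte after the length is 0xF0. Every byte of a multi-byte UTF-8
# sequence is >= 0x80, so it cannot be mistaken for an ASCII digit.
print(hex(symbol[1]))  # 0xf0
```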

belkadan commented 8 years ago

I'd still rather not. It sounds like the encoding change would be trivial (and presumably unambiguous).

eeckstein commented 7 years ago

This is already fixed.