swiftlang / swift

The Swift Programming Language
https://swift.org
Apache License 2.0

[SR-3861] crash when trying to form a character range #46446

Status: Open. swift-ci opened this issue 7 years ago.

swift-ci commented 7 years ago
Previous ID SR-3861
Radar None
Original Reporter B98 (JIRA User)
Type Bug

Attachment: Download

Environment:

$ swiftc --version
Swift version 3.0.2 (swift-3.0.2-RELEASE)
Target: x86_64-unknown-linux-gnu
Additional Detail from JIRA

| | |
|------------------|-----------------|
| Votes | 0 |
| Component/s | Standard Library |
| Labels | Bug, Linux, RunTimeCrash |
| Assignee | None |
| Priority | Medium |

md5: e30dce6adc89d3ac2cc2aee6d576fed5

Issue Description:

While trying to understand the cause of a stack trace and a termination with the message "Illegal instruction", I narrowed the problem down to this in the REPL:

Welcome to Swift version 3.0.2 (swift-3.0.2-RELEASE). Type :help for assistance.
  1> print("\u{DF}")
ß
  2> print("\u{DF}" ... "\u{DF}")
ß...ß
  3> print("\u{E0}") 
à
  4> print("\u{E0}" ... "\u{E0}")
à...à
  5> print("\u{DF}" ... "\u{E0}")
fatal error: Can't form Range with upperBound < lowerBound
Current stack trace:
0    libswiftCore.so                    0x00007ffff7d14c90 swift_reportError + 117
1    libswiftCore.so                    0x00007ffff7d261d0 _swift_stdlib_reportFatalError + 61
2    libswiftCore.so                    0x00007ffff7b33763 <unavailable> + 0
3    libswiftCore.so                    0x00007ffff7c90ebd <unavailable> + 0
4    libswiftCore.so                    0x00007ffff7b33763 <unavailable> + 0
5    libswiftCore.so                    0x00007ffff7c52770 specialized _fatalErrorMessage(StaticString, StaticString, StaticString, UInt, flags : UInt32) -> Never + 96
Execution interrupted. Enter code to recover and continue.
Enter LLDB commands to investigate (type :help for assistance.)
  6> print("\u{E0}" ... "\u{DF}")
à...ß

I'm probably confused: I thought 0xDF comes before 0xE0, so "\u{DF}" ... "\u{E0}" should be accepted and "\u{E0}" ... "\u{DF}" should be rejected, not the other way around. The original issue arose in a case statement, as seen in the attached file, which shows the same behavior when compiled.

More candidates:

"\u{2C1}" ... "\u{2C2}", "\u{2D1}" ... "\u{2D2}", "\u{2E4}" ... "\u{2E5}", "\u{FDF8}" ... "\u{FDF9}", "\u{FDFA}" ... "\u{FDFB}",...
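As a sketch (assuming Swift 4 or later, where UnicodeScalar is also spelled Unicode.Scalar), the listed candidate pairs can be checked by scalar value. Each pair is ascending by code point, so forming the range from Unicode.Scalar bounds succeeds; the trap comes from how the String bounds were ordered, not from the code points themselves. Only the pairs written out above are included, since the list is truncated.

```swift
// Candidate pairs from the report, typed as Unicode.Scalar so that
// comparison uses scalar values rather than String ordering.
let candidates: [(Unicode.Scalar, Unicode.Scalar)] = [
    ("\u{DF}", "\u{E0}"),
    ("\u{2C1}", "\u{2C2}"),
    ("\u{2D1}", "\u{2D2}"),
    ("\u{2E4}", "\u{2E5}"),
    ("\u{FDF8}", "\u{FDF9}"),
    ("\u{FDFA}", "\u{FDFB}"),
]

for (lower, upper) in candidates {
    // Each lower bound has a smaller scalar value than its upper bound,
    // so this ClosedRange<Unicode.Scalar> is formed without trapping.
    let range = lower ... upper
    let lo = String(lower.value, radix: 16, uppercase: true)
    let hi = String(upper.value, radix: 16, uppercase: true)
    print("U+\(lo) ... U+\(hi): contains lower bound = \(range.contains(lower))")
}
```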
swift-ci commented 7 years ago

Comment by Georg Bauhaus (JIRA)

Another one:

  3> print("\u{2C1}" ... "\u{2C2}") 
fatal error: Can't form Range with upperBound < lowerBound
Current stack trace:
0    libswiftCore.so                    0x00007ffff7d14c90 swift_reportError + 117
1    libswiftCore.so                    0x00007ffff7d261d0 _swift_stdlib_reportFatalError + 61
2    libswiftCore.so                    0x00007ffff7b33763 <unavailable> + 0
3    libswiftCore.so                    0x00007ffff7c90ebd <unavailable> + 0
4    libswiftCore.so                    0x00007ffff7b33763 <unavailable> + 0
5    libswiftCore.so                    0x00007ffff7c52770 specialized _fatalErrorMessage(StaticString, StaticString, StaticString, UInt, flags : UInt32) -> Never + 96
Execution interrupted. Enter code to recover and continue.
belkadan commented 7 years ago

These are ranges of Strings, not Unicode code points, and as such the ordering may not be strictly in Unicode-scalar order. @dabrahams, any further comments?
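If a program really needs Unicode-scalar order for strings, it can ask for it explicitly instead of relying on String's own <. A minimal sketch, with scalarOrderPrecedes as a hypothetical helper name, compares the unicodeScalars views with lexicographicallyPrecedes, which relies on Unicode.Scalar being Comparable:

```swift
// Hypothetical helper: orders two strings scalar by scalar, by code point,
// independently of how String's `<` is defined in a given release.
// No normalization is applied, so "\u{E0}" and the canonically equivalent
// "a\u{300}" are NOT treated as equal here.
func scalarOrderPrecedes(_ a: String, _ b: String) -> Bool {
    return a.unicodeScalars.lexicographicallyPrecedes(b.unicodeScalars)
}

print(scalarOrderPrecedes("\u{DF}", "\u{E0}"))  // true: U+00DF < U+00E0
print(scalarOrderPrecedes("\u{E0}", "\u{DF}"))  // false
```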

dabrahams commented 7 years ago

Yeah:

  1. You can force them to be UnicodeScalars by coercing with as:

    print("\u{DF}" as UnicodeScalar ... "\u{E0}")
  2. The ordering of Strings is probably not going to be stable from release to release, especially between Swift 3 and Swift 4. I think we can easily promise that the ordering of UnicodeScalars will remain stable.
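A short sketch of point 1, assuming Swift 4 or later (Unicode.Scalar there, with UnicodeScalar kept as an alias): once the bounds are scalars, ... builds a ClosedRange<Unicode.Scalar> ordered by code point. The constant names are illustrative only.

```swift
// Type annotations make the literals Unicode.Scalar values.
let sharpS: Unicode.Scalar = "\u{DF}"   // ß
let aGrave: Unicode.Scalar = "\u{E0}"   // à
let scalarRange = sharpS ... aGrave     // ClosedRange<Unicode.Scalar>

// The same range with an explicit coercion on each bound.
let sameRange = ("\u{DF}" as Unicode.Scalar) ... ("\u{E0}" as Unicode.Scalar)

print(scalarRange.contains("\u{DF}"), scalarRange.contains("\u{E0}"))  // true true
print(scalarRange.contains("a"))  // false: U+0061 is below U+00DF
print(scalarRange == sameRange)   // true
```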

swift-ci commented 7 years ago

Comment by Georg Bauhaus (JIRA)

The program source file attached has

        switch Character(u)  {
        case "\u{DF}" ... "\u{E0}":
            break

The REPL says

 23> "\u{DF}" is Character
$R7: Bool = true
 24> "\u{DF}" is UnicodeScalar
$R8: Bool = true
 25> "\u{DF}" is String
$R9: Bool = true

Same for E0. This agrees with the Swift book: the literals are taken to be of these types.

I suspected that in the ClosedRange, the upper bound Character "à" was decomposed under the hood after being interpreted as a grapheme cluster. Is there an easy way to check that?
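This does not expose what the Swift 3 comparison did internally, but the scalars a Character was built from can be listed through its String form. A sketch assuming Swift 4 or later, with scalarValues(of:) as a hypothetical helper: the Character keeps the scalars it was given, while its equality is based on canonical equivalence.

```swift
let precomposed = Character("\u{E0}")   // à as a single scalar
let decomposed = Character("a\u{300}")  // "a" + COMBINING GRAVE ACCENT

// Hypothetical helper: the Character's scalars as "U+XXXX" strings.
func scalarValues(of c: Character) -> [String] {
    return String(c).unicodeScalars.map {
        "U+" + String($0.value, radix: 16, uppercase: true)
    }
}

print(scalarValues(of: precomposed))  // ["U+E0"]
print(scalarValues(of: decomposed))   // ["U+61", "U+300"]
print(precomposed == decomposed)      // true: canonically equivalent
```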

This one doesn't crash:

"\u{DF}" as UnicodeScalar ... ("\u{E0}" as UnicodeScalar) 
$R15: ClosedRange<UnicodeScalar> = {
  lowerBound = U'ß'
  upperBound = U'à'
}
swift-ci commented 7 years ago

Comment by Georg Bauhaus (JIRA)

A side question: how does the compiler detect a non-exhaustive switch here, when everything supposedly happens in the standard library?

belkadan commented 7 years ago

The "is" tests aren't meaningful, because they affect the type of the literal the way "as" would, which is something of a bug. But again, Characters aren't necessarily meaningfully ordered across separate Unicode blocks, and they're certainly not ordered in Unicode-scalar order.

To your other question: the compiler assumes every switch needs a default unless it can prove otherwise. Sometimes that produces a false positive; the recommended pattern then is a default case that calls fatalError.
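The attached program switches over Character(u). Assuming u is a Unicode.Scalar (the attachment is not reproduced here, so this is a guess), a sketch that switches over the scalar itself uses scalar order, and shows the default the compiler requires for range patterns. classify(_:) is a hypothetical name.

```swift
// Assumption: u is a Unicode.Scalar, as Character(u) in the report suggests.
func classify(_ u: Unicode.Scalar) -> String {
    switch u {
    case "\u{DF}" ... "\u{E0}":
        // Matches against a ClosedRange<Unicode.Scalar>, ordered by code point.
        return "U+00DF...U+00E0"
    default:
        // The compiler cannot prove the case above covers every scalar, so a
        // default is required. This one is reachable; when a default truly
        // can never run, a call to fatalError() there is the suggested pattern.
        return "other"
    }
}

print(classify("\u{DF}"))  // U+00DF...U+00E0
print(classify("\u{E0}"))  // U+00DF...U+00E0
print(classify("a"))       // other
```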

swift-ci commented 7 years ago

Comment by Georg Bauhaus (JIRA)

OK, happy with it. In view of:

  1> 1 ... 0
fatal error: Can't form Range with upperBound < lowerBound
Current stack trace:
...

which has the compile-time literals 1 and 0, with the expression perhaps following some case keyword: if a type is linearly ordered, like the integers, couldn't the compiler evaluate the literals around ... and reject the range? But maybe this is a side issue that wanders into a duplicate (which I think I have seen elsewhere but couldn't find just now, sorry).
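Absent such a compile-time check, a program that forms ranges from values it does not control can check the bounds first. This is a runtime sketch, not the compiler feature being asked about; checkedClosedRange(_:_:) is a hypothetical helper, not a standard library API.

```swift
// Hypothetical helper: returns nil instead of trapping when the bounds are
// out of order, which is what `...` does for 1 ... 0.
func checkedClosedRange<T: Comparable>(_ lower: T, _ upper: T) -> ClosedRange<T>? {
    guard lower <= upper else { return nil }  // `...` would trap here
    return lower ... upper
}

print(checkedClosedRange(0, 1) != nil)  // true
print(checkedClosedRange(1, 0) == nil)  // true: 1 ... 0 would have trapped
```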

If a type isn't linearly ordered, as Character then seems to be, and a ... b has compile-time operands, could the compiler refuse it altogether in this static situation? Conceptually, the operands a and b lack the ordering that ... requires, so …?

belkadan commented 7 years ago

1 and 0 are known at compile-time, but the semantics of ... are not. It would certainly be a nice feature to add, though, if we determined that we were using the normal ... that makes a Range.

We do want to allow cases like "a"..."z", though, so we'd have to make sure the compiler did the comparison the same way that it would be performed at run-time. So I don't think we'd warn in the Character case no matter what.
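For reference, the "a"..."z" case looks like this as a Character range pattern. This sketch relies on Character ordering, which for ASCII letters follows code-point order in current Swift releases, but which the thread notes is not promised to be stable; isLowercaseASCIILetter(_:) is a hypothetical name.

```swift
// "a" ... "z" is inferred as a ClosedRange<Character> from the switch subject.
func isLowercaseASCIILetter(_ c: Character) -> Bool {
    switch c {
    case "a" ... "z":
        return true
    default:
        return false
    }
}

print(isLowercaseASCIILetter("q"))  // true
print(isLowercaseASCIILetter("A"))  // false
```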