swiftlang / swift-corelibs-foundation

The Foundation Project, providing core utilities, internationalization, and OS independence
swift.org
Apache License 2.0
5.29k stars 1.13k forks source link

[SR-10080] Unexpected behaviour with String/Range bridged from NSString/NSRange #3517

Open bobergj opened 5 years ago

bobergj commented 5 years ago
Previous ID SR-10080
Radar None
Original Reporter @bobergj
Type Bug

Attachment: Download

Environment Swift 5, Swift 5.1 Xcode Version 10.3 (10G8), Xcode Version 11.0 beta 4 (11M374r) macOS 10.14.5 (18F132)
Additional Detail from JIRA | | | |------------------|-----------------| |Votes | 1 | |Component/s | Foundation | |Labels | Bug | |Assignee | None | |Priority | Medium | md5: 059d4512ad3a2419b0bb7d656dc0731a

relates to:

Issue Description:

let str = "同意"
let nsStr = str as NSString

let nsRange = NSRange(location: 1, length: 1)
let range = Range(nsRange, in: str)!
print(str[range])              // Swift 5.0: "意", Swift 4.2: "意"

let strBridged = nsStr as String
let strBridgedRange = Range(nsRange, in: strBridged)!
print(strBridged[strBridgedRange])
print(str[strBridgedRange])    // Swift 5.0: "\220", Swift 5.0: "意"

How this can bite you in practice:

import NaturalLanguage

let str = "同意"
let tagger = NLTagger(tagSchemes: [.tokenType])
tagger.string = str
tagger.enumerateTags(in: str.startIndex..<str.endIndex, unit: .word, scheme: .tokenType) { (tag, range) -> Bool in
    print(str[range])
    return true
}

The aboves does nothing in Swift 5.0, prints "同意" in Swift 4.2.

I understand that the above code may never have been valid in Swift, but this bridging behaviour makes it slightly unergonomic to work with several Apple frameworks APIs.

Alternatively, perhaps a function similar to "equivalentRange" below should be included in Swift Foundation?

Workaround:

extension String {
    func equivalentRange(rangeInOtherString range: Range<String.Index>, otherString: String) -> Range<String.Index>? {
        let lowerBoundOffset = otherString.distance(from: otherString.startIndex, to: range.lowerBound)
        let upperBoundOffset = otherString.distance(from: otherString.startIndex, to: range.upperBound)
        if let lowerBound = index(startIndex, offsetBy: lowerBoundOffset, limitedBy: endIndex),
            let upperBound = index(startIndex, offsetBy: upperBoundOffset, limitedBy: endIndex) {
            return lowerBound..<upperBound
        } else {
            return nil
        }
    }
}
let str = "同意"
let tagger = NLTagger(tagSchemes: [.tokenType])
tagger.string = str
tagger.enumerateTags(in: tagger.string!.startIndex..<tagger.string!.endIndex, unit: .word, scheme: .tokenType) { (tag, range) -> Bool in
    print(str[str.equivalentRange(rangeInOtherString: range, otherString: tagger.string!)!])
    return true
}
bobergj commented 5 years ago

@milseman sorry for using bandwidth but what do you think?

177d8476-2756-4152-91d7-984f74d3896c commented 5 years ago

It looks like you're using indices derived from one string in another, which is not safe (and we probably want to find a way to trap on it in the future). Is there a way we could make doing the right thing more ergonomic here?

bobergj commented 5 years ago

It looks like you're using indices derived from one string in another

Technically that is true, because we have another string instance (with a different storage) after bridging back and forth. But the fact that NLTagger is implemented in Objective-C could be considered an implementation detail.

Another example which is perhaps more to the point:

        let str = "同意"
        let range = str.startIndex..<str.index(after: str.startIndex)
        print(str[range])    //  "同" OK

        var str1 = str
        print(str1[range])   //  "同" OK and arguably safe, we have not mutated the string

        str1.append("foo")
        print(str1[range])   //  "同" OK and arguably safe. Collection append (in contrast to other methods) is not documented to invalidate indices 

        class SomeSwiftClass {
            var str: String = ""
        }

        let someSwiftClass = SomeSwiftClass()
        someSwiftClass.str = str
        let str2 = someSwiftClass.str
        print(str2[range])          //  "同" OK and arguably safe, we have not mutated the string

        let someObjcClass = SomeObjcClass()  // Objc class with @property (nonatomic, strong) NSString *string;
        someObjcClass.str = str
        let str3 = someObjcClass.str
        print(str3[range])          //  BOOM  Fatal error: String index range is out of bounds