swiftlang / swift

The Swift Programming Language
https://swift.org
Apache License 2.0
67.59k stars 10.37k forks source link

[SR-7511] Relatively simple loop shows unexpected O(n^2) behavior #50053

Open swift-ci opened 6 years ago

swift-ci commented 6 years ago
Previous ID SR-7511
Radar None
Original Reporter iosdevzone (JIRA User)
Type Bug

Attachment: Download

Environment macOS 10.13.4 Beta (17E170c) Apple Swift version 4.1 (swiftlang-902.0.48 clang-902.0.37.1) Target: x86_64-apple-darwin17.5.0
Additional Detail from JIRA | | | |------------------|-----------------| |Votes | 0 | |Component/s | Standard Library | |Labels | Bug | |Assignee | None | |Priority | Medium | md5: 7c118b707446ed86d1f5837100aaecb1

Issue Description:

On macOS (I don't have a linux box handy right now) this rather simple looking loop demonstrates the problem:

public func countNewlines3(string s: String) -> Int {
{{ let newline = Character("\n")}}
{{ var n = 0}}
{{ for i in 0..\<s.count where s[s.index(s.startIndex, offsetBy: i)] == newline {}}
{{ n = n + 1}}
{{ }}}
{{ return n}}
{{}}}

For any large string this soon starts to take seconds to run. The attached file contains a runnable demonstration and also a alternate implementation the demonstrates the expected O( n ) performance.

swift-ci commented 6 years ago

Comment by iOSDevZone (JIRA)

OK, I think I've figured out what is going on, so maybe this is intended behavior. It is because

s.index(s.startIndex, offsetBy: i)

is O( i ). This does make random access into large strings quite slow without taking mitigating action.

Perhaps the standard library should cache some indices?

belkadan commented 6 years ago

This is why you should loop over s.indices instead of starting at the beginning each time. Or just loop over the Characters in the String instead.

Anything to add, @milseman?

177d8476-2756-4152-91d7-984f74d3896c commented 6 years ago

The code is O(n^2) as written. This is why we don't provide an Int-based subscript directly, though we could provide clearer reasoning behind why something like index(_:offsetBy:)
takes O(n) time.

Jordan's suggestion:

for idx in s.indices {
  ... s[idx] ...
}

is the better pattern and is much more ergonomic anyways.

swift-ci commented 6 years ago

Comment by iOSDevZone (JIRA)

As I was writing my first comment on this, it did occur to me that that is probably why direct integer indexing is not provided! Somehow, I also managed to overlook `s.indices`; that is a much neater solution and is what I trying to get at in my comment about caching indices.

So it seems there are adequate, existing ways to avoid the O(n^2) behavior, and thank you both for taking time to respond educate me on better approaches.

As far as I am concerned, this closes this issue for me. (Problem was located between chair and keyboard!)

Perhaps worth mentioning, the performance, even using `s.indices`, compared to (1) NSString in Swift (2) unsafe pointers in Swift (3) NSString in Objective-C (4) std::string in C++, is got great, but that's a subject for another report.

177d8476-2756-4152-91d7-984f74d3896c commented 6 years ago

Are you comparing grapheme iterators to code units iterators? If so, they're algorithmically different entities.

(FWIW, String's UTF16View also has about 25-50% overhead due to un-paired surrogate detection, so that's not great either)

swift-ci commented 6 years ago

Comment by iOSDevZone (JIRA)

@milseman no, I am not comparing dissimilar iterators. The performance issues I mentioned occur when working in the UTF-16 codepoint world (apologies, I could have been clearer). At some levels I can ignore un-paired surrogate pairs (e.g. sentence or line segmentation) and leave higher level processing to detect them, in others I am detected them, so it may not be an entirely level playing field, but the differences are not adequately explained by this. Happy to provide more information and code examples, but did not want to muddy the comments section with unrelated issues.

177d8476-2756-4152-91d7-984f74d3896c commented 6 years ago

Please do, we're also always on the lookout for more benchmarks.

The slowdown I was alluding to is UTF16View.subscript, which checks the neighbors for an un-paired surrogate so it can vend 0xFFFD. Personally, I think this is the wrong model and we need to move to explicit control upon initialization and emergent behavior upon violation (I believe Rust eventually moved to this model too).

swift-ci commented 6 years ago

Comment by iOSDevZone (JIRA)

What's the appropriate way to submit such benchmarks/code? Just open an issue here?

177d8476-2756-4152-91d7-984f74d3896c commented 6 years ago

The best option is to open a PR against the in-tree benchmark suite: https://github.com/apple/swift/tree/master/benchmark. Ideally it would build and be properly integrated. But, if you need help with that you can just open a PR with the benchmark code in it (even if it doesn't build) and I can adapt it for the suite.

swift-ci commented 6 years ago

Comment by iOSDevZone (JIRA)

OK, I'll look into it. I may run into time constraint issues, in which case I may just hand off an Xcode test suite but I'll try to give you something that works "out of the box". Thanks for your assistance and advice.