swiftlang / swift

The Swift Programming Language
https://swift.org
Apache License 2.0
67.49k stars 10.35k forks source link

[SR-12221] No safe way to efficiently implement a move operation on array and similar collections #54647

Open Lukasa opened 4 years ago

Lukasa commented 4 years ago
Previous ID SR-12221
Radar rdar://problem/59516882
Original Reporter @Lukasa
Type Bug
Environment Apple Swift version 5.2 (swiftlang-1103.0.24.3 clang-1103.0.30.1) Target: x86_64-apple-darwin19.4.0
Additional Detail from JIRA | | | |------------------|-----------------| |Votes | 1 | |Component/s | Standard Library | |Labels | Bug | |Assignee | None | |Priority | Medium | md5: cadb9c21ac9dd647b5b5f2de3155668c

Issue Description:

For collections that are range replaceable and random access, in principle it ought to be possible to implement relatively fast move operations. In some cases, such as arrays over primitive types, this operation can literally delegate to memmove.

Unfortunately, there is no way in safe Swift to express this requirement. Here is a Compiler Explorer page that shows a reasonable implementation of memmove on [UInt8]. I've reproduced it below:

extension Array where Element == UInt8 {
    mutating func moveRange(offset: Int, range: Range<Index>) {        
        // A fully general implementation will have a way to zero-extend
        // the array, but no performant solution exists for that either.
        // Therefore we precondition that the move is entirely in-bounds.
        precondition(range.lowerBound >= self.startIndex)
        precondition(range.upperBound <= self.endIndex)
        precondition(self.index(range.upperBound, offsetBy: offset, limitedBy: self.endIndex) != nil)
        precondition(self.index(range.lowerBound, offsetBy: offset, limitedBy: self.startIndex) != nil)

        if offset > 0 {
            for index in range.reversed() {
                self[self.index(index, offsetBy: offset)] = self[index]
            }
        } else {
            for index in range {
                self[self.index(index, offsetBy: offset)] = self[index]
            }
        }
    }
}

Currently, the compiler cannot observe that the preconditions written in this function are sufficient to cover all possible subscripting errors, and nor can it hoist the bounds checks. The end result of this is that the memmove is done byte by byte, checking that every index is in bounds (as we know it will be).

Note that replaceSubrange cannot be useful for this. Because of replaceSubrange's generality, passing a slice of self into that call will cause a CoW operation. Again, even if the optimiser can solve this for inlinable code, it cannot solve this across resilient module boundaries, where the compiler must be pessimistic.

I can see only two routes out of this. The first is to make the optimiser smarter. This seems extremely hard, particularly for types that have complex index objects, such as NIO's CircularBuffer, and it's outright impossible for resilient types.

An alternative option is to have a protocol that can express that a type is capable of a performant object move. This is like RangeReplaceableCollection: indeed, it'd be something line RangeMoveableCollection. This could be implemented as a move(range: Range<Index>, to: Range<Index>), could perform only the 4 bounds checks required, and then perform a direct copy without needing further bounds checks. These copies can vectorise on trivial types.

weissi commented 4 years ago

@swift-ci create

natecook1000 commented 4 years ago

Our collections can't technically implement "move" operations, since that would leave a hole of uninitialized memory in the array. For copying from one part of a collection to another, we do have withUnsafeMutableBufferPointer (for arrays) and withContiguousMutableStorageIfAvailable (for generic mutable collections) that will let you skip all the bounds checks. The bigger problem I see is that we don't have a codified way of specifying that a collection has a trivially copyable element type, so there's no real way to know if it's safe to drop down to that level.

Here's a quick draft of using withContiguousMutableStorageIfAvailable to avoid the bounds checks you described:

extension MutableCollection where Element: FixedWidthInteger {
    mutating func copy(subrange: Range<Index>, to dest: Index) {
        // Bounds checks, etc.
        precondition(subrange.lowerBound >= startIndex)
        precondition(subrange.upperBound <= endIndex)
        let subrangeCount = distance(from: subrange.lowerBound, to: subrange.upperBound)
        guard let destUpperBound = index(dest, offsetBy: subrangeCount, limitedBy: endIndex) else {
            preconditionFailure()
        }
        let subrangeOffset = distance(from: startIndex, to: subrange.lowerBound)
        let destOffset = distance(from: startIndex, to: dest)

        // Contiguous case
        let r: Void? = withContiguousMutableStorageIfAvailable { buffer in
            (buffer.baseAddress! + destOffset).assign(
                from: (buffer.baseAddress! + subrangeOffset),
                count: subrangeCount)
        }

        // Non-contiguous case
        if r != nil { return }
        self[dest..<destUpperBound] = self[subrange]
    }
}
natecook1000 commented 4 years ago

But yes, this doesn't solve the issue for noncontiguous types collection types that could otherwise efficiently implement operations like this.

Lukasa commented 4 years ago

Sorry about the terminology, I probably shouldn't have said "move". Let's say "copy". I want something semantically equivalent to memmove, and memmove does not concern itself with destructing the objects left behind. Neither do I want them to. The normal use-case for this would be "I am about to insert some new elements in the middle, please shift these things out of the way".

I am also aware of the unsafe methods, but they don't meet my requirements. I don't think we should need to write unsafe code for this.

Lukasa commented 4 years ago

Incidentally if we didn't like the "just leave the old values behind" model, we can always do move(from:to:replacingElementsWith: ) to allow users to specify what the moved values should be replaced with.