swiftlang / swift

The Swift Programming Language
https://swift.org
Apache License 2.0
67.69k stars 10.39k forks source link

[SR-11759] Redundant copies of array buffers when using ArraySlice #54166

Open dmcyk opened 5 years ago

dmcyk commented 5 years ago
Previous ID SR-11759
Radar None
Original Reporter @dmcyk
Type Bug

Attachment: Download

Environment Swift 5.1.2 in Xcode 11.2.1 (11B53)
Additional Detail from JIRA | | | |------------------|-----------------| |Votes | 0 | |Component/s | Standard Library | |Labels | Bug, ARC, Performance | |Assignee | None | |Priority | Medium | md5: 833b0855fed40f2adef4f523ab320bfd

Issue Description:

When using `ArraySlice`, underlying buffer gets copied when elements of an array (not the array itself) are modified.

Documentation for `ArraySlice` says:

Instead of copying over the elements of a slice to new storage, an ArraySlice instance presents a view onto the storage of a larger array

Based on that, I assumed there should be no copies of the array buffer, is the assumption correct or am I misunderstanding purpose of the type?

I've attached benchmark sample code and results from `Instruments`.

But the code boils down to two cases:

@inline(never)
func doSlices(of arr: inout [Large]) {
  ... 
  // create array slice
  let slice = arr[0 ..< x]

  ...
  // modify array
  arr[i].a += 1 
}

@inline(never)
func doSlicesScope(of arr: inout [Large]) {
  // wrap slice usage in explicit scope
  let ... = {
    let slice = arr[0 ..< x]
    return slice.count
  }()

  ...
  // modify array
  arr[i].a += 1
}

On my machine in the attached benchmark sample first case takes `1.8sec` and second takes `32ms`, the difference is caused by array buffer copies. In the second case the slice is used in its own scope so there will be no need for copies.

The sample is only for the need of benchmark, I wouldn't have though there will be buffer copies happening in the first case.

beccadax commented 5 years ago

This is legal, but not ideal:

@gottesmm

dmcyk commented 5 years ago

Thanks for the insights.
So I understand now why it happens, Array and ArraySlice both reference same underlying buffer and even when array element is changed CoW triggers.

Regardless of possible optimisations in this specific case, is that desired behaviour of the type?
I'd assume Slice being a view into array storage wouldn't cause CoW to trigger. I understand this can't be represented in the language yet, but considering the ownership manifesto, do you think semantically it'd make sense for Slice to have sort of 'shared' view of the buffer so that its reference to the buffer doesn't count when using `isKnownUniquelyReferenced`, or in future some other CoW API?

177d8476-2756-4152-91d7-984f74d3896c commented 5 years ago

Slices are considered their own values (they have value semantics), so they would not observe changes to the original collection and would participate in COW. Subscripts have accessors, which allow for results to be assigned back to the collection. If `slice` is used after the mutation, or escapes the function, then it must require a COW copy to happen. Otherwise, the optimizer (in theory at least) should be permitted to skip the copy.

The code example you posted does not need the copy to maintain correctness. I would consider this an optimizer bug, unless I'm missing something here. (CC @lorentey just in case I'm missing something). My measurements have `doSlicesScope` as being around 70x faster than `doSlices` with `-O`. They are equivalent without `-O`.

dmcyk commented 5 years ago

Thanks for the explanation! I've also looked up documentation on `Slice` and the part about them being considered their own value is clearly documented. Earlier I was looking at `ArraySlice` specifically where those semantics are not that explicitly mentioned and the part of documentation saying 'presenting a view onto the storage...' made me link it with similar view types in other languages, my bad.

Not sure if relevant, but I'm seeing some difference without `-O`, `doSlices` seems to be 8-10% slower due to additional time spent in `Array._makeMutableAndUnique`, but I guess that's just the cost of CoW here. Not much compared to 70x difference with `-O`, though that is also so significant because buffer/stored structs are relatively big.