swiftlang / swift

The Swift Programming Language
https://swift.org
Apache License 2.0
67.66k stars 10.38k forks source link

[SR-10604] dictionary modify accessor allocates every single time #53004

Open weissi opened 5 years ago

weissi commented 5 years ago
Previous ID SR-10604
Radar rdar://problem/52529030
Original Reporter @weissi
Type Bug
Additional Detail from JIRA | | | |------------------|-----------------| |Votes | 1 | |Component/s | Compiler, Standard Library | |Labels | Bug, Performance | |Assignee | None | |Priority | Medium | md5: 4ba5c6b107c61899a4524c1db392517b

relates to:

Issue Description:

@Lukasa just showed me some of his HTTP/2 code that allocates a lot. He saw it allocating in presumably here: https://github.com/apple/swift/blob/577e04d35be3dba4954bd99eac5ef2af264c60b3/stdlib/public/core/DictionaryVariant.swift#L97

But the whole complexity of the http/2 code isn't actually necessary, even the most simple use of the modify accessor triggers the allocations:

extension Optional {
    mutating func apply(_ body: (Wrapped) -> Wrapped?) {
        guard let real = self else {
            return
        }

        self = nil
        self = body(real)
    }
}

@inline(never)
func repro() {
    var dict: [Int: Int] = [:]
    dict[1] = 1
    for _ in 0..<10_000 {
        dict[1].apply { _ in 1 }
    }
    print(dict)
}

repro()

ran, produces under dtrace that traces every malloc:

              libsystem_malloc.dylib`malloc
              TestApp`specialized _NativeDictionary.subscript.modify+0x24
              TestApp`repro()+0x8c
              TestApp`main+0x9
              libdyld.dylib`start+0x1
              TestApp`0x1
            10000

That means for 10,000 calls to the modify accessor, we allocated 10,000 times :'(.

We can reduce the code further to

extension Optional where Wrapped == Int {
    mutating func apply() {
        self = 1
    }
}

@inline(never)
func repro() {
    var dict: [Int: Int] = [:]
    dict[1] = 1
    for _ in 0..<10_000 {
        dict[1].apply()
    }
    print(dict)
}

repro()
belkadan commented 5 years ago

cc @rjmccall, @eeckstein

weissi commented 5 years ago
   Apple Swift version 5.0-dev (LLVM 082dec2e22, Swift 4e86c8ff52)
Target: x86_64-apple-darwin18.5.0

(same for Swift 5.0.1) ran

jw-swift-latest swift -version && jw-swift-latest swiftc -O test.swift && sudo ~/devel/swift-nio/dev/malloc-aggregation.d -c ./test
weissi commented 5 years ago

better stack for the allocation:

(lldb) bt
* thread #&#8203;1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #&#8203;0: 0x00007fff6532ab69 libsystem_malloc.dylib`malloc
    frame #&#8203;1: 0x0000000100000e14 test`specialized _NativeDictionary.subscript.modify at <compiler-generated>:0 [opt]
    frame #&#8203;2: 0x0000000100000cf9 test`repro() [inlined] generic specialization <Swift.Int, Swift.Int> of Swift.Dictionary._Variant.subscript.modify : (A) -> Swift.Optional<B> at <compiler-generated>:0 [opt]
    frame #&#8203;3: 0x0000000100000ce0 test`repro() [inlined] generic specialization <Swift.Int, Swift.Int> of Swift.Dictionary.subscript.modify : (A) -> Swift.Optional<B> at <compiler-generated>:0 [opt]
    frame #&#8203;4: 0x0000000100000ce0 test`repro() at test.swift:12 [opt]
    frame #&#8203;5: 0x0000000100000c49 test`main at test.swift:17:1 [opt]
    frame #&#8203;6: 0x00007fff651803d5 libdyld.dylib`start + 1

the assembly confirms that it's allocating unconditionally pretty much before doing anything else

(lldb) frame select 1
test was compiled with optimization - stepping may behave oddly; variables may not be available.
frame #&#8203;1: 0x0000000100000e14 test`specialized _NativeDictionary.subscript.modify at <compiler-generated>:0 [opt]
(lldb) dis
test`specialized _NativeDictionary.subscript.modify:
    0x100000df0 <+0>:   pushq  %rbp
    0x100000df1 <+1>:   movq   %rsp, %rbp
    0x100000df4 <+4>:   pushq  %r15
    0x100000df6 <+6>:   pushq  %r14
    0x100000df8 <+8>:   pushq  %r13
    0x100000dfa <+10>:  pushq  %r12
    0x100000dfc <+12>:  pushq  %rbx
    0x100000dfd <+13>:  subq   $0x18, %rsp
    0x100000e01 <+17>:  movl   %edx, -0x3c(%rbp)
    0x100000e04 <+20>:  movq   %rsi, %r15
    0x100000e07 <+23>:  movq   %rdi, %rbx
    0x100000e0a <+26>:  movl   $0x40, %edi
    0x100000e0f <+31>:  callq  0x100001e82               ; symbol stub for: malloc
->  0x100000e14 <+36>:  movq   %rax, %r14
    0x100000e17 <+39>:  movq   %rax, (%rbx)
    0x100000e1a <+42>:  movq   %r13, 0x8(%rax)
    0x100000e1e <+46>:  movq   %r15, (%rax)
    0x100000e21 <+49>:  movq   %r13, -0x38(%rbp)
    0x100000e25 <+53>:  movq   (%r13), %r12
    0x100000e29 <+57>:  movq   0x28(%r12), %rdi
    0x100000e2e <+62>:  movq   %r15, %rsi
    0x100000e31 <+65>:  callq  0x100001e70               ; symbol stub for: static Swift.Hasher._hash(seed: Swift.Int, _: Swift.UInt64) -> Swift.Int
[...]

The first BB in the SIL looks like this

// specialized _NativeDictionary.subscript.modify
sil shared [always_inline] @$ss17_NativeDictionaryV_8isUniqueq_Sgx_SbtciMSi_SiTg5 : $@yield_once @convention(method) (Int, Bool, @inout _NativeDictionary<Int, Int>) -> @yields @inout Optional<Int> {
// %0                                             // users: %173, %183, %3
// %1                                             // users: %97, %105
// %2                                             // users: %181, %171, %154, %137, %301, %298, %189, %192, %110, %4
bb0(%0 : $Int, %1 : $Bool, %2 : $*_NativeDictionary<Int, Int>):
  %3 = struct_extract %0 : $Int, #Int._value      // users: %52, %232, %273, %315, %9
  %4 = load %2 : $*_NativeDictionary<Int, Int>    // users: %337, %317, %51, %5
  %5 = struct_extract %4 : $_NativeDictionary<Int, Int>, #_NativeDictionary._storage // users: %305, %89, %75, %43, %14, %12, %6
  %6 = ref_element_addr %5 : $__RawDictionaryStorage, #__RawDictionaryStorage._seed // user: %7
  %7 = load %6 : $*Int                            // user: %11
  %8 = metatype $@thin Hasher.Type                // users: %196, %11
  %9 = struct $UInt64 (%3 : $Builtin.Int64)       // users: %196, %11
  // function_ref static Hasher._hash(seed:_:)
  %10 = function_ref @$ss6HasherV5_hash4seed_S2i_s6UInt64VtFZ : $@convention(method) (Int, UInt64, @thin Hasher.Type) -> Int // users: %196, %11
  %11 = apply %10(%7, %9, %8) : $@convention(method) (Int, UInt64, @thin Hasher.Type) -> Int // user: %24
  %12 = ref_tail_addr %5 : $__RawDictionaryStorage, $_UnsafeBitset.Word // users: %62, %326, %32
  %13 = integer_literal $Builtin.Int64, 1         // users: %78, %68, %55, %248, %235, %332, %319, %290, %277, %218, %204, %203, %38, %22, %20
  %14 = ref_element_addr %5 : $__RawDictionaryStorage, #__RawDictionaryStorage._scale // user: %15
  %15 = struct_element_addr %14 : $*Int8, #Int8._value // user: %16
  %16 = load %15 : $*Builtin.Int8                 // user: %17
  %17 = builtin "sextOrBitCast_Int8_Int64"(%16 : $Builtin.Int8) : $Builtin.Int64 // user: %19
  %18 = integer_literal $Builtin.Int64, 63        // users: %67, %247, %331, %289, %217, %202, %37, %19
  %19 = builtin "and_Int64"(%17 : $Builtin.Int64, %18 : $Builtin.Int64) : $Builtin.Int64 // user: %20
  %20 = builtin "shl_Int64"(%13 : $Builtin.Int64, %19 : $Builtin.Int64) : $Builtin.Int64 // user: %22
  %21 = integer_literal $Builtin.Int1, 0          // users: %78, %55, %235, %319, %277, %204, %95, %22
  %22 = builtin "ssub_with_overflow_Int64"(%20 : $Builtin.Int64, %13 : $Builtin.Int64, %21 : $Builtin.Int1) : $(Builtin.Int64, Builtin.Int1) // user: %23
  %23 = tuple_extract %22 : $(Builtin.Int64, Builtin.Int1), 0 // users: %57, %321, %25
  %24 = struct_extract %11 : $Int, #Int._value    // user: %25
  %25 = builtin "and_Int64"(%24 : $Builtin.Int64, %23 : $Builtin.Int64) : $Builtin.Int64 // users: %55, %46, %36, %30, %26
  %26 = struct $Int (%25 : $Builtin.Int64)        // user: %27
  %27 = struct $_HashTable.Bucket (%26 : $Int)    // users: %54, %42
  %28 = integer_literal $Builtin.Int64, 64        // users: %66, %60, %246, %240, %330, %324, %288, %282, %216, %210, %36, %30
  %29 = integer_literal $Builtin.Int64, 0         // users: %308, %70, %250, %334, %292, %220, %40
  %30 = builtin "udiv_Int64"(%25 : $Builtin.Int64, %28 : $Builtin.Int64) : $Builtin.Int64 // user: %31
  %31 = builtin "truncOrBitCast_Int64_Word"(%30 : $Builtin.Int64) : $Builtin.Word // user: %32
  %32 = index_addr %12 : $*_UnsafeBitset.Word, %31 : $Builtin.Word // user: %33
  %33 = struct_element_addr %32 : $*_UnsafeBitset.Word, #_UnsafeBitset.Word.value // user: %34
  %34 = struct_element_addr %33 : $*UInt, #UInt._value // user: %35
  %35 = load %34 : $*Builtin.Int64                // user: %39
  %36 = builtin "urem_Int64"(%25 : $Builtin.Int64, %28 : $Builtin.Int64) : $Builtin.Int64 // user: %37
  %37 = builtin "and_Int64"(%36 : $Builtin.Int64, %18 : $Builtin.Int64) : $Builtin.Int64 // user: %38
  %38 = builtin "shl_Int64"(%13 : $Builtin.Int64, %37 : $Builtin.Int64) : $Builtin.Int64 // user: %39
  %39 = builtin "and_Int64"(%35 : $Builtin.Int64, %38 : $Builtin.Int64) : $Builtin.Int64 // user: %40
  %40 = builtin "cmp_eq_Int64"(%39 : $Builtin.Int64, %29 : $Builtin.Int64) : $Builtin.Int1 // user: %41
  cond_br %40, bb1, bb2                           // id: %41
rjmccall commented 5 years ago

We know there's work to do to make the coroutine lowering passes more aggressive about not heap-allocating the stack frame if e.g. there are multiple resume points. This SR will serve as a fine anchor for tracking that work.

weissi commented 5 years ago

@swift-ci create

lorentey commented 5 years ago

I submitted a stdlib-side workaround:

https://github.com/apple/swift/pull/26549

This can be reverted when the underlying optimization issue is resolved.

Lukasa commented 4 years ago

Recent testing with Swift 5.3 seems to show allocations back in the dictionary modify accessor, in the exact same place. @lorentey did we regress this in any way?