yancyribbens / arbitrary-bitcoin

0 stars 0 forks source link

devirtualized ASM #1

Open yancyribbens opened 1 month ago

yancyribbens commented 1 month ago

You'd have to also call the method with concrete values to get the proper assembly (with inlining etc).

The generated assembly which calls Amount::arbitrary(); as generated by cargo asm arbitary_bitcoin::main

    Finished `release` profile [optimized] target(s) in 0.04s

.section .text.arbitary_bitcoin::main,"ax",@progbits
        .p2align        4, 0x90
        .type   arbitary_bitcoin::main,@function
arbitary_bitcoin::main:
        .cfi_startproc
        push rax
        .cfi_def_cfa_offset 16
        call qword ptr [rip + <bitcoin_units::amount::Amount as proptest::arbitrary::traits::Arbitrary>::arbitrary_with@GOTPCREL]
        mov qword ptr [rsp], rax
        lock dec        qword ptr [rax]
        jne .LBB6_2
        #MEMBARRIER
        mov rdi, rsp
        call alloc::sync::Arc<T,A>::drop_slow
.LBB6_2:
        pop rax
        .cfi_def_cfa_offset 8
        ret

Call happens here:

call qword ptr [rip + <bitcoin_units::amount::Amount as proptest::arbitrary::traits::Arbitrary>::arbitrary_with@GOTPCREL]

What does it mean to call with concrete values?

Kixunil commented 1 month ago

Are you sure it's a fully optimized build? The method wasn't inlined. Try turning on LTO or add #[inline] attribute.

But I think it'd be best to write a dumb proptest (e.g checking amount == amount - reflexivity) and check its assembly.

yancyribbens commented 1 month ago

Are you sure it's a fully optimized build? The method wasn't inlined. Try turning on LTO or add #[inline] attribute.

@Kixunil The same code as above with inline added to arbitrary_with. Note that the call call qword ptr [rip + <bitcoin_units::amount::Amount as proptest::arbitrary::traits::Arbitrary>::arbitrary_with@GOTPCREL] is no longer there as it's now inlined.

.section .text.arbitary_bitcoin::main,"ax",@progbits                                                                                         [0/154]
        .p2align        4, 0x90
        .type   arbitary_bitcoin::main,@function
arbitary_bitcoin::main:
        .cfi_startproc
        push rax
        .cfi_def_cfa_offset 16
        mov rax, qword ptr [rip + __rust_no_alloc_shim_is_unstable@GOTPCREL]
        movzx eax, byte ptr [rax]
        mov edi, 24
        mov esi, 8
        call qword ptr [rip + __rust_alloc@GOTPCREL]
        test rax, rax
        je .LBB7_4
        mov qword ptr [rax], 1
        mov qword ptr [rax + 8], 1
        lea rcx, [rip + core::ops::function::FnOnce::call_once]
        mov qword ptr [rax + 16], rcx
        mov qword ptr [rsp], rax
        lock dec        qword ptr [rax]
        jne .LBB7_3
        #MEMBARRIER
        mov rdi, rsp
        call alloc::sync::Arc<T,A>::drop_slow
.LBB7_3:
        pop rax
        .cfi_def_cfa_offset 8
        ret
.LBB7_4:
        .cfi_def_cfa_offset 16
        mov edi, 8
        mov esi, 24
        call qword ptr [rip + alloc::alloc::handle_alloc_error@GOTPCREL]
yancyribbens commented 1 month ago

But I think it'd be best to write a dumb proptest (e.g checking amount == amount - reflexivity) and check its assembly.

I added a dumb prop test but for brevity I'm not going to post all of the ASM code here because it's a lot since it also includes all the test runner stuff in the asm. To generate the asm cargo asm --profile test prop_test 0

Kixunil commented 1 month ago

Sadly, it looks like it's still unoptimized. Though maybe the code calling it is. If you can post like ten lines before/after the call from that dumb test we could figure it.

yancyribbens commented 1 month ago

Sadly, it looks like it's still unoptimized. Though maybe the code calling it is. If you can post like ten lines before/after the call from that dumb test we could figure it.

I commited the full assembly here: https://github.com/yancyribbens/arbitrary-bitcoin/blob/main/asm/dumb_test

yancyribbens commented 1 month ago

s/commited/committed

Kixunil commented 1 month ago

So the call is devirtualized but that Map has Arc just so it can impl Clone :man_facepalming: :man_facepalming: :man_facepalming:

Thankfully if I understand the code correctly it's not called in a loop, so we could use it from performance perspective. But honestly, I'm not convinced we shouldn't newtype it.

yancyribbens commented 1 month ago

So the call is devirtualized but that Map has Arc just so it can impl Clone

Are you looking at line 164?

call core::ptr::drop_in_place<proptest::strategy::map::Map<proptest::strategy::map::Map<proptest::num::u64::Any,fn(u64) .> bitcoin_units::amount::Amount>,arbitary_bitcoin::prop_test::{{closure}}>

I'm not sure how you can tell this is devirtualized. My understanding is the compiler can't devirtualize only in cases where the types such as a boxed type can't be known at compile time which is posssible to know just by looking at the code without needing to see the assembly.

Thankfully if I understand the code correctly it's not called in a loop, so we could use it from performance perspective.

Well L:170 has a call to jump back to L:162 which looks like a loop to me.. Although I don't know much about what you're looking for exactly when you say from a perf perspective.

Kixunil commented 1 month ago

I can see that there's no messing around with function pointers in the code generating the value but destructors still need to be called.

Well L:170 has a call to jump back to L:162 which looks like a loop to me..

L170 should be unreachable because of jump at L165 but that is weird so I assume I don't understand the assembly well enough and you might be right.

yancyribbens commented 1 month ago

Ok cool, since there is no function pointers in the generated code, I'll create a PR to add these to rust-bitcoin.