Optimize binary concatenation in gun:await_body

zmstone commented 2 years ago

Disclaimer: I do not have hard prof on how much this will gain. but my experience is: when large data comes in small pieces, the concatenation can be very much expensive.

[update] with a simple test:

2> bin_concat:compare(1000).
append_to_end   : 2497
iolist_to_binary: 257
ok
3> bin_concat:compare(10000).
append_to_end   : 26483
iolist_to_binary: 4535
ok
4> bin_concat:compare(100000).
append_to_end   : 220178
iolist_to_binary: 45884

code:

-module(bin_concat).

-export([compare/1]).

compare(N) ->
    AppendT = append(N),
    IolistT = iolist(N),
    io:format("append_to_end   : ~p~n", [AppendT]),
    io:format("iolist_to_binary: ~p~n", [IolistT]).

append(N) ->
    do(N, fun do_append/1).

iolist(N) ->
    do(N, fun do_iolist/1).

do(N, F) ->
    BinList = lists:duplicate(N, crypto:strong_rand_bytes(4096)),
    {T, _} = timer:tc(fun() -> F(BinList) end),
    T.

do_append(BinList) ->
    do_append(BinList, <<>>).

do_append([], Acc) -> Acc;
do_append([Bin | Rest], Acc) ->
    do_append(Rest, <<Acc/binary, Bin/binary>>).

do_iolist(BinList) ->
    do_iolist(BinList, []).

do_iolist([], Acc) ->
    iolist_to_binary(lists:reverse(Acc));
do_iolist([Bin | Rest], Acc) ->
    do_iolist(Rest, [Bin | Acc]).

essen commented 2 years ago

Please benchmark and see if it really helps. Note that the iolist_to_binary is going to be very expensive here.

zmstone commented 2 years ago

Please benchmark and see if it really helps. Note that the iolist_to_binary is going to be very expensive here.

I have updated the PR description with a simple test result.

essen commented 2 years ago

That shows it's a possible improvement, but you need to compare Gun itself with/without the patch. The main bottleneck here is likely to be the network, not the concatenation. It is also a good idea to compare memory usage and GC (garbage_collection and garbage_collection_info).

It might be better to make Gun send fewer messages instead or in addition to this patch.

You should also probably not be using await_body for large bodies if that's what you are doing.

zmstone commented 2 years ago

Sorry, my mistake. I went back to verify the scenario in which it was 10,000 times slower, turns out it’s not the concatenation to blame but bin_element in a case clause.

ninenines / gun

Optimize binary concatenation in gun:await_body #287