Closed lynxthecat closed 10 months ago
I seriously doubt this. It probably only shows that it eats a lot of CPU cycles because it is long lived and regularly called.
I think timeprofile is clever enough to actually list lines of bash code in the order they are called, together with the execution time associated with each. Since bash is single-threaded, doesn't this give an indication of the processing time associated with each line? See e.g. this excerpt from the compressed timeprofile linked above:
```
(0.001053): 147 -- 'printf '\''%s; %(%F-%H:%M:%S)T; %s; %s\n'\'' "${type}" -1 "${log_timestamp}" "${msg}"'
(0.000468): 2037 -- '(( enable_sleep_function ))'
(0.000305): 2038 -- '[[ ${load_condition[dl]} == *idle* && ${load_condition[ul]} == *idle* ]]'
(0.000283): 2039 -- '((t_sustained_connection_idle_us += (${EPOCHREALTIME/./}-t_end_us) ))'
(0.000234): 2040 -- '((t_sustained_connection_idle_us > sustained_idle_sleep_thr_us))'
(0.000256): 2082 -- 't_end_us="${EPOCHREALTIME/./}"'
(0.000206): 1946 -- 'true'
(0.000251): 1948 -- 'unset command'
(0.040455): 1949 -- 'read -r -u "${main_fd}" -a command'
(0.000525): 1951 -- '[[ -n "${command-}" ]]'
(0.000483): 1954 -- 'case "${command[0]}" in '
(0.001144): 1958 -- 'read -r timestamp reflector seq dl_owd_baseline_us dl_owd_us dl_owd_delta_ewma_us dl_owd_delta_us ul_owd_baseline_us …'
(0.000440): 1989 -- 'case "${main_state}" in '
(0.000975): 1993 -- '[[ "${command[0]}" == "REFLECTOR_RESPONSE" && -n "${timestamp-}" && -n "${reflector-}" && -n "${seq-}" && -n "${dl_o…'
(0.000464): 1996 -- 't_start_us=${EPOCHREALTIME/./}'
(0.000389): 1998 -- 'reflectors_last_timestamp_us="${timestamp//[.]}"'
(0.000417): 2000 -- '(( (t_start_us - 10#"${reflectors_last_timestamp_us}")>500000 ))'
(0.000326): 2007 -- '(( dl_delays[delays_idx] ))'
```
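Incidentally, the `${EPOCHREALTIME/./}` idiom that shows up in several of these hot lines strips the decimal point from bash's `$EPOCHREALTIME` to get an integer microsecond timestamp, so the timing arithmetic stays in builtin `(( ))` arithmetic rather than forking out to `date`. A minimal sketch of the idiom (requires bash >= 5.0):

```shell
#!/usr/bin/env bash
# EPOCHREALTIME expands to "seconds.microseconds"; deleting the "." yields
# an integer microsecond count, usable directly in (( )) arithmetic.
t_start_us=${EPOCHREALTIME/./}
sleep 0.01
t_end_us=${EPOCHREALTIME/./}
echo "elapsed: $(( t_end_us - t_start_us )) us"
```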
No, it doesn't, especially not for the `read` commands: they are by definition waiting for input, so it's normal for them to take some time.
OK, understood regarding the `read` commands, but what about the rest? And this `read` command:

```shell
read -r timestamp reflector seq dl_owd_baseline_us dl_owd_us dl_owd_delta_ewma_us dl_owd_delta_us ul_owd_baseline_us ul_owd_us ul_owd_delta_ewma_us ul_owd_delta_us <<< "${command[@]:1}"
```

is reading from an array, so that data is immediately available. I think I tried manually assigning the elements based on the values from the array, but I don't think that was any faster.
Something basic you could try is to define an IFS variable that contains only the delimiters you actually want (in the case of cake-autorate, that would be just a space). Simply assigning IFS once at the start of the script will suffice.
I would expect this to be heavily optimized already, so it may not matter, but maybe it does, IDK.
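For illustration (a sketch with made-up input, not the cake-autorate code): with the default `IFS` of space, tab, and newline, `read` treats all three as delimiters; narrowing it to a single space makes tabs and newlines ordinary data.

```shell
#!/usr/bin/env bash
# Hypothetical input: fields separated by a space, with a tab inside one field.
line=$'alpha beta\tgamma'

# Default IFS = $' \t\n': the tab is also a delimiter, so we get 3 fields.
read -r -a fields <<< "${line}"
echo "default IFS: ${#fields[@]} fields"    # → default IFS: 3 fields

# IFS restricted to a space: the tab is now ordinary data, so 2 fields.
IFS=' '
read -r -a fields <<< "${line}"
echo "space IFS: ${#fields[@]} fields"      # → space IFS: 2 fields
```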
Ah, that's a super simple change (setting IFS to a space). Would that avoid some unnecessary processing?
Of course, but as I said, I expect it to be heavily optimized, so it might not matter.
So just an assignment like `IFS=" "` (or whatever you are using as the delimiter; if it's a tab, then `IFS=$'\t'`, etc.) at the start would work OK.
Here is just one 60 s run for each, using `time timeout -s INT 60 ./cake-autorate.sh`.

Without `IFS=" "`:

```
real 1m 1.12s
user 0m 9.30s
sys 0m 1.75s
```

With `IFS=" "`:

```
real 1m 1.12s
user 0m 9.45s
sys 0m 1.60s
```
Somehow it's worse! :D
Presumably just random variation? Would you still recommend including `IFS=" "`? If so, where would I put it? In cake-autorate.sh at the top somewhere (not the .lib)?
@lynxthecat If you want to put it in, I'd recommend cake-autorate.sh; it doesn't seem to hurt and would at least help keep delimiting consistent.
Seeing as you control the input, it seems like `read -r` incurs a performance penalty, which is observed in 10k, 100k, and 1M runs. See:

```
user@ubuntu:~$ time for ((i=0;i<1000000;i++)); do read l <<<l; done
real 0m5.880s
user 0m3.115s
sys 0m2.763s
user@ubuntu:~$ time for ((i=0;i<1000000;i++)); do read -r l <<<l; done
real 0m6.021s
user 0m3.204s
sys 0m2.816s
```
I don't think it's bad to stick to the variant without `-r` for now; it should not really be an issue for us.
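For reference, the only behavioural difference `-r` makes is backslash handling: without it, `read` interprets backslashes as escape/line-continuation characters; with it, they pass through untouched. A quick sketch of the difference:

```shell
#!/usr/bin/env bash
input='a\tb\\c'

# Without -r: read consumes the backslashes (\t becomes t, \\ becomes \).
read line <<< "${input}"
echo "plain: ${line}"    # → plain: atb\c

# With -r: backslashes are left in the data untouched.
read -r line <<< "${input}"
echo "raw: ${line}"      # → raw: a\tb\\c
```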
> `time for ((i=0;i<1000000;i++)); do read -r l <<<l; done`

@lynxthecat This increases by 700 ms if you assign IFS to a space; not worth it IMO.
> Seeing as you control the input, it seems like `read -r` incurs a performance penalty, which is observed in 10k, 100k, and 1M runs. See:
Bash is seriously unpredictable. I think I added it at the time because I learned it from shellcheck a long time ago, and I know it actually causes bash to do less (no escaping of input), so it doesn't make sense to me that doing less = doing more in bash :')
> Seeing as you control the input, it seems like `read -r` incurs a performance penalty, which is observed in 10k, 100k, and 1M runs. See:
>
> ```
> user@ubuntu:~$ time for ((i=0;i<1000000;i++)); do read l <<<l; done
> real 0m5.880s
> user 0m3.115s
> sys 0m2.763s
> user@ubuntu:~$ time for ((i=0;i<1000000;i++)); do read -r l <<<l; done
> real 0m6.021s
> user 0m3.204s
> sys 0m2.816s
> ```
>
> I don't think it's bad to stick to the variant without `-r` for now; it should not really be an issue for us.
Of course, this is on an i7-11800H; it's probably massively worse on a crappy MIPS processor.
Wait, so what's the conclusion here? I set `IFS=" "` with this commit: https://github.com/lynxthecat/cake-autorate/commit/eb94cc252ec220b0a9edbe1db6e7cb9ed31fda81. Keep or discard?
Disregard IFS and do not use `read -r`.
@lynxthecat Actually, disregard my comment about IFS; the reason it seemed slower to me is that I did `IFS=" " read ...`, which is slower than assigning IFS once and then using `read`.
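In other words, the two forms are not equivalent in cost: `IFS=" " read ...` saves, sets, and restores `IFS` around every single invocation, while a one-time global assignment pays that cost once. A sketch of the two forms (iteration count and input string are illustrative only):

```shell
#!/usr/bin/env bash
a="one two three four"

# Per-invocation: a temporary IFS environment is set up and torn down
# around every single read.
time for ((i = 0; i < 50000; i++)); do
    IFS=" " read -r -a aa <<< "${a}"
done

# Once globally: subsequent reads simply inherit the assignment.
IFS=" "
time for ((i = 0; i < 50000; i++)); do
    read -r -a aa <<< "${a}"
done
```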
```
root@OpenWrt-1:/# a="DATA; 2023-06-06-23:37:38; 1686091058.437920; 1686091058.436087; 11; 15; 0; 0; 1686091058.417140; 9.9.9.10; 4674; 15706; 20350; 2491; 4644; 40600; 15706; 20350; 2491; 4644; 40600; 0; 0; dl_idle; ul_idle; 20000; 20000"
root@OpenWrt-1:/# time for((i=80000;i--;)); do read -r -a aa <<< "${a}"; done
real 0m34.624s
user 0m19.438s
sys 0m13.696s
root@OpenWrt-1:/# IFS=" "
root@OpenWrt-1:/# time for((i=80000;i--;)); do read -r -a aa <<< "${a}"; done
real 0m33.728s
user 0m19.128s
sys 0m14.037s
root@OpenWrt-1:/# time for((i=80000;i--;)); do read -a aa <<< "${a}"; done
real 0m34.053s
user 0m18.961s
sys 0m13.976s
root@OpenWrt-1:/# unset IFS
root@OpenWrt-1:/# time for((i=80000;i--;)); do read -a aa <<< "${a}"; done
real 0m33.860s
user 0m19.595s
sys 0m13.419s
root@OpenWrt-1:/# printf -v b "%-256s" "${a}"
root@OpenWrt-1:/# time for((i=80000;i--;)); do read -N 256 bb <<< "${b}"; bb=(${bb}); done
real 0m18.417s
user 0m15.547s
sys 0m2.167s
```
So `IFS=" "` at least does not hurt.
The big gain for large messages is moving to fixed-length reads, but I've been resisting that so far.
For small messages, the same fixed length actually hurts:
```
root@OpenWrt-1:/# a="hello bye hello"
root@OpenWrt-1:/# time for((i=80000;i--;)); do read -a aa <<< "${a}"; done
real 0m8.886s
user 0m6.076s
sys 0m2.493s
root@OpenWrt-1:/# printf -v b "%-256s" "${a}"
root@OpenWrt-1:/# time for((i=80000;i--;)); do read -N 256 bb <<< "${b}"; bb=(${bb}); done
real 0m18.425s
user 0m15.615s
sys 0m2.165s
```
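Spelled out, the fixed-length technique being benchmarked above works like this (a sketch; the 256-byte record size is just the value used in the benchmarks): the sender pads every message to a known width with `printf`, the receiver grabs exactly that many bytes with `read -N` (which skips delimiter scanning entirely), and word splitting is done once afterwards. Since the padding is plain spaces, default word splitting discards it.

```shell
#!/usr/bin/env bash
msg="DATA; 1686091058.437920; 11; 15; 0; 0"

# Sender: left-justify and pad the message into a fixed 256-byte record.
printf -v record "%-256s" "${msg}"

# Receiver: read exactly 256 bytes, ignoring delimiters entirely...
read -r -N 256 buf <<< "${record}"

# ...then split once; the trailing space padding falls away.
fields=(${buf})
echo "${#fields[@]} fields, first: ${fields[0]}"    # → 6 fields, first: DATA;
```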
Maybe it's because you used `-a`?
I think it's better to do these tests on more powerful machines, just so that you can run more iterations than 80k.
I tried @jkool702's nifty timeprofile tool to profile cake-autorate.
I encountered a few issues in doing so (see https://github.com/jkool702/timeprofile/issues/1), but was nevertheless able to generate this timeprofile file: https://easyupload.io/1tla7w.
@jkool702 does the timeprofile I generated reveal to you any particular aspects of cake-autorate that would benefit from optimisation?
From scrolling through the lines of the timeprofile it looks to me like this line:
https://github.com/lynxthecat/cake-autorate/blob/15de9aa4e0cb8dd7f6f5efadcbc227076f96fb25/cake-autorate.sh#L1953
eats up a significant number of CPU cycles.
Can you see a way to speed that up without switching over to fixed-byte reads using `read -N`?

I know that, as we discussed on reddit, switching over to fixed-length reads might help, but I'm a bit reluctant to go that route for now. In general, deciding on one length for all messages is a little awkward: whilst some of the messages passed from one process to another are lengthy, there are also much smaller ones. And it is difficult to decide upon an upper limit, albeit perhaps the message length could be a user-defined variable, and cake-autorate could error out on any truncation associated with, say, the use of long domain names for reflectors.
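One way the error-out-on-truncation idea could look (a hypothetical sketch: `max_msg_len` and `send_fixed` are invented names, not part of cake-autorate): the sender validates the message against the configured record size before padding, so an oversized message fails loudly instead of being silently clipped by the receiver's fixed-length read.

```shell
#!/usr/bin/env bash
# Hypothetical user-configurable fixed record size.
max_msg_len=256

send_fixed() {
    local msg=$1
    # Refuse to emit anything that would not fit in one record; a silent
    # clip at the receiver's read -N would corrupt the message instead.
    if (( ${#msg} > max_msg_len )); then
        printf 'ERROR: message length %d exceeds max_msg_len=%d\n' \
            "${#msg}" "${max_msg_len}" >&2
        return 1
    fi
    printf "%-${max_msg_len}s" "${msg}"
}

send_fixed "DATA; 9.9.9.10; 4674; 15706" >/dev/null && echo "sent ok"
```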