Open · ARMBouhali opened this issue 1 year ago
Interesting. I agree that auto-completion is a tad slow. I'm curious to see what we will discover when we profile Tutor. (Profiling is always fun!)
Hey @ARMBouhali, do you have any interest in working on this?
I decided to take another stab at this issue. I'll write down my notes here.
First of all, it's quite easy to run tutor in "completion mode". For instance, to simulate the completion of `tutor config printvalue DOCK<tab>`, we should run:

```bash
COMP_WORDS='tutor config printvalue DOCK' COMP_CWORD=3 _TUTOR_COMPLETE=bash_complete tutor
```
(I found this by running bash with the `set -x` option.)
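Before profiling, it can be useful to get a rough wall-clock number for that invocation. Here is a small sketch of my own (not from the original report; it assumes `tutor` is on the `PATH`) that runs the completion a few times with `subprocess` and prints the durations:

```python
import os
import subprocess
import time

# Reproduce the completion environment from the command above.
env = dict(
    os.environ,
    COMP_WORDS="tutor config printvalue DOCK",
    COMP_CWORD="3",
    _TUTOR_COMPLETE="bash_complete",
)

# Run the completion a few times and print rough wall-clock durations.
for _ in range(3):
    start = time.perf_counter()
    subprocess.run(["tutor"], env=env, capture_output=True, check=False)
    print(f"completion took {time.perf_counter() - start:.2f}s")
```

The first run also pays any cold-cache cost, so the later iterations are usually more representative.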
From that point, it's relatively easy to profile shell completion. Create a `prof.py` file:
```python
import sys

# Make click believe it was invoked as a plain `tutor` command with no arguments.
sys.argv = ["tutor"]

import tutor.commands.cli; tutor.commands.cli.main()
```
Then run this script:

```bash
COMP_WORDS='tutor config printvalue DOCK' COMP_CWORD=3 _TUTOR_COMPLETE=bash_complete python -m cProfile -s cumulative ./prof.py > autocomplete.prof
```
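As a variant (my own sketch, not part of the original instructions), the same profile can be collected entirely from Python by dumping binary stats, which makes it easy to re-sort and slice them later with `pstats`. The file name `autocomplete.pstats` is arbitrary:

```python
import cProfile
import os
import pstats
import sys

# Same completion environment as above, but set from Python this time.
os.environ["COMP_WORDS"] = "tutor config printvalue DOCK"
os.environ["COMP_CWORD"] = "3"
os.environ["_TUTOR_COMPLETE"] = "bash_complete"
sys.argv = ["tutor"]

import tutor.commands.cli

# cProfile.run() swallows the SystemExit that click raises once the
# completion candidates have been printed, then dumps binary stats.
cProfile.run("tutor.commands.cli.main()", "autocomplete.pstats")

# Re-sort and display the 30 most expensive entries by cumulative time.
pstats.Stats("autocomplete.pstats").sort_stats("cumulative").print_stats(30)
```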
Here are the first entries with more than 0.1s of cumulative time. I have highlighted a couple of suspicious items:
```text
1197590 function calls (1077470 primitive calls) in 0.767 seconds

Ordered by: cumulative time

ncalls tottime percall cumtime percall filename:lineno(function)
519/1 0.002 0.000 0.769 0.769 {built-in method builtins.exec}
1 0.000 0.000 0.769 0.769 prof.py:1(<module>)
1 0.000 0.000 0.489 0.489 cli.py:21(main)
1 0.000 0.000 0.447 0.447 core.py:1155(__call__)
1 0.000 0.000 0.447 0.447 core.py:1010(main)
1 0.000 0.000 0.447 0.447 core.py:1123(_main_shell_completion)
1 0.000 0.000 0.447 0.447 shell_completion.py:17(shell_complete)
1 0.000 0.000 0.446 0.446 shell_completion.py:285(complete)
1 0.000 0.000 0.446 0.446 shell_completion.py:263(get_completions)
2 0.000 0.000 0.367 0.184 config.py:191(render_full) <-------------------------- why 2 calls?
362/282 0.000 0.000 0.367 0.001 env.py:378(render_unknown)
288 0.000 0.000 0.366 0.001 env.py:393(render_str) <-------------------------- can we make this any faster? Maybe with a single Renderer instance?
264/4 0.002 0.000 0.286 0.071 <frozen importlib._bootstrap>:1349(_find_and_load)
257/4 0.001 0.000 0.286 0.071 <frozen importlib._bootstrap>:1304(_find_and_load_unlocked)
577/11 0.001 0.000 0.285 0.026 <frozen importlib._bootstrap>:480(_call_with_frames_removed)
253/8 0.001 0.000 0.284 0.036 <frozen importlib._bootstrap>:911(_load_unlocked)
228/8 0.001 0.000 0.284 0.035 <frozen importlib._bootstrap_external>:989(exec_module)
1 0.000 0.000 0.284 0.284 core.py:2425(shell_complete)
1 0.000 0.000 0.284 0.284 config.py:30(shell_complete)
1 0.000 0.000 0.284 0.284 config.py:38(_shell_complete_config_items)
183 0.000 0.000 0.284 0.002 config.py:47(_candidate_config_items)
1 0.000 0.000 0.284 0.284 params.py:23(config)
1 0.000 0.000 0.284 0.284 config.py:52(load_full)
1 0.000 0.000 0.279 0.279 cli.py:1(<module>)
58/13 0.000 0.000 0.262 0.020 {built-in method builtins.__import__}
174/53 0.000 0.000 0.261 0.005 <frozen importlib._bootstrap>:1390(_handle_fromlist)
6/3 0.000 0.000 0.204 0.068 actions.py:99(do)
6/3 0.000 0.000 0.203 0.068 actions.py:118(do_from_context)
40/18 0.000 0.000 0.203 0.011 actions.py:31(do)
288 0.003 0.000 0.198 0.001 env.py:100(__init__) <-------------------------- this is the Renderer constructor called 288 times
104004/289 0.092 0.000 0.181 0.001 copy.py:118(deepcopy)
2023/289 0.028 0.000 0.179 0.001 copy.py:217(_deepcopy_dict)
2 0.000 0.000 0.172 0.086 config.py:1(<module>)
4 0.000 0.000 0.171 0.043 utils.py:1(<module>)
288 0.000 0.000 0.168 0.001 env.py:182(render_str)
1 0.000 0.000 0.162 0.162 shell_completion.py:495(_resolve_context)
2 0.000 0.000 0.161 0.080 core.py:1722(resolve_command)
1 0.000 0.000 0.161 0.161 cli.py:46(get_command)
1 0.000 0.000 0.161 0.161 cli.py:64(ensure_plugins_enabled)
1 0.000 0.000 0.161 0.161 config.py:313(_enable_plugins)
34 0.001 0.000 0.156 0.005 __init__.py:1(<module>)
1 0.000 0.000 0.154 0.154 config.py:40(load_minimal)
288 0.001 0.000 0.141 0.000 environment.py:1089(from_string)
288 0.001 0.000 0.136 0.000 environment.py:728(compile)
```
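To chase the first highlighted entry (why is `render_full` called twice?), one quick option is to wrap the function and print the call stack on each invocation. This is a debugging sketch of my own, assuming `render_full` lives in `tutor.config`; it would go at the top of `prof.py`, before tutor's CLI modules are imported:

```python
import sys
import traceback

from tutor import config as tutor_config

_original_render_full = tutor_config.render_full

def _traced_render_full(*args, **kwargs):
    # Show who is calling render_full, so we can see where the two calls come from.
    print("render_full called from:", file=sys.stderr)
    traceback.print_stack(limit=8)
    return _original_render_full(*args, **kwargs)

tutor_config.render_full = _traced_render_full
```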
We should definitely investigate why `render_full` is called twice, and how to optimize the `render_str` function.
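On the `render_str` side, one direction worth trying is the single-renderer idea mentioned in the profile annotations. The sketch below is illustrative only, using plain Jinja2 with names of my own rather than tutor's actual `env.py` API: keep one `Environment` per process and memoize compiled templates, so that rendering the same template string repeatedly skips the ~288 `from_string`/`compile` calls visible above.

```python
import functools

import jinja2

# One Environment for the whole process, instead of one per render call.
_jinja_env = jinja2.Environment()

@functools.lru_cache(maxsize=None)
def _compile(template_source: str) -> jinja2.Template:
    # Environment.from_string() recompiles on every call; cache by source text.
    return _jinja_env.from_string(template_source)

def render_str(template_source: str, **values) -> str:
    """Render a template string against the given values."""
    return _compile(template_source).render(**values)
```

Whether this maps cleanly onto tutor depends on how much per-call state the real Renderer carries (filters, the config dict, plugin patches), but even a per-run cache keyed on the template text should remove most of the repeated compilation.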
Background
I've been using tutor for my deployments since Open edX's Maple release, and for over 6 months as a development environment. One of the yet-to-be-solved developer-experience issues I've faced is the slowness of the shell autocompletion feature.
Here is my experience with testing a basic `tutor [tab][tab]` in bash:

- On my late (:cry:) not-so-strong ubuntu machine [4G CPU / 16GB DDR3L / SSD], the response time was over 3s before the available commands were completed.
- On more capable machines, `[tab]` completion has around a 0.4s response time (which is quite acceptable, but not applicable to less powerful machines).

Tasks
A suggested roadmap:

- Compare tutor installed from PyPI (`pip install tutor`) vs tutor installed in editable mode (`pip install -e git+<tutor-gh-repo>`).

Notes
FYI @regisb