Among high-level programming languages, Python is very flexible and easy to write. There are often several ways to do the same thing, and their performance can differ noticeably. How can we measure the performance and choose the right way?
In the following, I will approach this in two steps: first I run some benchmarks, then I analyze the benchmark results.
Benchmark
Benchmark tool
In Python, we can easily use timeit, a module that ships with the standard library, to do statement-level performance analysis.
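As a minimal sketch of how timeit can be driven from plain Python, the snippet below times a do-nothing function. The names `empty` and `time_per_call_ns` are hypothetical helpers introduced only for this illustration, not the benchmark functions used for the numbers in this post (those numbers are formatted the way IPython's `%timeit` magic reports them).

```python
import timeit


def empty():
    """Hypothetical baseline: a function that does nothing."""
    pass


def time_per_call_ns(func, number=100_000, repeat=5):
    """Return the best observed time per call, in nanoseconds."""
    timer = timeit.Timer(func)
    # repeat() returns one total duration in seconds per run of `number` calls.
    totals = timer.repeat(repeat=repeat, number=number)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(totals) / number * 1e9


if __name__ == "__main__":
    print(f"empty(): {time_per_call_ns(empty):.1f} ns per call")
```

Taking the minimum over several repeats is one common choice; reporting mean and standard deviation, as the results in this post do, is another.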
88 ns ± 1.36 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
load global attribute: 113 ns ± 0.373 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
load local attribute: 99.3 ns ± 4.2 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
builtin format string: 110 ns ± 7.85 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
call builtin str function: 251 ns ± 7.87 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
call builtin str function from local: 213 ns ± 6.29 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
call python str function from local: 346 ns ± 11.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
I ran these benchmarks with Python 3.7.1 on my local macOS machine. The timings depend on the software version and the machine speed, so it is normal to see different results. From these numbers I can draw a few conclusions:
calling an empty function still has overhead
retrieving a local attribute is faster than retrieving a global one
sometimes an instruction-level operation, such as the builtin format string, is faster than a function call
creating a stack frame adds significant overhead
Next, let's step into some details.
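To make the local-versus-global observation concrete, here is a small sketch that binds a frequently used function to a local name before a loop. The functions `sqrt_global` and `sqrt_local` are hypothetical examples, not the benchmarks measured above, and the size of the gap (if any) will vary with the Python version and machine.

```python
import math
import timeit


def sqrt_global(values):
    """Resolve the global `math` and its `sqrt` attribute on every iteration."""
    result = []
    for v in values:
        result.append(math.sqrt(v))
    return result


def sqrt_local(values):
    """Resolve `math.sqrt` once and reuse it through a local name."""
    sqrt = math.sqrt  # one global lookup plus one attribute lookup, done once
    result = []
    for v in values:
        result.append(sqrt(v))
    return result


if __name__ == "__main__":
    data = [float(i) for i in range(1000)]
    for func in (sqrt_global, sqrt_local):
        seconds = timeit.timeit(lambda: func(data), number=200)
        print(f"{func.__name__}: {seconds:.4f} s for 200 runs")
```

Both functions return the same list; only the way the name `sqrt` is resolved inside the loop differs.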
Inspection tool
The benchmark results give us some timing numbers, but they do not explain what actually happens in each case. Fortunately, there is another useful tool: dis, a disassembler module that shows what the Python code does at the virtual machine (bytecode) level. With the dis module, we can go a step further. Let's do it.
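A minimal sketch of using dis follows, assuming two hypothetical functions that build the same string in different ways. `disassemble` is a helper introduced here for illustration; it only wraps `dis.dis` so the listing can be captured as a string.

```python
import dis
import io


def format_with_fstring(value):
    """Format through the f-string machinery (no name lookup for `str`)."""
    return f"{value}"


def format_with_str(value):
    """Format by looking up and calling the builtin `str`."""
    return str(value)


def disassemble(func):
    """Return the bytecode listing that dis.dis would print, as a string."""
    buffer = io.StringIO()
    dis.dis(func, file=buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    for func in (format_with_fstring, format_with_str):
        print(func.__name__)
        print(disassemble(func))
```

The exact opcode names in the listing differ between CPython versions, so the output on a newer interpreter will not match a Python 3.7 listing line for line.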
Statement level performance
Here I won't go through the disassembled output in detail; instead I compare the corresponding bytecodes to get some intuition:
LOAD_FAST is faster than LOAD_GLOBAL
FORMAT_VALUE does a great job
CALL_FUNCTION suffers from the stack frame overhead.
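The comparison above can be sketched programmatically with `dis.get_instructions`. The names `COUNTER`, `read_global`, `read_local`, and `load_opnames` are hypothetical and exist only for this example. Opcode names depend on the CPython version, so the sketch filters by the `LOAD_` prefix instead of assuming an exact listing.

```python
import dis

COUNTER = 1  # hypothetical module-level (global) name


def read_global():
    """Return a value that must be looked up in the global namespace."""
    return COUNTER


def read_local(counter):
    """Return a value that lives in the function's own local slots."""
    return counter


def load_opnames(func):
    """Collect the names of all LOAD_* instructions in a function's bytecode."""
    return [
        instruction.opname
        for instruction in dis.get_instructions(func)
        if instruction.opname.startswith("LOAD_")
    ]


if __name__ == "__main__":
    print("read_global:", load_opnames(read_global))
    print("read_local: ", load_opnames(read_local))
```

The global read goes through LOAD_GLOBAL, which searches the module globals and then the builtins, while the local read uses an instruction from the LOAD_FAST family, which indexes directly into the frame's local variable slots.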
Conclusion
As shown above, it is easy to use timeit and dis to measure statement-level performance. To summarize:
Overhead
accessing a global attribute repeatedly
calling a Python function
creating an execution stack frame
Tools
timeit
dis
Reference
timeit
dis