universe-proton / universe-topology

A universal computer knowledge topology for all the programmers worldwide.
Apache License 2.0

Python statement-level performance analysis #18

Open justdoit0823 opened 5 years ago

justdoit0823 commented 5 years ago

Statement level performance

As a high-level programming language, Python is very flexible and easy to write. In some situations there are several ways to do the same thing, and the performance varies with the approach. How can we measure the performance and choose the right way?

In the following, I will work through this in two steps: first run some benchmarks, then analyze the benchmark results.

Benchmark

Benchmark tool

In Python we can easily do statement-level performance analysis with timeit, a builtin module in the standard library.
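As a minimal sketch of what such a measurement looks like (the statement `str(123)` and the run counts here are illustrative, not from the issue):

```python
import timeit

# Time a single statement with the builtin timeit module.
# repeat() executes the statement `number` times per run and returns the
# total seconds for each of `repeat` runs, similar to IPython's %timeit.
runs = timeit.repeat("str(123)", repeat=7, number=100_000)
best_ns = min(runs) / 100_000 * 1e9  # best run, nanoseconds per loop
print(f"{best_ns:.1f} ns per loop (best of {len(runs)} runs)")
```

Taking the minimum of the runs is the usual convention, since it is the least disturbed by background noise.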

Benchmark function

def empty():
    pass
def load_str():
    str
def load_str_local(str=str):
    str
def f_str(v):
    f'{v}'
def convert_str(v):
    str(v)
def convert_str_local(v, str=str):
    str(v)
def py_str(v):
    return str(v)

def convert_py_str_local(v, str=py_str):
    str(v)
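The results below were presumably produced with timeit (for example via `%timeit` in IPython). A small pure-Python driver along these lines could reproduce the same kind of per-call numbers; `bench` and its parameters are my own names for illustration, not part of the original benchmark:

```python
import timeit

def convert_str(v):
    str(v)  # looks up str as a global name

def convert_str_local(v, str=str):
    str(v)  # str is pre-bound as a local default argument

def bench(func, *args, number=100_000, repeat=5):
    """Return the best per-call time of func(*args) in nanoseconds."""
    runs = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(runs) / number * 1e9

global_ns = bench(convert_str, 123)
local_ns = bench(convert_str_local, 123)
print(f"convert_str:       {global_ns:.0f} ns per call")
print(f"convert_str_local: {local_ns:.0f} ns per call")
```

Note that the lambda wrapper adds its own overhead to every call, so absolute numbers will differ from timing the statement directly; the relative comparison is what matters.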

Benchmark result

88 ns ± 1.36 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
113 ns ± 0.373 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
99.3 ns ± 4.2 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
110 ns ± 7.85 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
251 ns ± 7.87 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
213 ns ± 6.29 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
346 ns ± 11.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

I ran these benchmarks with Python 3.7.1 on my local macOS machine. The timings depend on the interpreter version and machine speed, so it is normal to see different numbers. Assuming the results are listed in the same order as the function definitions (with the helper py_str skipped), a few conclusions follow:

- Looking up str as a local name (99.3 ns) is cheaper than looking it up as a global (113 ns).
- An f-string (110 ns) is noticeably faster than calling str(v) (251 ns).
- Pre-binding str as a default argument speeds up the conversion: 213 ns vs 251 ns.
- Routing the conversion through a pure-Python wrapper (convert_py_str_local) is the slowest at 346 ns.

Moreover, we can step into some details.

Inspection tool

With the above benchmark results we have some duration numbers at hand, but we still can't tell what actually happened in each case. Fortunately there is another useful tool, dis, a disassembler module that shows what the Python code will do at the virtual-machine level. With dis we can step further. Let's do it.
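A short sketch of how dis is used on the benchmark functions (exact opcode names vary between CPython versions, e.g. CALL_FUNCTION became CALL in 3.11+):

```python
import dis
import io

def convert_str(v):
    str(v)  # global lookup of str

def convert_str_local(v, str=str):
    str(v)  # local lookup of str

# dis.dis() prints the bytecode of a function; capture it here as text
# so the two disassemblies can be compared side by side.
buf = io.StringIO()
dis.dis(convert_str, file=buf)
dis.dis(convert_str_local, file=buf)
output = buf.getvalue()
print(output)
```

Comparing the two outputs shows the global version loading str with LOAD_GLOBAL while the local version uses the cheaper LOAD_FAST.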

Disassembly result

empty:
  2          0 LOAD_CONST               0 (None)
              2 RETURN_VALUE

load_str:
  2          0 LOAD_GLOBAL              0 (str)
              2 POP_TOP
              4 LOAD_CONST               0 (None)
              6 RETURN_VALUE

load_str_local:
  2          0 LOAD_FAST                0 (str)
              2 POP_TOP
              4 LOAD_CONST               0 (None)
              6 RETURN_VALUE

f_str:
  2          0 LOAD_FAST                0 (v)
              2 FORMAT_VALUE             0
              4 POP_TOP
              6 LOAD_CONST               0 (None)
              8 RETURN_VALUE

convert_str:
  2          0 LOAD_GLOBAL              0 (str)
              2 LOAD_FAST                0 (v)
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE

convert_str_local:
  5          0 LOAD_FAST                1 (str)
              2 LOAD_FAST                0 (v)
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE

convert_py_str_local:
  5          0 LOAD_FAST                1 (str)
              2 LOAD_FAST                0 (v)
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE

Here I won't go into more detail about the disassembled result; instead, compare the corresponding bytecodes of each pair of functions to get some intuition.

Conclusion

From the above experience, it is straightforward to use timeit and dis to measure statement-level performance. To summarize:

- overhead: global name lookups and extra function calls add measurable per-statement overhead; pre-binding a name locally avoids part of it, and an f-string is cheaper than an explicit str() call.
- tool: use timeit to measure how long a statement takes, and dis to inspect the bytecode that explains why.
