julien-duponchelle / python-mysql-replication

Pure Python Implementation of MySQL replication protocol build on top of PyMYSQL
2.31k stars 678 forks source link

Optimizing version check to happen once per run #597

Closed eli-kir closed 9 months ago

eli-kir commented 9 months ago

Description

When running the python profiler on this library, I noticed a pretty intense amount of calls and total time spent in Version.py. Upon investigation, I traced the call to a check that happened once per loop in fetch_one. Taking a clue from https://github.com/julien-duponchelle/python-mysql-replication/pull/363/ (which is more than a year old and sitting), I refactored what had initially been a @lru_cache change on my end to just be a static call once and then replaced all if trees that used it with the variable. I also changed as little as possible in an attempt to get this through. Below are the cProfiler stats before and after the change. Tested when parsing through a remote binary log, limited both to 3000 events from the stream reader just to save time, there is a nearly 50% speedup and the calls to Version.py is no longer dead at the top of the cProfiler for time spent.

Type of Change

Checklist

Tests

Additional Information

As stated above, there IS another PR that makes this change, but likely due to the fact that there were both multiple changes being thrown in and an unresponsive PR, I'm hoping this can make it through as it provides an immense speedup to performance of this library.

Before

$ python3 -m cProfile -s tottime parse_bin_log.py

         13139022 function calls (13114132 primitive calls) in 9.540 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   219826    1.270    0.000    2.836    0.000 version.py:186(__init__)
   109935    0.613    0.000    1.765    0.000 connections.py:730(_read_packet)
     3001    0.584    0.000    7.856    0.003 binlogstream.py:579(fetchone)
   109912    0.424    0.000    2.484    0.000 packet.py:60(__init__)
   220004    0.383    0.000    0.383    0.000 {method 'search' of 're.Pattern' objects}
   219826    0.363    0.000    0.396    0.000 version.py:503(_cmpkey)
    36819    0.352    0.000    0.352    0.000 {method 'recv_into' of '_socket.socket' objects}
  2198868    0.300    0.000    0.300    0.000 {method 'group' of 're.Match' objects}
   429139    0.280    0.000    0.316    0.000 protocol.py:62(read)
   319199    0.259    0.000    0.519    0.000 packet.py:133(read)
   219896    0.247    0.000    0.247    0.000 {method 'settimeout' of '_socket.socket' objects}
220847/220599    0.235    0.000    0.672    0.000 {method 'read' of '_io.BufferedReader' objects}
   219870    0.225    0.000    1.054    0.000 connections.py:775(_read_bytes)
    21986    0.206    0.000    1.611    0.000 row_event.py:740(__init__)
   769391    0.182    0.000    0.182    0.000 version.py:205(<genexpr>)
    55463    0.158    0.000    0.388    0.000 column.py:74(data)
   634077    0.150    0.000    0.223    0.000 column.py:76(<genexpr>)
   435833    0.135    0.000    0.135    0.000 {built-in method _struct.unpack}
1225328/1221520    0.127    0.000    0.128    0.000 {built-in method builtins.len}
    53928    0.100    0.000    0.408    0.000 row_event.py:163(__read_values_name)
      715    0.091    0.000    0.091    0.000 {built-in method marshal.loads}
    43975    0.090    0.000    0.118    0.000 event.py:16(__init__)
   659478    0.086    0.000    0.086    0.000 version.py:452(_parse_letter_version)
    21986    0.079    0.000    0.233    0.000 row_event.py:19(__init__)
   616407    0.079    0.000    0.079    0.000 {method 'startswith' of 'str' objects}
     2996    0.078    0.000    0.532    0.000 row_event.py:103(_read_column_data)
   109913    0.068    0.000    0.085    0.000 version.py:76(__lt__)
    85643    0.066    0.000    0.066    0.000 {method 'decode' of 'bytes' objects}
    21976    0.063    0.000    0.301    0.000 row_event.py:652(__init__)
    .
    .
    .

After

$ python3 -m cProfile -s tottime parse_bin_log_with_version_change.py

         7750856 function calls (7725829 primitive calls) in 5.872 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   109935    0.567    0.000    1.543    0.000 connections.py:730(_read_packet)
     3001    0.368    0.000    4.254    0.001 binlogstream.py:584(fetchone)
   109912    0.356    0.000    2.261    0.000 packet.py:60(__init__)
    36536    0.289    0.000    0.289    0.000 {method 'recv_into' of '_socket.socket' objects}
   429139    0.255    0.000    0.289    0.000 protocol.py:62(read)
   319199    0.240    0.000    0.485    0.000 packet.py:133(read)
   219892    0.220    0.000    0.220    0.000 {method 'settimeout' of '_socket.socket' objects}
   219870    0.201    0.000    0.897    0.000 connections.py:775(_read_bytes)
    21986    0.191    0.000    1.521    0.000 row_event.py:739(__init__)
   634077    0.142    0.000    0.212    0.000 column.py:76(<genexpr>)
    55463    0.128    0.000    0.347    0.000 column.py:74(data)
   435778    0.114    0.000    0.114    0.000 {built-in method _struct.unpack}
1225228/1221420    0.113    0.000    0.115    0.000 {built-in method builtins.len}
220846/220598    0.102    0.000    0.470    0.000 {method 'read' of '_io.BufferedReader' objects}
    53928    0.094    0.000    0.395    0.000 row_event.py:162(__read_values_name)
      714    0.081    0.000    0.081    0.000 {built-in method marshal.loads}
     2996    0.076    0.000    0.516    0.000 row_event.py:102(_read_column_data)
   616483    0.076    0.000    0.076    0.000 {method 'startswith' of 'str' objects}
    43975    0.074    0.000    0.097    0.000 event.py:16(__init__)
    21986    0.069    0.000    0.200    0.000 row_event.py:19(__init__)
    85641    0.065    0.000    0.065    0.000 {method 'decode' of 'bytes' objects}
    21976    0.055    0.000    0.259    0.000 row_event.py:651(__init__)
   .
   .
   .
sean-k1 commented 9 months ago

@eli-kir hello eli-kir Thanks for the good contribution I didn't realize it took so long to check the version.

eli-kir commented 9 months ago

Thanks! Happy to have helped, this library does a lot of heavy lifting for me