julien-duponchelle / python-mysql-replication

Pure Python Implementation of MySQL replication protocol build on top of PyMYSQL
2.32k stars 679 forks source link

BinLogStreamReader seems to have performance problems #251

Open lingniao2000 opened 6 years ago

lingniao2000 commented 6 years ago

The last two months, we have been debugging pg_chameleon. It uses python-mysql-replication, the current problems encountered is python-mysql-replication occupy 100% CPU, network recv-Q queue backlog. We through the line-profiler Observe that the two most important statements take the most time

     for binlogevent in my_stream:      for row in binlogevent.rows:

Please help us, we have no other idea. Details are as follows(with cut down):

2018-02-11 11:50:22 read_replica DEBUG mysql_lib.py (1087): Master coordinates: {'File': 'mysql-bin.0
Timer unit: 1e-06 s                                                                                  

Total time: 628.137 s                                                                                
File: /root/chameleon/lib/python3.5/site-packages/pg_chameleon/lib/mysql_lib.py                      
Function: __read_replica_stream at line 881                                                          

Line #      Hits         Time  Per Hit   % Time  Line Contents                                       
==============================================================                                       
   881                                              def __read_replica_stream(self, batch_data):       
   911         1      62448.0  62448.0      0.0         sql_tokeniser = sql_token()                      
   912         1    1866884.0 1866884.0      0.3        table_type_map = self.get_table_type_map()     
   933         1        420.0    420.0      0.0         self.logger.debug("log_file %s, log_position %s. 
   935    225048  210244314.0    934.2     33.5         for binlogevent in my_stream:                    
   936    225048     870337.0      3.9      0.1             if isinstance(binlogevent, RotateEvent):       
  1008    111908     327791.0      2.9      0.1                 if close_batch:                              
  1009         1       6171.0   6171.0      0.0                     my_stream.close()                          
  1010         1         10.0     10.0      0.0                     return [master_data, close_batch]          
  1011                                                      else:                                          
  1013    226441  301239216.0   1330.3     48.0                 for row in binlogevent.rows:                 
  1014    113302     405720.0      3.6      0.1                     event_after={}                             
  1015    113302     312578.0      2.8      0.0                     event_before={}                            
  1084                                                                                                           
  1085    100000     348828.0      3.5      0.1                         if len(group_insert)>=self.replica_batch_
  1086         1        372.0    372.0      0.0                             self.logger.info("Max rows per batch re
  1087         1        227.0    227.0      0.0                             self.logger.debug("Master coordinates: 
  1088         1   39953572.0 39953572.0      6.4                           self.pg_engine.write_batch(group_inse
  1089         1          8.0      8.0      0.0                             size_insert=0                          

(chameleon)[root@QH-PRD-STGE-MYSQL-MHA08 pg_chameleon]# strace -c -p 52306           
strace: Process 52306 attached                                                       
^Cstrace: Process 52306 detached                                                     
% time     seconds  usecs/call     calls    errors syscall                           
------ ----------- ----------- --------- --------- ----------------                  
 92.48    0.055257           0   1087063           ioctl                             
  7.33    0.004382           0     10156           recvfrom                          
  0.19    0.000112           0       442           stat                              
  0.00    0.000000           0         1           sendto                            
------ ----------- ----------- --------- --------- ----------------                  
100.00    0.059751               1097662           total                             

[root@QH-PRD-STGE-MYSQL-MHA08 chameleon]# netstat -natlp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1200/sshd           
tcp        0      0 0.0.0.0:5432            0.0.0.0:*               LISTEN      47875/postmaster    
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      2086/master         
tcp        0      0 172.20.20.27:5432       192.168.46.11:64869     ESTABLISHED 52356/postgres: pos 
tcp        0      0 172.20.20.27:5432       192.168.46.11:64824     ESTABLISHED 52278/postgres: pos 
tcp        0      0 172.30.0.27:46607       172.30.0.13:3306        ESTABLISHED 52306/python3.5     
tcp        0      0 172.20.20.27:41121      172.16.30.204:3306      ESTABLISHED 24904/mysqld        
tcp        0      0 172.30.0.27:54365       172.30.0.27:5432        ESTABLISHED 52306/python3.5     
tcp        0    240 172.20.20.27:22         192.168.46.11:64763     ESTABLISHED 52208/sshd: root@pt 
tcp        0      0 172.30.0.27:5432        172.30.0.27:54363       ESTABLISHED 52308/postgres: pos 
tcp        0      0 172.20.20.27:5432       192.168.46.11:64823     ESTABLISHED 52276/postgres: pos 
tcp        0      0 172.30.0.27:5432        172.30.0.27:54365       ESTABLISHED 52309/postgres: pos 
tcp   223072      0 172.30.0.27:46605       172.30.0.13:3306        ESTABLISHED 52306/python3.5

pg_chameleon issue https://github.com/the4thdoctor/pg_chameleon/issues/62

julien-duponchelle commented 6 years ago

I know that pypy offer better speed performance

Le lun. 12 févr. 2018 à 07:42, lingniao2000 notifications@github.com a écrit :

The last two months, we have been debugging pg_chameleon. It uses python-mysql-replication, the current problems encountered is python-mysql-replication occupy 100% CPU, network recv-Q queue backlog. We through the line-profiler Observe that the two most important statements take the most time

 for binlogevent in my_stream:
 for row in binlogevent.rows:

Please help us, we have no other way. Details are as follows(with cut down):

2018-02-11 11:50:22 read_replica DEBUG mysql_lib.py (1087): Master coordinates: {'File': 'mysql-bin.0 Timer unit: 1e-06 s

Total time: 628.137 s File: /root/chameleon/lib/python3.5/site-packages/pg_chameleon/lib/mysql_lib.py Function: __read_replica_stream at line 881

Line # Hits Time Per Hit % Time Line Contents

881 def __read_replica_stream(self, batch_data): 911 1 62448.0 62448.0 0.0 sql_tokeniser = sql_token() 912 1 1866884.0 1866884.0 0.3 table_type_map = self.get_table_type_map()
933 1 420.0 420.0 0.0 self.logger.debug("log_file %s, log_position %s. 935 225048 210244314.0 934.2 33.5 for binlogevent in my_stream: 936 225048 870337.0 3.9 0.1 if isinstance(binlogevent, RotateEvent): 1008 111908 327791.0 2.9 0.1 if close_batch: 1009 1 6171.0 6171.0 0.0 my_stream.close() 1010 1 10.0 10.0 0.0 return [master_data, close_batch] 1011 else: 1013 226441 301239216.0 1330.3 48.0 for row in binlogevent.rows: 1014 113302 405720.0 3.6 0.1 event_after={} 1015 113302 312578.0 2.8 0.0 event_before={} 1084
1085 100000 348828.0 3.5 0.1 if len(group_insert)>=self.replicabatch 1086 1 372.0 372.0 0.0 self.logger.info("Max rows per batch re 1087 1 227.0 227.0 0.0 self.logger.debug("Master coordinates: 1088 1 39953572.0 39953572.0 6.4 self.pg_engine.write_batch(group_inse 1089 1 8.0 8.0 0.0 size_insert=0

(chameleon)[root@QH-PRD-STGE-MYSQL-MHA08 pg_chameleon]# strace -c -p 52306 strace: Process 52306 attached ^Cstrace: Process 52306 detached % time seconds usecs/call calls errors syscall

92.48 0.055257 0 1087063 ioctl 7.33 0.004382 0 10156 recvfrom 0.19 0.000112 0 442 stat 0.00 0.000000 0 1 sendto

100.00 0.059751 1097662 total

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/noplay/python-mysql-replication/issues/251, or mute the thread https://github.com/notifications/unsubscribe-auth/AAVFXRJJmH0pFLJoksIiYYKrodtQyQMHks5tT91BgaJpZM4SBwGB .

the4thdoctor commented 6 years ago

hi, just to add more informations. as I'm using psycopg2 the pypy could be a problem. looks like there is a compatibility layer but I've never tried it. https://pypi.python.org/pypi/psycopg2cffi

Thanks

lingniao2000 commented 6 years ago

deploy pypy3 , test and found there are indeed have compatibility issues

(venv_chameleon) [root@MYSQL-SQLCHECK ~]# chameleon.py init_replica --source prod --config default --debug
2018-02-13 10:50:22 MainProcess DEBUG mysql_lib.py (1147): starting init replica for source prod
2018-02-13 10:50:22 MainProcess DEBUG pg_lib.py (615): Changing the autocommit flag to True
2018-02-13 10:50:22 MainProcess DEBUG pg_lib.py (2677): Collecting schema mappings for source prod
2018-02-13 10:50:22 MainProcess INFO pg_lib.py (1534): deleting all the table references from the replica catalog for source prod 
2018-02-13 10:50:23 MainProcess DEBUG mysql_lib.py (233): Creating the schema _db_nono_tmp.
Traceback (most recent call last):
  File "/root/venv_chameleon/bin/chameleon.py", line 54, in <module>
    getattr(replica, args.command)()
  File "/root/venv_chameleon/site-packages/pg_chameleon/lib/global_lib.py", line 316, in init_replica
    self.__init_mysql_replica()
  File "/root/venv_chameleon/site-packages/pg_chameleon/lib/global_lib.py", line 326, in __init_mysql_replica
    self.mysql_source.init_replica()
  File "/root/venv_chameleon/site-packages/pg_chameleon/lib/mysql_lib.py", line 1156, in init_replica
    self.create_destination_schemas()
  File "/root/venv_chameleon/site-packages/pg_chameleon/lib/mysql_lib.py", line 234, in create_destination_schemas
    self.pg_engine.create_database_schema(loading_schema)
  File "/root/venv_chameleon/site-packages/pg_chameleon/lib/pg_lib.py", line 3294, in create_database_schema
    self.pgsql_cur.execute(sql_create)
  File "/root/venv_chameleon/site-packages/psycopg2cffi/_impl/cursor.py", line 30, in check_closed_
    return func(self, *args, **kwargs)
  File "/root/venv_chameleon/site-packages/psycopg2cffi/_impl/cursor.py", line 263, in execute
    self._pq_execute(self._query, conn._async)
  File "/root/venv_chameleon/site-packages/psycopg2cffi/_impl/cursor.py", line 690, in _pq_execute
    pgconn, util.ascii_to_bytes(query))
  File "/root/venv_chameleon/site-packages/psycopg2cffi/_impl/adapters.py", line 306, in ascii_to_bytes
    assert isinstance(s, six.binary_type)
AssertionError