julien-duponchelle / python-mysql-replication

Pure Python Implementation of MySQL replication protocol build on top of PyMYSQL
2.31k stars 678 forks source link

Assertion Error #610

Open MintooJinnu opened 7 months ago

MintooJinnu commented 7 months ago

Version

Please specify the versions you are using. Exact version numbers are preferred.

Symptoms

I have a prod db, where somw insert queries are running on top of it, I am trying to replicate in snowflake. But while reading the insert event, while getting binlog_event.event.rows throwing

Assertion Error Expected = 111. Actual = 46. Position=130. Data=176

And I am getting the assertion error for column type varchar,

Steps to Reproduce

I cannot help you in reproducing as I am testing in prod environment

I tried with many version degrading and upgrading with all the modules such as pymysql, mysql-replication, s3, etc..

sean-k1 commented 7 months ago

What is your Pymyrepli version? @MintooJinnu

MintooJinnu commented 7 months ago

I tried with multiple mysql-replication @sean-k1 , tried with 0.24, 0.25, 0.26 and 1.0.0, 0.1.0, 0.45.0, 1.0.6 and 1.0.5

sean-k1 commented 7 months ago

@MintooJinnu If you can't tell me which rows are causing the problem, there's nothing I can do to help you.

MintooJinnu commented 7 months ago

Sure let me get you the rows @sean-k1

MintooJinnu commented 7 months ago

I have a huge insert query with multiple sub queries, Query 1 is just an insert and Query 2 is a select query

Insert into table_1 columns(.....) SELECT......

Query2 fetching records:

A B C D E F G H I J K L M N O P Q R S T U V
2020-02-12 12:00:00 10010 1 1 1 uidjjlsnckepcbieb eiuhfief cb23yyd vfugcjecfrifwh-cninwcbjvrdeciw 1234567 2020-02-12 12:00:00 NULL NULL NULL 123456789951753852456321654987 4njkfneodnkemfkekdmownfkfrnfirwlo A1.2.7849383874.273888393 192.168.0.0.1 Nokia Windows 10/Server 2020 Google Chrome false NULL 0

Also attached the excel sheet for your reference sample_data.xlsx

The Schema of table_1: A DATETIME(3), PRIMARY KEY, int(11) UN PK B INT(11) UN PK C INT(11) PK D INT(11) UN PK E INT(11) UN F VARCHAR(250) G VARCHAR(100) PK H BIGINT(20) I DATETIME(3) J VARCHAR(255) K VARCHAR(50) L VARCHAR(9) M VARCHAR(39) N VARCHAR(32) O VARCHAR(28) P VARCHAR(20) Q VARCHAR(50) R VARCHAR(50) S VARCHAR(50) T VARCHAR(11) U VARCHAR(6) V TINYINT(1)

This is all happening in Maria DB @sean-k1

MintooJinnu commented 7 months ago

Adding an other piece of information, which might be helpful to you is the query2(subquery) is having PARTITION BY also, please let me know if that might create some problem in reading string in pascal method @sean-k1

MintooJinnu commented 6 months ago

I observed one more thing, ie. It is reading the other column data suddenly after reading data of column F @sean-k1. In column F receiving all the data of other columns. This is happening only to this table.

sean-k1 commented 6 months ago
mysql> CREATE TABLE YourTableName (
    ->     A DATETIME(3) PRIMARY KEY,
    ->     B INT(11) UNSIGNED NOT NULL,
    ->     C INT(11) NOT NULL,
    ->     D INT(11) UNSIGNED NOT NULL,
    ->     E INT(11) UNSIGNED,
    ->     F VARCHAR(250),
    ->     G VARCHAR(100) NOT NULL,
    ->     H BIGINT(20),
    ->     I DATETIME(3),
    ->     J VARCHAR(255),
    ->     K VARCHAR(50),
    ->     L VARCHAR(9),
    ->     M VARCHAR(39),
    ->     N VARCHAR(32),
    ->     O VARCHAR(28),
    ->     P VARCHAR(20),
    ->     Q VARCHAR(50),
    ->     R VARCHAR(50),
    ->     S VARCHAR(50),
    ->     T VARCHAR(11),
    ->     U VARCHAR(6),
    ->     V TINYINT(1)
    -> );
Query OK, 0 rows affected, 6 warnings (0.01 sec)

mysql> INSERT INTO YourTableName (A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V)
    -> VALUES (
    ->     '2020-02-12 12:00:00', -- A
    ->     10010, -- B
    ->     1, -- C
    ->     1, -- D
    ->     1, -- E
    ->     'uidjjlsnckepcbieb eiuhfief cb23yyd', -- F
    ->     'vfugcjecfrifwh-cninwcbjvrdeciw', -- G
    ->     1234567, -- H
    ->     '2020-02-12 12:00:00', -- I
    ->     NULL, -- J
    ->     NULL, -- K
    ->     NULL, -- L
    ->     '123456789951753852456321654987', -- M
    ->     '4njkfneodnkemfkekdmownfkfrnfirwo', -- N
    ->     'A1.2.7849383874.273888393', -- O
    ->     '192.168.0.0.1', -- P
    ->     'Nokia', -- Q
    ->     'Windows 10/Server 2020', -- R
    ->     'Google Chrome', -- S
    ->     false, -- T
    ->     NULL, -- U
    ->     0 -- V
    -> );
Query OK, 1 row affected (0.01 sec)

your Data '4njkfneodnkemfkekdmownfkfrnfirwo', -- N (33) is larger than 32 so i erase last word Is it right your table and insert query? @MintooJinnu

sean-k1 commented 6 months ago

i did not find error in mysql 8.0 enviornment @MintooJinnu


=== QueryEvent ===
Date: 2024-02-19T10:08:01
Log position: 2322
Event size: 570
Read bytes: 570
Schema: b'test'
Execution time: 0
Query: CREATE TABLE YourTableName (
    A DATETIME(3) PRIMARY KEY,
    B INT(11) UNSIGNED NOT NULL,
    C INT(11) NOT NULL,
    D INT(11) UNSIGNED NOT NULL,
    E INT(11) UNSIGNED,
    F VARCHAR(250),
    G VARCHAR(100) NOT NULL,
    H BIGINT(20),
    I DATETIME(3),
    J VARCHAR(255),
    K VARCHAR(50),
    L VARCHAR(9),
    M VARCHAR(39),
    N VARCHAR(32),
    O VARCHAR(28),
    P VARCHAR(20),
    Q VARCHAR(50),
    R VARCHAR(50),
    S VARCHAR(50),
    T VARCHAR(11),
    U VARCHAR(6),
    V TINYINT(1)
)

=== QueryEvent ===
Date: 2024-02-19T10:08:35
Log position: 2484
Event size: 60
Read bytes: 60
Schema: b'test'
Execution time: 0
Query: BEGIN

=== TableMapEvent ===
Date: 2024-02-19T10:08:35
Log position: 2655
Event size: 148
Read bytes: 148
Table id: 104
Schema: test
Table: yourtablename
Columns: 22
=== OptionalMetaData ===
unsigned_column_list: [False, True, False, True, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False]
default_charset_collation: 255
charset_collation: {}
column_charset: []
column_name_list: ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V']
set_str_value_list : []
set_enum_str_value_list : []
geometry_type_list : []
simple_primary_key_list: [0]
primary_keys_with_prefix: {}
visibility_list: [True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True]
charset_collation_list: [255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255]
enum_and_set_collation_list: []

=== WriteRowsEvent ===
Date: 2024-02-19T10:08:35
Log position: 2951
Event size: 273
Read bytes: 14
Table: test.yourtablename
Affected columns: 22
Changed rows: 1
Column Name Information Flag: True
Values:
--
* A : 2020-02-12 12:00:00
* B : 10010
* C : 1
* D : 1
* E : 1
* F : uidjjlsnckepcbieb eiuhfief cb23yyd
* G : vfugcjecfrifwh-cninwcbjvrdeciw
* H : 1234567
* I : 2020-02-12 12:00:00
* J : None (null)
* K : None (null)
* L : None (null)
* M : 123456789951753852456321654987
* N : 4njkfneodnkemfkekdmownfkfrnfirwo
* O : A1.2.7849383874.273888393
* P : 192.168.0.0.1
* Q : Nokia
* R : Windows 10/Server 2020
* S : Google Chrome
* T : 0
* U : None (null)
* V : 0

=== XidEvent ===
Date: 2024-02-19T10:08:35
Log position: 2982
Event size: 8
Read bytes: 8
Transaction ID: 40
MintooJinnu commented 6 months ago

There is some problem with the data @sean-k1 because I was running very huge query, and that is not reading the correct data, unpacking is the problem actually. Let me send you the exact query by morphing it.

MintooJinnu commented 6 months ago

value[name] = self.__read_string(1, column) it is throwing a value which is not in perfect way. Basically we are reading a session id getting the value as 'ud7ehuwkhdhiejduudjjjej.hsijd.jsjsj.\x1c5468-a♤Á\x19\x00.....^P93F0

Please help me on this @sean-k1

sean-k1 commented 6 months ago

@MintooJinnu Please Write the problematic SQL statement query

MintooJinnu commented 6 months ago

I'm able to fix the error but I need a small help @sean-k1 I have a table with column DDL declaration as

'DOB datetime(3) NOT NULL DEFUALT '0000-00-00 00:00:00.000'

Now while extracting the rows of writerows event reading column data and it is going to method _read_datetime()

Getting a large number in the value = self.packet.uint64()

How to get the datetime stamp and what exactly should be there in struct.unpack()

I tried with struct.unpack('I3s', self.read(7)) Which is working but after converting to datetime the year is getting 2038 and month and day is also not accurate. I tried with struct.unpack('<Q', self.read(8)

It is getting very larhe

Can you please help me on this @sean-k1

dongwook-chan commented 4 months ago

@MintooJinnu Could you please provide

  1. table DDL (use anonymous column names)
  2. query that's causing error
  3. error stack strace
  4. packet dump (if possible)

I can not reproduce the issue without these information. Any of above does NOT reveal any of your data in prod db. So please feel safe to share the info.