Open agillz opened 4 years ago
could they be myISAM tables in one database?
The other possibility is that you are, for some or other reason, receiving a COMMIT
query event instead of or in addition to the XID event. Can you do:
show binlog events in 'binlog-file' from xxx limit 100;
On the server without the XID, where xxx
is the binlog position of the event.
Tables are InnoDB version 10 in all databases, it's the same server for events with xid and without xid. Here is what the event looks like in binlog and also attached output from maxwell.
the only difference I notice between the two commits is that the XID overflows a 32-bit integer in the case where it's missing. I can't repro anything wrong there in the most synthetic of tests. before I go chasing further down this rabbit hole, can you verify that the raw JSON in your kafka/whatever stream is missing the xid? your error messages appear to be coming from a stream processor and I want to rule that out of the equation.
Hi Ben, confirmed that raw message on the topic also does not have xid. This output is using kafka-console-consumer
I'm utterly baffled. All of this must be happening here:
your row has commit: true
, which means that line 462 is being fired. But xid
must be null somehow, and the only way that's possible is if the call to setXid
on line 468 isn't happening. event.xidData().getXid()
returns a long
, not a Long
, which means that the underlying library can't be at fault; we'd see xid of 0
not null if that was the case.
So this means that eventType
is not EventType.XID
, but that means that we should be seeing a "COMMIT" event that's actually a EventType.QUERY`. Which we don't.
So. Back to the drawing board.
The only theory I've got, and it's admittedly really weak, is that somehow Aurora is pumping SQL-query-style commit into the replication stream that you don't see with SHOW BINLOG EVENTS
.
Other weak theories:
Do you have other cases of the bug? Any commonalities?
Thanks for pointing out the code where xid is originating from. I will add an assertion there to log the raw event that has the issue. Hoping it will provide some insight. Would it helpful? Let me know if you want some more debugging to be added in getTransactionRows
or elsewhere.
Post this test ,if it seems be related to Aurora,I will get in touch with their technical support.
This is our first use case that relies on commit info thus we can not tell for sure how widespread the issue could be. We are reading from three different servers (5.6.mysql_aurora.1.22.2) using maxwell v1.24.1
I will add an assertion there to log the raw event that has the issue. Hoping it will provide some insight. Would it helpful?
Yeah that'd be quite helpful. If we can catch this in action we'll be able to see why it happened... there's definitely not been any changes since 1.24.1 that would have fixed this, so let's stay there.
Hello Team Maxwell,
We are seeing a strange issue and currently unable to determine the root cause. We have maxwell running against a mysql aurora instance which has multiple databases. In the config, we have set output_commit_info=true
We are seeing some events are missing xid while others have it. Same table with same structure exists in multiple databases, sorry I had to scrub the database name as it is contains our client info.