zendesk / maxwell

Maxwell's daemon, a mysql-to-json kafka producer
https://maxwells-daemon.io/

BinlogConnectorReplicator errors after MySQL 8 upgrade #2017

Closed joshuabaird closed 1 year ago

joshuabaird commented 1 year ago

We recently used Amazon RDS's blue/green functionality to upgrade a cluster from 5.7 to 8.0. After the upgrade, we began to see the following error in Maxwell logs:

15:17:04,097 DEBUG BinlogConnectorReplicator - error code: 1236 from server
com.github.shyiko.mysql.binlog.network.ServerException: A slave with the same server_uuid/server_id as this slave has connected to the master; the first event 'mysql-bin-changelog.002005' at 1882244, the last event read from '/rdsdbdata/log/binlog/mysql-bin-changelog.002005' at 1932003, the last byte read from '/rdsdbdata/log/binlog/mysql-bin-changelog.002005' at 1932003.

I'm not sure whether this is a result of how RDS handles blue/green deployments, but to try to resolve it, we wiped the maxwell database so that Maxwell would recreate it. We're still getting the same error.

We have a few different Maxwell processes, and so our maxwell.positions table looks like this:

+-----------+----------------------------+-----------------+----------+-------------------------------------------------+--------------+---------------------+
| server_id | binlog_file                | binlog_position | gtid_set | client_id                                       | heartbeat_at | last_heartbeat_read |
+-----------+----------------------------+-----------------+----------+-------------------------------------------------+--------------+---------------------+
| 547860478 | mysql-bin-changelog.002005 |         2100437 | NULL     | maxwell-reference_providers_bootstrap           |         NULL |       1687275124402 |
| 547860478 | mysql-bin-changelog.002005 |         2085086 | NULL     | maxwell-reference_provider_taxonomies_bootstrap |         NULL |       1687275022166 |
| 547860478 | mysql-bin-changelog.002005 |         2097041 | NULL     | maxwell-templates                               |         NULL |       1687275112782 |
| 547860478 | mysql-bin-changelog.002005 |         2111856 | NULL     | maxwell-terminology                             |         NULL |       1687275216159 |
+-----------+----------------------------+-----------------+----------+-------------------------------------------------+--------------+---------------------+

Any ideas how to resolve this error?

osheroff commented 1 year ago

I'm not familiar with the mechanics of the blue/green upgrade. How is each client configured? You may wish to configure each client with a unique replica_server_id.


joshuabaird commented 1 year ago

Maxwell clients are configured like so:

    bin/maxwell \
      --host=${DB_HOST} \
      --port=${DB_PORT} \
      --user=${MAXWELL_USER} \
      --password=${MAXWELL_PASSWORD} \
      --client_id="maxwell-${MAXWELL_FILTER}" \
      --ssl=VERIFY_CA \
      --producer=sqs \
      --sqs_queue_uri=${MAXWELL_SQS_QUEUE} \
      --log_level=${MAXWELL_LOG_LEVEL} \
      --filter='exclude: *.*, include: terminology.*'

All Maxwell clients connect to the same database (DB_HOST). We have never set --replica_server_id before, so I didn't think it was relevant, but you are suggesting otherwise?

osheroff commented 1 year ago

I think it may be relevant in 8.0 and not in 5.7.

osheroff commented 1 year ago

If this turns out to fix the issue LMK and I'll update the docs

joshuabaird commented 1 year ago

This did fix the issue above. Thank you!
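
For anyone landing here later, a minimal sketch of one way to give each Maxwell process its own replica_server_id, assuming the launch script shown earlier in this thread. The helper name replica_server_id_for and the specific numbers are hypothetical and not from this thread; the IDs only need to differ from each other and from the server_id of the MySQL server and any other replicas.

```shell
#!/bin/sh
# Hypothetical helper: map each filter name used in this thread to a
# distinct replica server ID. The numbers are arbitrary placeholders.
replica_server_id_for() {
  case "$1" in
    reference_providers_bootstrap)           echo 6380 ;;
    reference_provider_taxonomies_bootstrap) echo 6381 ;;
    templates)                               echo 6382 ;;
    terminology)                             echo 6383 ;;
    *)
      # Refuse to start an unknown client with a guessed ID.
      echo "no replica_server_id assigned for: $1" >&2
      return 1
      ;;
  esac
}

# Default to "terminology" only so this sketch runs standalone.
MAXWELL_FILTER="${MAXWELL_FILTER:-terminology}"
REPLICA_SERVER_ID="$(replica_server_id_for "${MAXWELL_FILTER}")"

# Print the two identity flags to add to the bin/maxwell invocation.
echo "--client_id=maxwell-${MAXWELL_FILTER} --replica_server_id=${REPLICA_SERVER_ID}"
```

The printed flags would go alongside the existing options in the bin/maxwell command above. An explicit table is used here instead of deriving the ID from the name, so that two clients cannot collide by accident.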

osheroff commented 1 year ago

Updated the docs to reflect this need.