mariadb-operator / mariadb-operator

🦭 Run and operate MariaDB in a cloud native way
MIT License

[Bug] maxscale flapping #686

Closed pasztorl closed 3 days ago

pasztorl commented 5 months ago

I'm using the latest operator and have deployed MaxScale with MariaDB asynchronous replication.

Messages in MaxScale:

2024-06-14 02:34:50   notice : Server changed state: example-hu-db-0[example-hu-db-0.example-hu-db-internal.example-hu.svc.k8s.example.hu:3306]: master_up. [Down] -> [Master, Running]
2024-06-14 04:19:20   error  : Monitor was unable to connect to server example-hu-db-1[example-hu-db-1.example-hu-db-internal.example-hu.svc.k8s.example.hu:3306] : 'Host '10.15.0.159' is not allowed to connect to this MariaDB server'
2024-06-14 04:19:20   notice : Server changed state: example-hu-db-1[example-hu-db-1.example-hu-db-internal.example-hu.svc.k8s.example.hu:3306]: slave_down. [Slave, Running] -> [Down]
2024-06-14 04:19:22   notice : Server changed state: example-hu-db-1[example-hu-db-1.example-hu-db-internal.example-hu.svc.k8s.example.hu:3306]: slave_up. [Down] -> [Slave, Running]
2024-06-14 09:18:04   error  : Monitor was unable to connect to server example-hu-db-1[example-hu-db-1.example-hu-db-internal.example-hu.svc.k8s.example.hu:3306] : 'Host '10.15.0.159' is not allowed to connect to this MariaDB server'
2024-06-14 09:18:04   notice : Server changed state: example-hu-db-1[example-hu-db-1.example-hu-db-internal.example-hu.svc.k8s.example.hu:3306]: slave_down. [Slave, Running] -> [Down]
2024-06-14 09:18:04   notice : Server changed state: example-hu-db-1[example-hu-db-1.example-hu-db-internal.example-hu.svc.k8s.example.hu:3306]: slave_up. [Down] -> [Slave, Running]
2024-06-14 09:51:36   error  : Monitor was unable to connect to server example-hu-db-1[example-hu-db-1.example-hu-db-internal.example-hu.svc.k8s.example.hu:3306] : 'Host '10.15.0.159' is not allowed to connect to this MariaDB server'
2024-06-14 09:51:36   notice : Server changed state: example-hu-db-1[example-hu-db-1.example-hu-db-internal.example-hu.svc.k8s.example.hu:3306]: slave_down. [Slave, Running] -> [Down]
2024-06-14 09:51:38   notice : Server changed state: example-hu-db-1[example-hu-db-1.example-hu-db-internal.example-hu.svc.k8s.example.hu:3306]: slave_up. [Down] -> [Slave, Running]
2024-06-14 10:27:03   error  : Monitor was unable to connect to server example-hu-db-1[example-hu-db-1.example-hu-db-internal.example-hu.svc.k8s.example.hu:3306] : 'Host '10.15.0.159' is not allowed to connect to this MariaDB server'
2024-06-14 10:27:03   notice : Server changed state: example-hu-db-1[example-hu-db-1.example-hu-db-internal.example-hu.svc.k8s.example.hu:3306]: slave_down. [Slave, Running] -> [Down]
2024-06-14 10:27:03   notice : Server changed state: example-hu-db-1[example-hu-db-1.example-hu-db-internal.example-hu.svc.k8s.example.hu:3306]: slave_up. [Down] -> [Slave, Running]
2024-06-14 10:34:03   error  : Monitor was unable to connect to server example-hu-db-1[example-hu-db-1.example-hu-db-internal.example-hu.svc.k8s.example.hu:3306] : 'Host '10.15.0.159' is not allowed to connect to this MariaDB server'
2024-06-14 10:34:03   notice : Server changed state: example-hu-db-1[example-hu-db-1.example-hu-db-internal.example-hu.svc.k8s.example.hu:3306]: slave_down. [Slave, Running] -> [Down]
2024-06-14 10:34:03   notice : Server changed state: example-hu-db-1[example-hu-db-1.example-hu-db-internal.example-hu.svc.k8s.example.hu:3306]: slave_up. [Down] -> [Slave, Running]

Messages in MariaDB at the same time:

2024-06-14  0:27:13 92226 [Warning] Aborted connection 92226 to db: 'unconnected' user: 'unauthenticated' host: '10.15.0.159' (This connection closed normally without authentication)
2024-06-14  4:19:20 118426 [Warning] Aborted connection 118426 to db: 'unconnected' user: 'unauthenticated' host: '10.15.0.159' (This connection closed normally without authentication)
2024-06-14  9:18:04 153101 [Warning] Aborted connection 153101 to db: 'unconnected' user: 'unauthenticated' host: '10.15.0.159' (This connection closed normally without authentication)
2024-06-14  9:51:36 156915 [Warning] Aborted connection 156915 to db: 'unconnected' user: 'unauthenticated' host: '10.15.0.159' (This connection closed normally without authentication)
2024-06-14 10:27:03 161104 [Warning] Aborted connection 161104 to db: 'unconnected' user: 'unauthenticated' host: '10.15.0.159' (This connection closed normally without authentication)
2024-06-14 10:34:03 161980 [Warning] Aborted connection 161980 to db: 'unconnected' user: 'unauthenticated' host: '10.15.0.159' (This connection closed normally without authentication)

The networking is OK: I've tested it with multiple tools and there is no packet loss. MaxScale reports that the slave is Down, while the slave reports that the connection was closed without authentication.
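One way to confirm that each MaxScale monitor failure lines up with an aborted connection on the MariaDB side is to compare the timestamps of the two logs. The following is a minimal sketch, assuming the log formats shown above; the function names and the one-second tolerance are illustrative choices, not part of MaxScale or the operator.

```python
import re
from datetime import datetime, timedelta

# Matches MaxScale monitor errors such as:
# "2024-06-14 04:19:20   error  : Monitor was unable to connect to server ..."
MAXSCALE_ERR = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+error\s*:\s*Monitor was unable to connect"
)

# Matches MariaDB warnings such as:
# "2024-06-14  4:19:20 118426 [Warning] Aborted connection 118426 to db: ..."
# Note that MariaDB does not zero-pad the hour.
MARIADB_ABORT = re.compile(
    r"^(\d{4}-\d{2}-\d{2})\s+(\d{1,2}:\d{2}:\d{2})\s+\d+\s+\[Warning\] Aborted connection"
)

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def maxscale_failures(lines):
    """Return the timestamps of MaxScale 'unable to connect' errors."""
    stamps = []
    for line in lines:
        m = MAXSCALE_ERR.match(line)
        if m:
            stamps.append(datetime.strptime(m.group(1), TS_FORMAT))
    return stamps


def mariadb_aborts(lines):
    """Return the timestamps of MariaDB 'Aborted connection' warnings."""
    stamps = []
    for line in lines:
        m = MARIADB_ABORT.match(line)
        if m:
            # Pad "4:19:20" to "04:19:20" before parsing.
            stamp = f"{m.group(1)} {m.group(2).zfill(8)}"
            stamps.append(datetime.strptime(stamp, TS_FORMAT))
    return stamps


def correlate(failures, aborts, tolerance=timedelta(seconds=1)):
    """Pair each MaxScale failure with whether an abort occurred within the tolerance."""
    return [
        (failure, any(abs(failure - abort) <= tolerance for abort in aborts))
        for failure in failures
    ]
```

Feeding the two log excerpts to `maxscale_failures` and `mariadb_aborts` and passing the results to `correlate` shows which monitor failures have a matching aborted connection. This assumes both logs use the same time zone.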

How can I debug this problem?

Thanks!

mmontes11 commented 5 months ago

Hey there! Thanks for reporting.

I would check whether the monitoring@% user is available on the MariaDB server side, and also try to connect manually with its password. Something is wrong with it; maybe a drift in the credentials? MaxScale attempts to connect using the password plugin and the credentials available in the Secret.

pasztorl commented 5 months ago

It is not a credential problem, because the connection succeeds most of the time, but sometimes, seemingly at random, MariaDB drops the connection. If I set backend_connect_attempts to 3 in the MaxScale monitor, these messages disappear. Interestingly, it also happens when MaxScale and the monitored MariaDB are running on the same host, so it is not a network problem. When I stress-test logins with the mariadb client binary, I sometimes get connection problems, so maybe this is a MariaDB issue(?). Is anyone else running into this?
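The login stress test could also be approximated at the protocol level, without credentials, by opening many TCP connections and looking at the first packet the server sends. The following is a rough sketch, assuming the standard MariaDB/MySQL packet framing (3-byte little-endian length, 1-byte sequence number, then the payload): a payload starting with 0x0a is the normal initial handshake, while 0xff marks an error packet whose code 1130 corresponds to the "Host ... is not allowed to connect" message seen in the MaxScale log. The names `classify_first_packet` and `probe` are hypothetical helpers, not part of any MariaDB tooling.

```python
import socket
import struct


def classify_first_packet(data: bytes):
    """Classify the first server packet as (kind, error_code, message)."""
    if len(data) < 5:
        return ("incomplete", None, "")
    payload_len = int.from_bytes(data[0:3], "little")
    payload = data[4:4 + payload_len]
    if not payload:
        return ("incomplete", None, "")
    if payload[0] == 0x0A:
        # Protocol version 10: the server is offering a normal handshake.
        return ("handshake", None, "")
    if payload[0] == 0xFF and len(payload) >= 3:
        # Error packet: 2-byte little-endian error code, then the message.
        code = struct.unpack_from("<H", payload, 1)[0]
        message = payload[3:]
        if message.startswith(b"#"):
            # Skip the optional '#' marker and 5-character SQL state.
            message = message[6:]
        return ("error", code, message.decode("utf-8", "replace"))
    return ("unknown", None, "")


def probe(host, port=3306, attempts=100, timeout=3.0):
    """Connect repeatedly and count how the server greets each connection."""
    counts = {}
    for _ in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout) as sock:
                data = sock.recv(4096)
            kind, code, _message = classify_first_packet(data)
        except OSError:
            kind, code = "socket_error", None
        counts[(kind, code)] = counts.get((kind, code), 0) + 1
    return counts
```

If `probe` returns a non-zero count for `("error", 1130)`, the server is rejecting some connections before authentication even starts, which would point away from the credentials. Note that every probe closes the connection without authenticating, so it will itself add "Aborted connection" warnings to the MariaDB log and may count toward the server's connection-error limits; it is best run against a test instance.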

github-actions[bot] commented 3 months ago

This issue is stale because it has been open 60 days with no activity.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 60 days with no activity.

github-actions[bot] commented 3 days ago

This issue was closed because it has been stalled for 30 days with no activity.