processone / ejabberd

Robust, Ubiquitous and Massively Scalable Messaging Platform (XMPP, MQTT, SIP Server)
https://www.process-one.net/en/ejabberd/
Other
6k stars 1.5k forks source link

Storage failures when prepared statements are enabled for MySQL #4226

Closed weiss closed 1 month ago

weiss commented 1 month ago

Environment

SQL configuration

sql_type: mysql
sql_server: localhost
sql_port: 3306
sql_username: ****************
sql_password: ****************
sql_pool_size: 4
sql_keepalive_interval: 3600
# […]

Errors from error.log

[error] <0.1293.0>@ejabberd_sql:check_error/2:1393 SQL query 'Q122372114' at {mod_mam_sql,{325,11}} failed: <<>>

Bug description

I'm running into failures storing MAM messages when prepared statements are enabled with MySQL/MariaDB (which is the default). Unfortunately I don't know how to reproduce. My impression is that it's unrelated to the MAM payload: I checked some of the failed cases, the payloads don't look unusual in any way, and I can't see an obvious pattern. However, it happens once every few minutes on our production server, so I could easily test (debugging) patches.

I'm assuming it's not actually specific to MAM (it's probably just MAM being a busy table), as I've seen one or two cases of PubSub storage failures as well, but I'm not entirely sure here.

Setting sql_prepared_statements: false makes the errors go away.

licaon-kter commented 1 month ago

Been seen these for years, on PG: https://github.com/processone/ejabberd/issues/3698 what's different now?

weiss commented 1 month ago

Been seen these for years, on PG: #3698 what's different now?

Looks different to me, yours are query timeouts, and I guess you don't get rid of them by disabling prepared statements?

In my case, there's no actual error message (<<>>).

licaon-kter commented 1 month ago

you don't get rid of them by disabling prepared statements?

this was not suggested no

In my case, there's no actual error message (<<>>).

indeed, looks odd

prefiks commented 1 month ago

@weiss Could you see if updating p1_mysql helps here? There was couple fixes in prepared statements after last release

licaon-kter commented 1 month ago

@prefiks .131 means they are missing just ~10 commits, aren't those fixes included?

prefiks commented 1 month ago

No, we use tagget/hex released version of p1_mysql in ejabberd git, but there were couple changes after that in p1_mysql repo.

weiss commented 1 month ago

Could you see if updating p1_mysql helps here? There was couple fixes in prepared statements after last release

Looks good so far, no errors after about 30 minutes, I'll reopen if I do run into problems again. Thank you!