Open Ronnnn opened 4 months ago
I think this has been a problem since forever and I am not sure there is an easy solution. It is probably easier to solve now that the queue is in the database rather than RabbitMQ.
Many people find they need to run multiple workers to consume their queue in a timely manner, so process identification is tricky: the workers are not able to communicate with each other, and I do not think Docker would be able to tell each worker about every other worker.
One solution could be a configurable lock-retry setting, so that if locked_at is more than, say, an hour old, the message is picked up again.
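Roughly what I have in mind, as an untested sketch only (the queued_messages table name, the column handling and the constant are assumptions on my part, not Postal's actual code):

STALE_LOCK_TIMEOUT = 60 * 60 # seconds; ideally a configurable setting

def release_stale_locks(mysql)
  # Clear locks older than the timeout so another worker can pick the
  # messages up on its next poll.
  mysql.query(<<~SQL)
    UPDATE queued_messages
       SET locked_by = NULL, locked_at = NULL
     WHERE locked_at IS NOT NULL
       AND locked_at < NOW() - INTERVAL #{STALE_LOCK_TIMEOUT} SECOND
  SQL
end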
Thank you again for the quick responses. Really appreciate it. Might it be an idea to give the database layer some kind of retry mechanism?
Disclaimer: untested code not written by a human ;p
def query(query)
  retries = 0
  begin
    with_mysql do |mysql|
      query_on_connection(mysql, query)
    end
  rescue Mysql2::Error::ConnectionError => e
    if retries < 1 # Limit the number of retries to 1
      retries += 1
      logger.warn "Connection error on query, retrying... (Attempt #{retries + 1})"
      retry
    else
      logger.error "Repeated connection errors on query, giving up. Error: #{e.message}"
      raise
    end
  end
end
Seems to be a duplicate of #3023
+1 - this issue also happens to me
postal logs:
postal-web-1 | 2024-09-26 20:52:15 +0000 ERROR POST /api/v1/send/message (500) event=request transaction=[snip] controller=LegacyAPI::SendController action=message format=html method=POST path=/api/v1/send/message request_id=[snip] ip_address=[snip] status=500 db_runtime=6.752912521362305 exception_class=Mysql2::Error::ConnectionError exception_message=MySQL server has gone away exception_backtrace=/usr/local/bundle/gems/mysql2-0.5.6/lib/mysql2/client.rb:151:in `_query'\n/usr/local/bundle/gems/mysql2-0.5.6/lib/mysql2/client.rb:151:in `block in query'\n/usr/local/bundle/gems/mysql2-0.5.6/lib/mysql2/client.rb:150:in `handle_interrupt'\n/usr/local/bundle/gems/mysql2-0.5.6/lib/mysql2/client.rb:150:in `query'
mariadb logs:
[Warning] Aborted connection 34409 to db: 'unconnected' user: 'x' host: 'localhost' (Got timeout reading communication packets)
Describe the bug
In my Postal 3.3.4 installation I have the problem that the database connection is sometimes lost.
Jul 10 08:55:44 post mariadbd[626]: 2024-07-10 6:55:44+0000 482 [Warning] Aborted connection 482 to db: 'unconnected' user: 'postal' host: 'localhost' (Got timeout reading communication packets)
That needs to be fixed on my server... BUT Postal is not able to handle this. After the connection drops, the message the worker was trying to send hangs in the pending state because it is still locked by the failed worker process.
The only way to get it through now is to set the locked_at and locked_by fields to NULL.
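For anyone hitting the same thing, the manual fix looks roughly like this (a sketch only; the queued_messages table name, the credentials and the message id are placeholders, so check your own schema first):

require "mysql2"

# Clear the stale lock on the stuck message so a healthy worker can pick it
# up again. Replace host, credentials and id with your own values.
client = Mysql2::Client.new(host: "127.0.0.1", username: "postal",
                            password: "CHANGEME", database: "postal")
client.query(
  "UPDATE queued_messages SET locked_by = NULL, locked_at = NULL WHERE id = 123"
)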
Expected behaviour
The worker should release the lock on the message properly so it can be picked up by another process the next time the database is back.
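One way this could work, as a rough illustration rather than Postal's actual worker code (method and table names here are hypothetical), is to release the lock in an ensure block so that even a failed attempt leaves the row unlocked:

def process_message(mysql, message_id)
  send_message(message_id) # hypothetical: whatever the worker does with the message
ensure
  begin
    # Whatever happened above, drop the lock so another worker can retry.
    mysql.query(
      "UPDATE queued_messages SET locked_by = NULL, locked_at = NULL " \
      "WHERE id = #{message_id.to_i}"
    )
  rescue Mysql2::Error::ConnectionError
    # If the database itself is unreachable, the unlock cannot happen here
    # either, so a stale-lock timeout (as suggested earlier in this thread)
    # is still needed as a backstop.
  end
end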
Environment details