verbb / formie

The most user-friendly forms plugin for Craft CMS.
Other
93 stars 69 forks source link

Notification email not clearing from queue - so thousands were sent!! #1636

Closed stevefloat closed 6 months ago

stevefloat commented 7 months ago

Describe the bug

This appears to have been an anomaly and has only happened twice. However wanted to raise the issue so you could offer advice or investigate a little.

We have a pretty standard form set up. In the last week it has received 11 submissions. For 9 of those, the email notification on the form was processed exactly as expected.

However for two of those submissions, which happened within minutes of each other, the email notification never cleared from the queue and became stuck there.

We have a Cron task that runs jobs in the queue every 4 minutes. Therefore, these two email notification emails were repeatedly sent to the same email address over and over. Meaning thousands of emails were sent! Until I was alerted to the issue and killed the jobs in the queue.

As I said, other users have submitted this form and notification emails have sent correctly. I've test it myself and all works fine.

So I can't see any reason for this. But as I said, wanted to flag it incase you can offer any insight.

Any feedback appreciated!

Steps to reproduce

  1. Not really reproducible I'm afraid!

Form settings

Craft CMS version

Craft CMS PRO 4.5.9

Plugin version

2.0.39

Multi-site?

No

Additional context

No response

engram-design commented 7 months ago

So that's something I've not heard of happening. Were there any errors in the queue jobs? It's possible that something has thrown an error after the email notification is sent (like saving the Sent Notification) which caused the queue job to fail, and because it had, will try and restart the job, which includes sending out email notifications, triggering integrations, and saving Sent Notifications (if enabled).

Were you able to inspect the queue job at all before removing it?

stevefloat commented 7 months ago

Thanks for getting back Josh

I did see the two jobs in the queue. They weren't erroring but active and marked as 'reserved'. I did inspect the queue job to see if it was showing anything specific, but didn't seem to be. I should have taken a screenshot of it, but I didn't I'm afraid. I was more keen to delete the jobs to stop the email sending!

The sent notifications are all saved - I can see them listed for over 2,500 sent notifications!

An error being thrown somewhere sounds like it could be the cause, if that would keep the job in the queue. The Formie log for the day/time of the initial submission don't show any errors.

engram-design commented 7 months ago

I’ll definitely look into what might’ve happened here, but I’ll say you’re the first case I’ve heard of this! The queue job being in a reserved state is interesting. It’s almost like the job might’ve timed out or something, but that would typically throw an error on the job.

So if an error was encountered, It’ll switch the job to an error state, and won’t try again. Is there anything custom module wise, or other plugins that might change queue behavior?

stevefloat commented 7 months ago

Thanks Josh. There are no custom modules or any plugins that change queue behaviour. It's a relatively simple site.

engram-design commented 7 months ago

And your queue is run just via the default mechanism (when viewing the Craft admin), or is it triggered with a cron job or CLI command?

stevefloat commented 7 months ago

We have runQueueAutomatically is set to false and CRON job set to trigger queue/run every 4 mins.

engram-design commented 7 months ago

Sorry, you did mention that about the cron job (juggling too many tickets at once!). That may be a cause, but will continue looking into it to try and replicate.

stevefloat commented 7 months ago

Hi Josh - just wondered if you got anywhere with this? As it's happened again!

But this time, by the time I was alerted to it the job had already cleared from the queue - however it had still sent over 100 times.

stevefloat commented 7 months ago

For additional reference, I've now updated to Formie 2.0.43 and Craft 4.5.11.1. Hoping that helps.

engram-design commented 7 months ago

Do you have a cron job that restarts failed queue jobs? A failed job shouldn't restart on its own, it should just sit there without being retried in a failed state. A queue job in a reserved should be processed and completed (or an error thrown).

Otherwise, I can't conceivably understand how the job is running multiple times and not error is being thrown. Once a queue job has been completed, it's up to Craft to remove it from the queue.

Do you have the "Send Email Alert for Failed Notification" Formie setting set to anything? This might be useful so we can check if there is an error being thrown when sending an email, you'll get an alert about it.

Any errors in your /storage/logs/formie-*.log or queue-*.log files?

stevefloat commented 7 months ago

Hi Josh There is no cron job to restart failed jobs. Only a single cron set to trigger queue/run every 4 mins.

The "Send Email Alert for Failed Notification" was not set, so I've just done this now. We know the email notifications aren't failing to send, but perhaps an error is being thrown somewhere.

I'd previously looked in Formie and Queue logs and nothing stands out in there. However, I've just looked in the console log for the date in question /storage/logs/console-*.log and could see what might be the cause. There are SQL errors where it's trying to run:

2023-11-23 09:03:02 [console.ERROR] [yii\db\Exception] SQLSTATE[HY000]: General error: 2006 MySQL server has gone away The SQL being executed was: DELETE FROM craft_queue WHERE id='x' ...

This seems to suggest the console / SQL is failing to delete the job from the queue. This error repeats all day, corresponding with the sent notifications, until it stops around 16:00.

My question would be why is this happening? We have loads of successful email notifications being sent every day. From all forms including the one where we've seen errors. So what would be causing this particular one to fail?

I can send you the full log over email if that might help?

engram-design commented 7 months ago

That's interesting about the MySQL error. It's very possible that it's dying before the job has been completed, the service restarts itself, and tries to process the queue again - several times over.

Difficult to say what might be causing this error, and whether it's even Formie doing this. It might even be worth a ping to Craft support about this.

Any comment on your server infrastructure and resources?

stevefloat commented 7 months ago

Thanks Josh.

Having seen that log file I suspect I may need to drop a line to Craft support to get their view.

In terms of server / infrastructure, the site is on a cloud VPS with pretty standard set up. Should be enough resources to cope with writing to MySQL!

engram-design commented 7 months ago

Certainly worth seeing what the Craft team think. Only thing I can think of it that the serialized data for a queue job is so massive that it cannot be dropped from the table. The only way to verify that would be to see a queue job that is the issue, and look at the corresponding row in the database in your queue folder, and see what that contains.

But the nature of this issue makes it tricky to get in there and grab that...

angrybrad commented 7 months ago

Hey all...

2023-11-23 09:03:02 [console.ERROR] [yii\db\Exception] SQLSTATE[HY000]: General error: 2006 MySQL server has gone away The SQL being executed was

Unfortunately, that’s one of those generic MySQL error messages that could mean many things.

Typically when we see it, it’s because the MySQL max_allowed_packet config setting is set too low on the server, or the wait_timeout setting is. Try bumping those up to higher values and see if the behavior changes.

If that doesn’t fix it, there are other solutions to try in the official MySQL documentation on the subject: https://dev.mysql.com/doc/refman/8.0/en/gone-away.html

My guess is that if you fix that underlying problem, it’ll solve your original issue.

stevefloat commented 7 months ago

Thanks Brad for the additional insight.

I've followed up with the hosting provider and will see if we can increase these SQL config values.

I'll update when I know more

stevefloat commented 7 months ago

Hi Brad

The hosting provider seems to concur that the errors relate to a timeout. Those MySQL values are currently set as follows: max_allowed_packet = 128M wait_timeout=30

What would you advise increasing them to?

angrybrad commented 7 months ago

I’d start by doubling each.

max_allowed_packet=256M
wait_timeout=60
stevefloat commented 7 months ago

Thanks Brad - those MySQL values have now been updated as suggested

max_allowed_packet= 256M
wait_timeout=60

Will monitor to see if the issue occurs again

engram-design commented 6 months ago

Feel free to comment back here if this keeps happening, and I might need to see if I can get a bit more hands-on about this in some fashion.

stevefloat commented 5 months ago

Hi Josh

I'm afraid the issue has happened again. Despite updating the MySQL values on the server as per previous comments.

This time I grabbed a screenshot of the job from the queue before clearing it down: reserved-queue-job

The console log showed the same type of error as last time - pasting full error below (of which there was one every time the notification was sent) incase this helps offer any more insight:

2024-01-08 09:27:03 [console.ERROR] [yii\db\Exception] SQLSTATE[HY000]: General error: 2006 MySQL server has gone away The SQL being executed was: DELETE FROMcraft_queueWHEREid='2753' {"trace":["#0 /home/xxx/site/vendor/yiisoft/yii2/db/Command.php(1307): yii\\db\\Schema->convertException(Object(PDOException), 'DELETE FROMcr...')","#1 /home/xxx/site/vendor/yiisoft/yii2/db/Command.php(1102): yii\db\Command->internalExecute('DELETE FROM cr...')","#2 /home/xxx/site/vendor/craftcms/cms/src/helpers/Db.php(1146): yii\\db\\Command->execute()","#3 /home/xxx/site/vendor/craftcms/cms/src/queue/Queue.php(346): craft\\helpers\\Db::delete('{{%queue}}', Array, Array, Object(craft\\db\\Connection))","#4 /home/xxx/site/vendor/craftcms/cms/src/queue/Queue.php(933): craft\\queue\\Queue->craft\\queue\\{closure}()","#5 /home/xxx/site/vendor/craftcms/cms/src/queue/Queue.php(345): craft\\queue\\Queue->_lock(Object(Closure))","#6 /home/xxx/site/vendor/craftcms/cms/src/queue/Queue.php(191): craft\\queue\\Queue->release('2753')","#7 /home/xxx/site/vendor/craftcms/cms/src/queue/Queue.php(165): craft\\queue\\Queue->executeJob()","#8 [internal function]: craft\\queue\\Queue->craft\\queue\\{closure}(Object(Closure))","#9 /home/xxx/site/vendor/yiisoft/yii2-queue/src/cli/Queue.php(114): call_user_func(Object(Closure), Object(Closure))","#10 /home/xxx/site/vendor/craftcms/cms/src/queue/Queue.php(163): yii\\queue\\cli\\Queue->runWorker(Object(Closure))","#11 /home/xxx/site/vendor/craftcms/cms/src/queue/Command.php(68): craft\\queue\\Queue->run()","#12 [internal function]: craft\\queue\\Command->actionRun()","#13 /home/xxx/site/vendor/yiisoft/yii2/base/InlineAction.php(57): call_user_func_array(Array, Array)","#14 /home/xxx/site/vendor/yiisoft/yii2/base/Controller.php(178): yii\\base\\InlineAction->runWithParams(Array)","#15 /home/xxx/site/vendor/yiisoft/yii2/console/Controller.php(180): yii\\base\\Controller->runAction('run', Array)","#16 /home/xxx/site/vendor/craftcms/cms/src/console/ControllerTrait.php(90): yii\\console\\Controller->runAction('run', Array)","#17 /home/xxx/site/vendor/yiisoft/yii2/base/Module.php(552): craft\\queue\\Command->runAction('run', Array)","#18 /home/xxx/site/vendor/yiisoft/yii2/console/Application.php(180): yii\\base\\Module->runAction('queue/run', Array)","#19 /home/xxx/site/vendor/craftcms/cms/src/console/Application.php(91): yii\\console\\Application->runAction('queue/run', Array)","#20 /home/xxx/site/vendor/yiisoft/yii2/console/Application.php(147): craft\\console\\Application->runAction('queue/run', Array)","#21 /home/xxx/site/vendor/craftcms/cms/src/console/Application.php(122): yii\\console\\Application->handleRequest(Object(craft\\console\\Request))","#22 /home/xxx/site/vendor/yiisoft/yii2/base/Application.php(384): craft\\console\\Application->handleRequest(Object(craft\\console\\Request))","#23 /home/xxx/site/craft(13): yii\\base\\Application->run()","#24 {main}"],"memory":5279544,"exception":"[object] (yii\\db\\Exception(code: HY000): SQLSTATE[HY000]: General error: 2006 MySQL server has gone away\nThe SQL being executed was: DELETE FROMcraft_queueWHEREid='2753' at /home/xxx/site/vendor/yiisoft/yii2/db/Schema.php:676)\n[previous exception] [object] (PDOException(code: HY000): SQLSTATE[HY000]: General error: 2006 MySQL server has gone away at /home/xxx/site/vendor/yiisoft/yii2/db/Command.php:1302)"}

It would be great to be able to confidently get this resolved - however that may be!

All help appreciated!

angrybrad commented 5 months ago

@stevefloat you're sure your updated values were applied?

i.e. if you run:

show variables like 'max_allowed_packet';
show variables like 'wait_timeout';

The results you get back are your updated ones?

stevefloat commented 5 months ago

Hi Brad

I've just checked in MySQL and can see those are the values being returned: wait_timeout = 60 max_allowed_packet = 268435456 (256mb)

Would you suggest increasing these values?

angrybrad commented 5 months ago

Hrm... you can try bumping those again and see if the behavior changes.

Note that those are the two most common culprits we’ve seen, but certainly not a definitive list.

While less common, we’ve seen all sorts of variations of the common causes listed here: https://dev.mysql.com/doc/refman/8.0/en/gone-away.html

Definitely worth giving that a once over and testing.