nextcloud / mail

💌 Mail app for Nextcloud
https://apps.nextcloud.com/apps/mail
GNU Affero General Public License v3.0

Mailbox unable to sync if previous sync died between message insertion and setting the sync token #4341

Closed mgriego closed 3 years ago

mgriego commented 3 years ago

Expected behavior

Mailbox sync should be able to recover from prior failures.

Actual behavior

If a previous mailbox sync fails (e.g. out of memory, execution time limit exceeded) between the time the messages start being inserted into the database and the time the sync token is saved to the mailbox, then future syncs may fail, leaving the mailbox apparently stuck in a "Loading Messages" state. This happens because the sync token stores the most recently retrieved UID to pass to the server on the next fetch. If the token is not updated due to the failure, the Mail app will try to re-insert previously downloaded messages, causing an SQL integrity constraint violation. The resulting exception is not handled, so the request ends in a 500 response. Note that, when this happens, subsequent sync calls from the browser get 409 responses, because the sync locks are not cleaned up when the exception is thrown.
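
A minimal sketch of that failure window (the names here, such as insertBulk() and setSyncNewToken(), are illustrative stand-ins rather than the app's exact API):

```php
<?php
// Sketch of the vulnerable sequence: messages are persisted first, the sync
// token only afterwards, and the two steps are not atomic.
function syncNewMessages($imapClient, $messageMapper, $mailboxMapper, $mailbox): void {
    $newMessages = $imapClient->fetchMessagesAfter($mailbox, $mailbox->getSyncNewToken());

    // Step 1: the fetched messages are inserted into oc_mail_messages.
    $messageMapper->insertBulk($mailbox, ...$newMessages);

    // <-- an out-of-memory kill or PHP timeout here leaves the old token in place

    // Step 2: only now is the new sync token written to the mailbox row.
    $mailbox->setSyncNewToken($imapClient->getCurrentToken());
    $mailboxMapper->update($mailbox);

    // The next sync resumes from the stale token, fetches the same UIDs again,
    // and the re-insert violates the unique index on (uid, mailbox_id).
}
```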

Here is a real-life example of this happening to me:

You can see from here that the UID referenced in the sync token has been removed from the mailbox, but two more messages (254880 and 254881) were synced previously. Since the token says the most recently synced message is 254879, that UID is used when retrieving messages, so Mail attempts to re-insert messages 254880 and 254881. The constraint violation occurs as soon as Mail tries to insert 254880, resulting in this error:

Sync failed for 1:INBOX: An exception occurred while executing 'INSERT INTO `oc_mail_messages` (`uid`, `message_id`, `references`, `in_reply_to`, `thread_root_id`, `mailbox_id`, `subject`, `sent_at`, `flag_answered`, `flag_deleted`, `flag_draft`, `flag_flagged`, `flag_seen`, `flag_forwarded`, `flag_junk`, `flag_notjunk`, `flag_important`) VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)' with params [254880, "REDACTED", "[]", null, "REDACTED", 58, "REDACTED", 1609722005, false, false, false, false, false, false, false, false, false]:

SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '254880-58' for key 'mail_msg_mb_uid_idx'

The above was also mentioned in https://github.com/nextcloud/mail/issues/4095#issuecomment-757390230.

I've started looking into the best way to fix this. I personally believe the best approach is to use an outer transaction in ImapToDbSynchronizer::sync. That way, if any portion of the sync fails, the entire DB transaction is rolled back, assuming the DBMS supports transactions properly. As it stands, transactions are only used in the MessageMapper during the bulk insert/update methods.
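
A rough sketch of what that could look like, assuming an \OCP\IDBConnection can be injected into the synchronizer (the $runAllPhases callable stands in for the existing new/changed/vanished logic plus the token update):

```php
<?php
use OCP\IDBConnection;

// Sketch only: wrap the whole sync in one transaction so either everything
// (messages and sync tokens) is persisted, or nothing is.
function syncAtomically(IDBConnection $db, callable $runAllPhases): void {
    $db->beginTransaction();
    try {
        $runAllPhases(); // insert new, update changed, delete vanished, save tokens
        $db->commit();
    } catch (\Throwable $e) {
        // Roll back explicitly so nothing from the failed run is persisted
        // and the stale-token/duplicate-key situation cannot arise.
        $db->rollBack();
        throw $e;
    }
}
```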

Another possible fix is to update the mailbox after each phase (new messages / updated messages / vanished messages), so that a failure in a later phase (e.g. update/delete) is less likely to leave an earlier phase's sync token (e.g. create) un-updated.
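
A short sketch of that per-phase variant (again with illustrative names; the setters simply mirror the three phases):

```php
<?php
// Sketch only: each phase persists its own sync token as soon as it succeeds,
// so a crash in a later phase cannot leave an earlier phase's token stale.
function syncMailbox($imapClient, $synchronizer, $mailboxMapper, $account, $mailbox): void {
    $synchronizer->runNewPhase($account, $mailbox);
    $mailbox->setSyncNewToken($imapClient->getCurrentToken());
    $mailboxMapper->update($mailbox); // persist before moving on

    $synchronizer->runChangedPhase($account, $mailbox);
    $mailbox->setSyncChangedToken($imapClient->getCurrentToken());
    $mailboxMapper->update($mailbox);

    $synchronizer->runVanishedPhase($account, $mailbox);
    $mailbox->setSyncVanishedToken($imapClient->getCurrentToken());
    $mailboxMapper->update($mailbox);
}
```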

There's also the option of removing previously synced messages that have a higher UID than what's in the "new" sync token, but that seems really problematic. The ImapToDbSynchronizer is not expected to understand the internals of the sync token, and deleting messages from the database between the new and changed sync operations could cause further regressions.

Mail app

Mail app version: 1.7.2

Mailserver or service: Locally-hosted Zimbra

Server configuration

Operating system: Using Nextcloud Docker image: nextcloud:20-apache

Web server: Apache via Docker image behind an Nginx reverse proxy running on the Docker host

Database: MariaDB via latest Docker image (as of about a week ago)

PHP version: PHP 7.4 (via Docker image)

Nextcloud Version: 20.0.4

Client configuration

Browser: Chrome 87.0.4280.141

Operating system: Xubuntu 20.04.1

ChristophWurst commented 3 years ago

@mgriego that is some excellent analysis. I'll try to address this soon and experiment with your suggestions. Thanks a lot, highly appreciated :pray:

miaulalala commented 3 years ago

@mgriego would you be so kind as to test the new version from master? There is an underlying query change that has (at least for me) made the bug disappear.

Thank you!

no-response[bot] commented 3 years ago

This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.

ChristophWurst commented 3 years ago

I've started looking into the best way to fix this. I personally believe the best approach is to use an outer transaction in ImapToDbSynchronizer::sync. That way, if any portion of the sync fails, the entire DB transaction is rolled back, assuming the DBMS supports transactions properly. As it stands, transactions are only used in the MessageMapper during the bulk insert/update methods.

I was finally able to wrap my head around this as I had a setup that allowed me to step through this locally in a totally reproducible way.

So, it turns out the logic to unlock a mailbox after said exception does run. However, it has no effect: the mailbox is left with its locks, despite the UPDATE queries run to remove them. With the help of https://stackoverflow.com/questions/16627942/php-pdo-transaction-automatic-rollback I realized that you do not actually get an automatic rollback with PDO. That was my fundamental misunderstanding of transactions in PDO. So I think whenever \OCP\IDBConnection::beginTransaction is used, there should be a try-catch afterwards to roll back explicitly when an error occurs.
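
For illustration, a minimal version of that pattern, assuming $db is an \OCP\IDBConnection (the queries themselves are placeholders):

```php
$db->beginTransaction();
try {
    // ... queries that must succeed or fail as a unit ...
    $db->commit();
} catch (\Throwable $e) {
    // PDO does not roll back automatically; without this, later queries on the
    // same connection (e.g. the lock-cleanup UPDATEs) run inside the still-open,
    // never-committed transaction and never take effect.
    $db->rollBack();
    throw $e;
}
```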

ChristophWurst commented 3 years ago

@mgriego would appreciate your feedback on https://github.com/nextcloud/mail/pull/5440 as you seem to have a solid understanding of this problem :v: