[bug]: Problems reading from store

yannikbloscheck commented 4 months ago

What happened?

An error message reports a failure to read from the store; I don't know more than that. The issue first appeared after updating from version 0.8.0 to 0.8.1. Issue #544 could be related.

How can we reproduce the problem?

Restart Stalwart. The error also often appears when housekeeping tasks run.

Version

v0.8.1

What database are you using?

RocksDB

What blob storage are you using?

Filesystem

Where is your directory located?

Internal

What operating system are you using?

Linux

Relevant log output

2024-06-16T00:48:00.172395Z  INFO common::manager::boot: Starting Stalwart Mail Server v0.8.1...
2024-06-16T00:48:00.614560Z ERROR smtp::reporting::scheduler: Failed to read from store: Internal Error: Failed to deserialize report domain: [0, 0, 0, 0, 0, ***, ***, ***, **, *, ***, ***, ***, ***, 0, **, ***] context="queue" event="error"

(I redacted some numbers with stars because I'm not sure what they mean or whether they can contain personal information.)

Code of Conduct

[X] I agree to follow this project's Code of Conduct

yannikbloscheck commented 4 months ago

The issue still seems to exist with 0.8.2. I still get similar error messages. Or could there still be some old messages stuck in the queue from before? The DMARC reports in my interface have also been empty since the update, but then I'm not a heavy e-mail sender.

mdecimus commented 4 months ago

> The issue still seems to exist with 0.8.2. I still get similar error messages.

The new version prevents this problem from happening but does not delete the entries that the previous version placed in the wrong table. I'll have to create a command line utility for this, as I don't think it's a good idea to include it in the main binary. In the meantime, you can:

1. Temporarily set the report expiration to 1 hour.
2. Temporarily set the purge task to do the cleanup near your current local time.
3. Stop version 0.8.2.
4. Run version 0.8.1 from the command line (making sure you run it as the correct user) and wait until you see the purge task execution in the logs (you might need to increase your log level).
5. CTRL-C version 0.8.1 and start version 0.8.2.

> The DMARC reports in my interface are also still empty since the update, but then I'm not a heavy e-mail sender.

Yes, this is because the correct table is now being used, and it is empty at the moment.

yannikbloscheck commented 4 months ago

I have followed your instructions. I set report.analysis.store to 1 hour, set a cleanup time a few minutes in the future and ran Stalwart 0.8.1 as the correct user. I could see in the log that the housekeeping task for purging accounts was run at the specified time. Despite that when I started Stalwart 0.8.2 afterwards I got the following message: Failed to read from store: Internal Error: Failed to deserialize report domain: [0, 0, 0, 0, 0, 102, 118, 171, 206, 2, 115, 51, 4, 10, 192, 17, 222] context="queue" event="error" on startup.

I already updated to 0.8.2 on Saturday, though. That's why I would have expected new DMARC reports to have come in over the last 48 hours and to show up in the interface. To make sure, I have now sent an e-mail to a provider that definitely sends daily DMARC reports. So by tomorrow I should know for sure whether it's working or not.

yannikbloscheck commented 4 months ago

Interestingly, the same error message I just mentioned showed up at exactly the moment I sent my test e-mail, and again when I received the response to it.

yannikbloscheck commented 4 months ago

Although the error mentioned above still appears, I can confirm that DMARC reports now show up in the interface again.

mdecimus commented 4 months ago

> I have followed your instructions. I set report.analysis.store to 1 hour, set a cleanup time a few minutes in the future and ran Stalwart 0.8.1 as the correct user. I could see in the log that the housekeeping task for purging accounts was run at the specified time. Despite that when I started Stalwart 0.8.2 afterwards I got the following message: Failed to read from store: Internal Error: Failed to deserialize report domain: [0, 0, 0, 0, 0, 102, 118, 171, 206, 2, 115, 51, 4, 10, 192, 17, 222] context="queue" event="error" on startup.

It should be the store purge task, not the one for accounts. You can also trigger it by sending an authenticated GET request to /api/store/purge/data.
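For readers who want to try the request mentioned above, here is a minimal sketch using curl. Only the path /api/store/purge/data comes from this thread. The host name, the variable names BASE_URL, ADMIN_USER and ADMIN_PASS, and the use of HTTP Basic authentication with an administrator account are assumptions; adjust them to your setup.

```shell
#!/bin/sh
# Sketch: trigger the store purge over the management API.
# Only the path /api/store/purge/data is taken from the thread.
# BASE_URL, ADMIN_USER and ADMIN_PASS are hypothetical names, and
# Basic authentication is an assumption about the server setup.

# Build the purge URL from a base URL, dropping one trailing slash.
purge_url() {
    printf '%s/api/store/purge/data\n' "${1%/}"
}

# Send the authenticated GET request and fail on HTTP error codes.
trigger_purge() {
    curl --fail --silent --show-error \
        --user "${ADMIN_USER}:${ADMIN_PASS}" \
        "$(purge_url "${BASE_URL}")"
}

# Example (not run automatically):
#   export BASE_URL=https://mail.example.org ADMIN_USER=admin ADMIN_PASS=secret
#   trigger_purge
```

After calling trigger_purge, watch the server log for the store purge task, as described above for the scheduled run.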

yannikbloscheck commented 4 months ago

I was finally able to get rid of the error. I'm not sure whether it was just the command or the other things I tried, because as luck would have it, new reports apparently came in just as I switched to 0.8.1, and that of course made things more complicated 😄 But everything is fine now. Thank you for your help (and all the work you do on the project)!

alucryd commented 2 months ago

@mdecimus Can I temporarily run 0.8.1 if I am currently on 0.9.2?

Edit: Getting the following error, maybe it's caused by something else:

2024-08-26T16:11:16Z ERROR Data corruption detected (store.data-corruption) causedBy = crates/store/src/write/key.rs:663, key = base64:redacted, causedBy = crates/store/src/dispatch/store.rs:125, causedBy = crates/smtp/src/reporting/scheduler.rs:146, details = Failed to read from store

mdecimus commented 2 months ago

@alucryd Yes.

kanashimia commented 1 week ago

For anyone who stumbles upon this, it seems you can just delete all outgoing reports like this for RocksDB:

$ ldb --db=./data drop_column_family h

This fixes the issue for me. Stalwart will recreate that column family on next startup.

Be sure to back up the data before running this command, just to be safe. Your mail server also needs to be offline.

I imagine something similar can be done for other databases.

I found what the column families mean in the code: https://github.com/stalwartlabs/mail-server/blob/c380ec750a1ffcaf6b937c60bca7bd3b5e041a5b/crates/store/src/lib.rs#L147-L148
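The backup and the drop can be combined into one small script. This is only a sketch: the ./data path is the one from the command above, the backup naming is hypothetical, and it assumes your ldb build provides the list_column_families command so you can check that column family h exists before dropping it. Stop Stalwart before running it.

```shell
#!/bin/sh
# Sketch: copy the RocksDB directory, then drop column family "h".
# The backup naming scheme is hypothetical. The mail server must be
# stopped before any of this runs.

# Derive a backup path next to the data directory from a suffix.
backup_path() {
    printf '%s.bak-%s\n' "${1%/}" "$2"
}

# Back up, list the column families for a visual check, then drop "h".
drop_outgoing_reports() {
    data_dir="$1"
    backup="$(backup_path "$data_dir" "$(date +%Y%m%d-%H%M%S)")"
    cp -a "$data_dir" "$backup" || return 1                 # full copy first
    ldb --db="$data_dir" list_column_families || return 1   # confirm "h" is listed
    ldb --db="$data_dir" drop_column_family h
}

# Example (not run automatically):
#   drop_outgoing_reports ./data
```

If the drop causes problems, stop the server again and restore the copied directory.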
