lemonsaurus / blackbox

Magically save your database backups and critical logs in your favorite cloud storage provider.
MIT License
49 stars 10 forks source link

Blackbox fails to send error notification if failure reason field is over Discord embed limit #155

Open jb3 opened 7 months ago

jb3 commented 7 months ago

Error report

Discord has a field size limit of 1024 characters, not 2,000 as is currently programmed. This means that blackbox silently fails to notify if a backup has failed.

https://github.com/lemonsaurus/blackbox/blob/d40f08d0d4be701faca0b17ab3407543e7fd31ea/blackbox/handlers/notifiers/discord.py#L13-L14

Additionally, this truncation hides the actual error for failing databases since it includes all output from databases that successfully backed up, a future iteration might want to include only the output for the failing stage.

As an example of an invalid body, I have attached the current PyDis backup output which is failing because of #154, as you can see the truncated body means that no PostgreSQL error is logged, even though the actual PostgreSQL error is only a couple of short lines.

Webhook Payload ```json { "content": null, "embeds": [ { "title": "Backup", "color": 13377568, "fields": [ { "name": "**main_mongodb**", "inline": true, "value": ":white_check_mark: singapore_s3\n:white_check_mark: newark_s3\n:white_check_mark: frankfurt_s3" }, { "name": "**main_redis**", "inline": true, "value": ":white_check_mark: singapore_s3\n:white_check_mark: newark_s3\n:white_check_mark: frankfurt_s3" }, { "name": "**main_postgres**", "inline": true, "value": ":x:" }, { "name": "Output", "value": "2024-05-02T16:17:58.887+0000\twriting admin.system.users to archive '/tmp/tmpot_19_0a/main_mongodb_*****_02_05_2024.archive'\n2024-05-02T16:17:58.890+0000\tdone dumping admin.system.users (1 document)\n2024-05-02T16:17:58.890+0000\twriting admin.system.version to archive '/tmp/tmpot_19_0a/main_mongodb_*****_02_05_2024.archive'\n2024-05-02T16:17:58.893+0000\tdone dumping admin.system.version (2 documents)\n2024-05-02T16:17:58.894+0000\twriting pydis_forms.forms to archive '/tmp/tmpot_19_0a/main_mongodb_*****_02_05_2024.archive'\n2024-05-02T16:17:58.894+0000\twriting modmail_bot.notes to archive '/tmp/tmpot_19_0a/main_mongodb_*****_02_05_2024.archive'\n2024-05-02T16:17:58.896+0000\twriting modmail_bot.logs to archive '/tmp/tmpot_19_0a/main_mongodb_*****_02_05_2024.archive'\n2024-05-02T16:17:58.896+0000\twriting pydis_forms.responses to archive '/tmp/tmpot_19_0a/main_mongodb_*****_02_05_2024.archive'\n2024-05-02T16:17:58.906+0000\tdone dumping modmail_bot.notes (20 documents)\n2024-05-02T16:17:58.906+0000\twriting pydis_forms.admins to archive '/tmp/tmpot_19_0a/main_mongodb_*****_02_05_2024.archive'\n2024-05-02T16:17:58.910+0000\tdone dumping pydis_forms.forms (26 documents)\n2024-05-02T16:17:58.911+0000\twriting modmail_bot.plugins.PingManager to archive '/tmp/tmpot_19_0a/main_mongodb_*****_02_05_2024.archive'\n2024-05-02T16:17:58.912+0000\tdone dumping pydis_forms.admins (2 documents)\n2024-05-02T16:17:58.913+0000\twriting modmail_bot.config to archive '/tmp/tmpot_19_0a/main_mongodb_*****_02_05_2024.archive'\n2024-05-02T16:17:58.913+0000\tdone dumping modmail_bot.plugins.PingManager (2 documents)\n2024-05-02T16:17:58.914+0000\twriting modmail_bot.plugins.BanAppeals to archive '/tmp/tmpot_19_0a/main_mongodb_*****_02_05_2024.archive'\n2024-05-02T16:17:58.922+0000\tdone dumping modmail_bot.config (1 document)\n2024-05-02T16:17:58.923+0000\twriting pydis_forms.discord_members to archive '/tmp/tmpot_19_0a/main_mongodb_*****_02_05_2024.archive'\n2024-05-02T16:17:58.927+0000\tdone dumping pydis_forms.discord_m" } ] } ], "username": "blackbox", "avatar_url": "https://raw.githubusercontent.com/lemonsaurus/blackbox/main/img/blackbox_avatar.png" } ```
Actual PostgreSQL error ``` blackbox-1235626330217779380-4wz7x blackbox 16:18:26 pg_dumpall: error: aborting because of server version mismatch blackbox-1235626330217779380-4wz7x blackbox pg_dumpall: detail: server version: 16.2; pg_dumpall version: 15.5 (Debian 15.5-1.pgdg120+1) ```

Suggested fix (by 🍋 lemonsaurus)

lemonsaurus commented 6 months ago

Yeah this part sucks. Maybe instead of just naively truncating, we could upload the full error message somewhere and attach a URL or something. It's just a bit sketchy if that log contains sensitive shit.

also, but I'm not sure this is possible, but replying to the notification (turning it into a thread) on Discord with the full log split into multiple messages would be a fairly elegant solution.

Labeling this as help wanted, in case someone wants to implement some solution for this. ❤️

jb3 commented 6 months ago

The question is more why on an error are the process logs from all subprocesses attached.

If this was only the tool that experienced the error and the logs were tailed to the last N lines then this error wouldn't matter.

If pg_dumpall fails, it should not attach the logs of mongodump.

Most tools experience an error that will fit in Discord, but not if all subprocess logs are attached.

lemonsaurus commented 6 months ago

yeah jb, that's all true too. We should only be capturing the relevant errors in the output. But then there really are three bugs here to fix: