xDefcon / sinusbot-scripts

Scripts for SinusBot (sinusbot.com)
GNU General Public License v3.0

HTTP ERROR #40

Closed: TheChaosToast closed this issue 3 years ago

TheChaosToast commented 3 years ago

Good morning all,

I have been seeing the following error messages in the logs since this morning. I assume the problem is not on my end?

2021-04-08T09:27:27+02:00 [ AntiProxy:370:24] Could not retrieve info for xxxxx:xxxx:xxxxxxxx8::1 HTTP_ERROR: undefined
2021-04-08T09:27:27+02:00 [ AntiProxy:370:24] Could not retrieve info for xxxxx:xxxx:xxxxxxxx8::1 HTTP_ERROR: undefined
2021-04-08T09:27:27+02:00 [ AntiProxy:370:24] Could not retrieve info for xxx.xx.xxx.137 HTTP_ERROR: undefined
2021-04-08T09:27:27+02:00 [ AntiProxy:370:24] Could not retrieve info for xxxxx:xxxx:xxxxxxxx:14c9 HTTP_ERROR: undefined
2021-04-08T09:27:27+02:00 [ AntiProxy:370:24] Could not retrieve info for xx.xx.xx.185 HTTP_ERROR: undefined
2021-04-08T09:27:27+02:00 [ AntiProxy:370:24] Could not retrieve info for xx.xx.xx.72 HTTP_ERROR: undefined

xDefcon commented 3 years ago

Good morning. The issue is now solved. An explanation of what happened will follow here.

TheChaosToast commented 3 years ago

Ah, ok. Thanks for the fast answer :)

xDefcon commented 3 years ago

When I woke up, I noticed many emails from the monitoring system reporting that my API was going online, then offline, then online again. This lasted for around 2 hours.

When I checked what was going on, I saw that api.xdefcon.com was returning a 520 error, which is Cloudflare's way of saying "Web server is returning an unknown error". I also have an SMS alert with a custom (very loud :bell:) notification sound that should wake me up if something like this happens while I'm sleeping; unfortunately, error 520 didn't trigger it.
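The gap here is that the SMS alert did not fire for a 520. A minimal sketch of an alert rule that pages on every 5xx status, Cloudflare's 52x codes included. It assumes the monitor exposes the HTTP status of each probe (or nothing when no response arrived); the `alert_severity` function and its severity labels are hypothetical, not a feature of any specific monitoring product:

```python
from typing import Optional


def alert_severity(status: Optional[int]) -> str:
    """Map one probe's HTTP status to a severity label (hypothetical scheme).

    None means no HTTP response at all (timeout, refused connection).
    """
    if status is None:
        return "critical"
    if 500 <= status <= 599:
        # Every server-side error pages on-call, including Cloudflare's
        # own 52x codes such as 520 ("Web server is returning an unknown error").
        return "critical"
    if 200 <= status <= 399:
        return "ok"
    # 4xx and anything unexpected: worth a look, not a night-time page.
    return "warning"


for code in (200, 404, 503, 520, None):
    print(code, alert_severity(code))
```

Matching on the whole 5xx range rather than a hand-picked list of codes is what keeps a less common status like 520 from slipping through.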

I restarted the web servers (there are two separate ones for my API, precisely to avoid this kind of issue), but the error was still present. I then looked up what Cloudflare's 520 error means; Cloudflare's documentation gives more details. The server monitoring systems I use (Datadog and HetrixTools) showed no abnormal metrics: the servers were running fine, with plenty of resources available, until all incoming web traffic dropped because of this error. What fixed the issue was a server reboot plus toggling Cloudflare's proxying off and on, which is what Cloudflare's troubleshooting guide suggests for diagnosing problems with the origin web servers. I suspect the reboot is what actually fixed it.

I suspect some kind of bug or condition caused the servers to start sending empty responses. I'm downloading the logs to see if I can find more details.
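If the servers were answering with empty bodies, a status-only check can miss it whenever the status code itself looks fine. A minimal sketch of a stricter probe rule, assuming the API returns a JSON object on success; `response_is_healthy` is a hypothetical helper, not part of the xdefcon API or any monitoring tool:

```python
import json


def response_is_healthy(status: int, body: bytes) -> bool:
    """Hypothetical health rule: a 2xx status alone is not enough.

    Assumes the API answers with a JSON object; an empty or non-JSON
    body counts as a failure even when the status code looks fine.
    """
    if not 200 <= status <= 299:
        return False
    if not body.strip():
        # Empty (or whitespace-only) body: the symptom suspected here.
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # covers json.JSONDecodeError and invalid UTF-8
        return False
    return isinstance(payload, dict)
```

For example, `response_is_healthy(200, b"")` returns `False`, so an empty 200 response would count as a failed probe.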

TheChaosToast commented 3 years ago

Thank you for the detailed information.

I can confirm that everything works fine again :)

Good luck with troubleshooting and greetings from Germany :)

xDefcon commented 3 years ago

After analyzing the syslog of one of the two servers, I found that the cause was widespread segmentation faults in libcrypto:

kernel: [336045.036047] apache2[13032]: segfault at 1 ip 00007fccc78dfd80 sp 00007fcc7fffea68 error 6 in libcrypto.so.1.1[7fccc77fb000+19e000]
kernel: [336045.036252] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
kernel: [336045.844065] php-fpm7.3[31193]: segfault at 0 ip 00007f773c1df990 sp 00007fff0cdce0b8 error 6 in libcrypto.so.1.1[7f773c0fb000+19e000]
kernel: [336045.844075] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
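To see at a glance which processes crashed and in which library, the segfault lines can be grouped mechanically. A minimal Python sketch, assuming the kernel log format shown above; `summarize_segfaults` and `decode_pf_error` are hypothetical helpers. The decoder reads the `error` field as the x86 page-fault error code (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode); that interpretation is an assumption about the hardware, not part of the analysis in this thread.

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines like:
#   kernel: [336045.036047] apache2[13032]: segfault at 1 ip ... error 6 in libcrypto.so.1.1[...]
SEGFAULT_RE = re.compile(
    r"(?P<proc>[\w.\-]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>\d+) "
    r"in (?P<lib>[^\[\s]+)\["
)


def summarize_segfaults(lines: Iterable[str]) -> Counter:
    """Count segfaults per (process, library); other lines are ignored."""
    counts: Counter = Counter()
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            counts[(match.group("proc"), match.group("lib"))] += 1
    return counts


def decode_pf_error(code: int) -> str:
    """Decode the low three bits of the x86 page-fault error code."""
    mode = "user" if code & 4 else "kernel"
    access = "write" if code & 2 else "read"
    cause = "protection violation" if code & 1 else "page not present"
    return f"{mode}-mode {access}, {cause}"


log = [
    "kernel: [336045.036047] apache2[13032]: segfault at 1 ip 00007fccc78dfd80 "
    "sp 00007fcc7fffea68 error 6 in libcrypto.so.1.1[7fccc77fb000+19e000]",
    "kernel: [336045.844065] php-fpm7.3[31193]: segfault at 0 ip 00007f773c1df990 "
    "sp 00007fff0cdce0b8 error 6 in libcrypto.so.1.1[7f773c0fb000+19e000]",
]
for (proc, lib), n in summarize_segfaults(log).items():
    print(f"{proc}: {n} segfault(s) in {lib}")
print(decode_pf_error(6))  # user-mode write, page not present
```

On the two segfault lines above, both Apache and PHP-FPM crash inside `libcrypto.so.1.1`, and `error 6` decodes to a user-mode write to an unmapped page, which is consistent with the faulting addresses `0` and `1` (a write near a NULL pointer).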

The second server was fine the whole time, but the system didn't redirect all the traffic to it because, as far as it was concerned, "I don't see any problem" 🤕. I'm fixing this in the monitoring system and considering switching to https://www.cloudflare.com/it-it/load-balancing/.
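The failure here is that traffic routing relied on the servers' own view of their health rather than on what clients actually received through Cloudflare. A minimal sketch of probe-driven failover, assuming an external checker reports healthy/unhealthy per origin on each round; the `Origin` and `FailoverPool` classes and the threshold of three consecutive failures are hypothetical. Managed options such as Cloudflare Load Balancing provide their own health monitors and traffic steering; this sketch does not model their configuration.

```python
from typing import List, Optional


class Origin:
    """One backend web server, tracked by an external probe (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.consecutive_failures = 0

    def record(self, healthy: bool) -> None:
        # Any success resets the counter; failures accumulate.
        self.consecutive_failures = 0 if healthy else self.consecutive_failures + 1


class FailoverPool:
    """Round-robin over origins whose recent probes have not failed too often."""

    def __init__(self, origins: List[Origin], failure_threshold: int = 3) -> None:
        self.origins = origins
        self.failure_threshold = failure_threshold
        self._counter = 0

    def healthy(self) -> List[Origin]:
        return [o for o in self.origins
                if o.consecutive_failures < self.failure_threshold]

    def pick(self) -> Optional[Origin]:
        pool = self.healthy()
        if not pool:
            return None  # every origin is down: surface an alert instead
        origin = pool[self._counter % len(pool)]
        self._counter += 1
        return origin


web1, web2 = Origin("web1"), Origin("web2")
pool = FailoverPool([web1, web2])
for _ in range(3):
    web1.record(False)  # e.g. three probes in a row got a 520
print([pool.pick().name for _ in range(3)])  # ['web2', 'web2', 'web2']
```

Requiring several consecutive failures before ejecting an origin avoids flapping on a single bad probe, at the cost of a few failed requests before failover kicks in.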