minotar / imgd

Minotar is a global avatar service that pulls your head off of your Minecraft.net skin and makes it available for use on several thousand sites.
https://minotar.net/
The Unlicense

Outage? Connection timed out #229

Closed zanethefox closed 7 months ago

zanethefox commented 7 months ago

Hi there! For the last few days I have noticed that Minotar images/avatars no longer load, and the usual Steve fallback image is not shown either. The avatar URL returns error code 522: Connection timed out.

Would it be possible to please investigate this outage?

Thanks a lot! :)

chelminski commented 7 months ago

escalation

LukeHandle commented 7 months ago

I've just logged into things to check what's up, and it decided to load for me? I did see a 522 a short while ago though. image

Is this intermittent, or just fixed itself?

edit: Still problematic I think. Looks like some DigitalOcean upgrade occurred and things fell apart

zanethefox commented 7 months ago

I am not entirely sure. Unfortunately it's still failing to load most skins, with error code 522. The homepage is showing up like this for me:

image

And the error message on images:

image

chelminski commented 7 months ago

Yes, probably because the homepage is in the Cloudflare cache while the server itself is not responding. (At least, I think so.)
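The difference between a cached page and an unreachable origin can usually be read off the response itself. Below is a minimal sketch, assuming the `CF-Cache-Status` response header that Cloudflare sets and the 522 status reported above. The function name and the labels it returns are made up for illustration and are not part of Minotar or imgd.

```python
def classify_response(status, headers):
    """Roughly classify a response that came back through Cloudflare.

    `status` is the HTTP status code, `headers` a dict of response headers.
    The returned labels are illustrative only.
    """
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    cache_status = lowered.get("cf-cache-status", "").upper()

    if status == 522:
        # Cloudflare could not complete a connection to the origin server.
        return "origin-timeout"
    if 200 <= status < 300 and cache_status == "HIT":
        # Served from the edge cache; says nothing about origin health.
        return "served-from-cache"
    if 200 <= status < 300:
        # Successful and not a plain cache hit, so the origin was likely reached.
        return "likely-origin"
    return "other-error"
```

With this, a homepage that answers 200 with `CF-Cache-Status: HIT` and an avatar URL that answers 522 would be consistent with the explanation above: the edge still has the page, but the origin is not accepting connections.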

LukeHandle commented 7 months ago

Okay, should be fixed. Fire is out.

> edit: Still problematic I think. Looks like some DigitalOcean upgrade occurred and things fell apart

Normally these automatic upgrades occur without issue (and I've not had to touch things with K8 for quite a while :D). Part of the upgrade process involves adding new servers to the pool and removing old servers.

We use "floating IPs" that are added/removed from the nodes based on their availability. These should have been added to the new nodes as they spun up, but it appears something has broken that process.

I added the IPs manually to the nodes, and the issue resolved... but it'll come back the next time they run maintenance. Will need some further investigation.
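The reassignment described above can be sketched as a small planning step: given the floating IPs, where each one currently sits, and which nodes are healthy, work out which IPs need to move. This is only an illustration under assumed inputs, not the service Minotar actually runs, and it makes no DigitalOcean or Kubernetes API calls. The function name, node names, and IP addresses are hypothetical.

```python
def plan_floating_ips(floating_ips, assignments, ready_nodes):
    """Return {ip: node} moves so every floating IP sits on a ready node.

    floating_ips: list of IP strings
    assignments:  dict mapping ip -> node currently holding it (may be missing)
    ready_nodes:  list of node names that are currently healthy
    """
    ready = sorted(ready_nodes)
    if not ready:
        return {}  # nothing healthy to move IPs to

    # Count IPs already on each ready node so new ones spread out evenly.
    load = {node: 0 for node in ready}
    for ip in floating_ips:
        node = assignments.get(ip)
        if node in load:
            load[node] += 1

    plan = {}
    for ip in sorted(floating_ips):
        if assignments.get(ip) in load:
            continue  # already on a healthy node, leave it alone
        # Pick the least-loaded ready node; ties break by name.
        target = min(ready, key=lambda node: (load[node], node))
        plan[ip] = target
        load[target] += 1
    return plan


# Hypothetical example: both IPs are still attached to nodes that were removed.
moves = plan_floating_ips(
    ["203.0.113.10", "203.0.113.11"],
    {"203.0.113.10": "old-1", "203.0.113.11": "old-2"},
    ["new-1", "new-2"],
)
```

Running the plan automatically whenever the set of ready nodes changes is what removes the need for the manual step.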

zanethefox commented 7 months ago

Thanks so much for your quick response :D I really do appreciate it lots.

Hopefully the automatic process can be fixed easily, but I am glad that things are loading quickly again for now!

LukeHandle commented 7 months ago

Issue traced back to an update to the image that does this bit:

> We use "floating IPs" that are added/removed from the nodes based on their availability. These should have been added to the new nodes as they spun up, but it appears something has broken that process.

That update was due to an upstream API change that removed an old way of doing things and replaced it with a new way. The new way required a different K8 resource and we hadn't granted the service access to manage the new resource / type.

Should be all fixed now!
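The missing permission described above can be illustrated with a simplified model of how Kubernetes RBAC rules are matched: a request is allowed if any rule covers its API group, resource, and verb. The sketch ignores `resourceNames`, subresources, and non-resource URLs. The group and resource names are placeholders, since the thread does not say which resource was involved.

```python
def is_allowed(rules, api_group, resource, verb):
    """Simplified check of a request against a role's RBAC rules."""
    def matches(allowed, value):
        # "*" is the RBAC wildcard for groups, resources, and verbs.
        return "*" in allowed or value in allowed

    return any(
        matches(rule.get("apiGroups", []), api_group)
        and matches(rule.get("resources", []), resource)
        and matches(rule.get("verbs", []), verb)
        for rule in rules
    )


# Placeholder names: the role only covers the old resource type.
old_rules = [
    {"apiGroups": ["example.invalid"], "resources": ["oldresources"],
     "verbs": ["get", "list", "update"]},
]
# The fix amounts to granting the same verbs on the new resource type.
fixed_rules = old_rules + [
    {"apiGroups": ["example.invalid"], "resources": ["newresources"],
     "verbs": ["get", "list", "update"]},
]

before = is_allowed(old_rules, "example.invalid", "newresources", "update")
after = is_allowed(fixed_rules, "example.invalid", "newresources", "update")
```

Here `before` is `False` and `after` is `True`, which mirrors the failure mode: the updated image asked for a resource type the service had never been granted.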