islamic-network / api.aladhan.com

The AlAdhan API
GNU General Public License v3.0

Service is unstable #42

Closed husseinmohkhalil closed 3 years ago

husseinmohkhalil commented 3 years ago

First of all, thank you so much for such a great service. I am using it 5 times a day in my smart home, Barak Allahu feek :) For the last week or so, the service has been totally unstable: mostly I get HttpResponseException: forbidden, and other times I get unreachable and other types of errors.

meezaan commented 3 years ago

Al Salaamu Alaykum brother.

Towards the middle of the week we started having some problems with DNS resolution via Cloudflare. I'm not entirely sure why, but it's probably because we put around 50 million+ DNS queries through their system. DNS responses then became unstable, which resulted in dropped packets - and depending on your API client you would either get an unreachable or a 502.

At the same time, we also put about 5 TB through the Cloudflare CDN in one day - and even for that people started reporting HTTP 525 errors (and this was mostly just a front for object storage, so there were no servers that could have problems).

It took about 2 days to debug because the clusters and load balancers did not report any problems.

We then switched the DNS provider and built our own CDN, but the DNS has taken 48 hours+ to propagate as DNSSEC was enabled on some of the domains in question.

This has mostly propagated now (https://www.digwebinterface.com/?hostnames=aladhan.com&type=NS&useresolver=8.8.4.4&ns=all&nameservers=) - so you should not have problems anymore, and depending on your device and ISP, you will either have to flush the DNS cache or wait for the ISP to do it.
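For anyone who wants to check from their own machine whether the nameserver change has reached a given resolver, here is a minimal Node.js sketch. It uses Node's built-in dns module; the second resolver address (1.1.1.1) and the helper names sameNameservers, nsVia and checkPropagation are illustrative assumptions, not something the AlAdhan project provides.

```javascript
const { Resolver } = require('dns').promises;

// Hypothetical helper: true when two NS lists name the same hosts,
// ignoring order, letter case and a trailing dot.
function sameNameservers(a, b) {
  const norm = (list) =>
    [...new Set(list.map((h) => h.toLowerCase().replace(/\.$/, '')))].sort();
  const x = norm(a);
  const y = norm(b);
  return x.length === y.length && x.every((h, i) => h === y[i]);
}

// Ask one specific resolver for the NS records of a domain.
async function nsVia(resolverIp, domain) {
  const resolver = new Resolver();
  resolver.setServers([resolverIp]);
  return resolver.resolveNs(domain);
}

// Compare what two public resolvers currently return (addresses are assumptions).
async function checkPropagation(domain = 'aladhan.com') {
  const [first, second] = await Promise.all([
    nsVia('8.8.4.4', domain),
    nsVia('1.1.1.1', domain),
  ]);
  return { first, second, agree: sameNameservers(first, second) };
}

// Usage (performs real DNS queries):
// checkPropagation().then(console.log, console.error);
```

If the two lists disagree, the change has probably not reached one of the resolvers yet, and flushing the local cache will not help until it does.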

My apologies for the inconvenience.

mohamed-arradi commented 3 years ago

Assalam Aleykoum and Ramadan Kareem,

First, thank you so much for your great service! I use it in my smart home! By the way, do you have a page that links to all the apps tied to your service?

Secondly, as @husseinmohkhalil mentioned, I still get service calls that hang up from time to time (about 10% of my requests). Is this still related to the DNS issue previously reported? It has now been more than 48 hours.

meezaan commented 3 years ago

Alaykum salaam brother and Ramadan Kareem!

I used to keep a page - https://aladhan.com/consumers-api - but it has not been updated for a few years now. But feel free to add your app and raise a pull request.

Regarding the API issue - DNS has been stable since Monday night (and there is now a monitor that checks it every minute), but we had to add multiple load balancers due to the load, and that resulted in us facing the issue described here: https://aws.amazon.com/premiumsupport/knowledge-center/target-connection-fails-load-balancer/ (except that our cloud is Linode instead of AWS, but the load balancer is a managed HAProxy instance in both cases). That is why you were getting intermittent connections. I put in a fix for this at 1 am Gulf Standard Time last night (or this morning, technically) and have not seen a failure since that time.

But I am still monitoring - it may be helpful for you to join the Discord server because we have been tracking this issue there.

I have also opened a Support Ticket with Linode. Because the load balancers at all Cloud providers are a fully managed service when used with their Kubernetes offerings, debugging sometimes takes much longer because I have no access to the logs. In this case, for instance, the request never made it to the cluster, and the clue was the response which was returned with the refused connection: ECONNREFUSED. Uptime Robot did not report that, but StatusCake did.
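The idea that the error a client sees hints at which layer failed can be sketched as a small lookup. This is an illustrative Node.js sketch only: the function name likelyLayer and the layer labels are assumptions, and the mapping simply restates the symptoms mentioned in this thread together with the usual meaning of each code. It is a first guess for triage, not a diagnosis.

```javascript
// Hypothetical helper: guess which layer probably failed, given a Node.js
// error code (err.code) or an HTTP status code.
function likelyLayer({ code, status } = {}) {
  // getaddrinfo failures: the hostname could not be resolved.
  if (code === 'ENOTFOUND' || code === 'EAI_AGAIN') return 'dns';
  // Connection actively refused; in this thread the request never
  // reached the cluster when this was seen.
  if (code === 'ECONNREFUSED') return 'load-balancer';
  // Connection dropped mid-flight ("socket hang up") or timed out.
  if (code === 'ECONNRESET' || code === 'ETIMEDOUT') return 'network';
  // 525 is Cloudflare's "SSL handshake failed" between the CDN and origin.
  if (status === 525) return 'cdn-tls';
  // Gateway-style errors returned by a proxy in front of the service.
  if (status === 502 || status === 503 || status === 504) return 'upstream';
  return 'unknown';
}

// Usage:
// likelyLayer({ code: 'ECONNREFUSED' })
// likelyLayer({ status: 525 })
```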

mohamed-arradi commented 3 years ago

Salam Aleykoum,

Thank you for your quick reply. I will join the Discord and make a pull request on the consumers-api :). Thank you again!

I still see a hang up from time to time, but it is indeed less frequent. Here is one from my AWS logs:

2021-04-21T10:30:13.837Z ERROR RequestError: Error: socket hang up
    at new RequestError (/var/task/node_modules/request-promise-core/lib/errors.js:14:15)
    at Request.plumbing.callback (/var/task/node_modules/request-promise-core/lib/plumbing.js:87:29)
    at Request.RP$callback [as _callback] (/var/task/node_modules/request-promise-core/lib/plumbing.js:46:31)
    at self.callback (/var/task/node_modules/request/request.js:185:22)
    at Request.emit (events.js:315:20)
    at Request.onRequestError (/var/task/node_modules/request/request.js:877:8)
    at ClientRequest.emit (events.js:315:20)
    at Socket.socketOnEnd (_http_client.js:493:9)
    at Socket.emit (events.js:327:22)
    at endReadableNT (internal/streams/readable.js:1327:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21) {
  cause: Error: socket hang up
      at connResetException (internal/errors.js:607:14)
      at Socket.socketOnEnd (_http_client.js:493:23)
      at Socket.emit (events.js:327:22)
      at endReadableNT (internal/streams/readable.js:1327:12)
      at processTicksAndRejections (internal/process/task_queues.js:80:21) {
    code: 'ECONNRESET'
  },
  error: Error: socket hang up
      at connResetException (internal/errors.js:607:14)
      at Socket.socketOnEnd (_http_client.js:493:23)
      at Socket.emit (events.js:327:22)
      at endReadableNT (internal/streams/readable.js:1327:12)
      at processTicksAndRejections (internal/process/task_queues.js:80:21) {
    code: 'ECONNRESET'
  },
  options: {
    uri: 'http://api.aladhan.com/v1/calendar?latitude=43.5611027&longitude=1.4531355&method=2&month=4&year=2021',
    headers: { Accept: 'application/json' },
    callback: [Function: RP$callback],
    transform: undefined,
    simple: true,
    resolveWithFullResponse: false,
    transform2xxOnly: false
  },
  response: undefined
}
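For clients that hit this kind of intermittent ECONNRESET, one common mitigation is to retry the call with exponential backoff. The following is a minimal Node.js sketch under stated assumptions: the helper names (isTransient, backoffDelay, withRetry), the list of codes treated as transient, and the delay values are all illustrative choices, not part of the AlAdhan API or of request-promise.

```javascript
// Error codes treated as transient; this list is an assumption.
const TRANSIENT = new Set(['ECONNRESET', 'ECONNREFUSED', 'ETIMEDOUT', 'EAI_AGAIN']);

// The code may sit on the error itself or on its `cause`,
// as in the RequestError logged above.
function isTransient(err) {
  const code = err && (err.code || (err.cause && err.cause.code));
  return TRANSIENT.has(code);
}

// Exponential backoff: baseMs * 2^attempt, capped at maxMs.
function backoffDelay(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run fn(); on a transient error wait and try again, up to `retries` extra attempts.
// Non-transient errors and the final failure are rethrown unchanged.
async function withRetry(fn, { retries = 3, sleep = defaultSleep } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries || !isTransient(err)) throw err;
      await sleep(backoffDelay(attempt));
    }
  }
}

// Usage (callApi is a placeholder for whatever function performs the request):
// withRetry(() => callApi()).then(console.log, console.error);
```

Retrying only helps with short-lived failures; it does not replace fixing the underlying network problem, and the retry count should stay small so a struggling service is not hit harder.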

meezaan commented 3 years ago

Thank you @mohamed-arradi.

I am already liaising with Linode support and I have just made a few more changes to the load balancer settings. It's happening less, but still happens.

I am hoping they will come back with an answer today, God willing.

mohamed-arradi commented 3 years ago

Salam @meezaan ,

Just to let you know that today was my worst day so far for socket hang ups. Has Linode replied to you regarding this particular issue, and have they managed to find anything helpful?

Inshallah they will find out the root cause.

meezaan commented 3 years ago

Alaykum salaam brother.

Yes, it has been a difficult few days. I have slept 4 hours over the last 3 days trying to work things out.

In the end, I have finally traced it to nodes in a Kubernetes cluster not being able to communicate with each other all the time. They drop packets, which means the ingress controller and proxy cannot get to the actual pods.

I spun up a new cluster and migrated the Qur'an API and app to it 3 hours ago and it has, praise be to God, been up without any issues.

Insha'Allah I will migrate the Adhan API after dinner, so in a couple of hours, insha'Allah, things should be back up and running.

I will also write a full report to share with the community, covering what we will do, insha'Allah, to try and ensure we are prepared for network-busting traffic in the future, with God's help!

Linode is still investigating the old cluster now that we have ironed out every other possibility, and believe me, I tried everything!

meezaan commented 3 years ago

@mohamed-arradi Al Salaamu Alaykum.

I have updated the DNS entry to point to the new IP address.

Please let me know if you see an improvement, insha Allah!

husseinmohkhalil commented 3 years ago

Salamu alikom brother. Regardless of the issue, thank you so much for your dedication to the cause.

Hussein Khalil

husseinmohkhalil commented 3 years ago

Alsalamu Alikom Brothers, my problem may be a little different. I was using the API from a smart-app on a Samsung SmartThings hub to play the Adhan on my Google Home devices, and the response was always Forbidden for some reason. As it is Ramadan, I had no time to wait for a solution and needed the Adhan to be played again, so I created an internal API on my Raspberry Pi. It is a dumb but effective solution: it just gets the next adhan time and sends it to the smart-app. Since then I have had no problems at all, and the response has been correct 100% of the time.
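A local helper of that kind could be sketched as below. This is only an illustration under stated assumptions: the shape of the timings object (prayer name mapped to an "HH:MM" string, optionally followed by a timezone label), the list of prayer names, and the function names are all hypothetical, and the code assumes the device clock is set to the same timezone as the timings.

```javascript
// Prayer names assumed to be present in the timings object.
const PRAYERS = ['Fajr', 'Dhuhr', 'Asr', 'Maghrib', 'Isha'];

// Convert "HH:MM" (any trailing text such as " (CEST)" is ignored)
// into minutes since midnight.
function toMinutes(hhmm) {
  const m = /^(\d{1,2}):(\d{2})/.exec(hhmm);
  if (!m) throw new Error(`Unrecognised time: ${hhmm}`);
  return Number(m[1]) * 60 + Number(m[2]);
}

// Return the first prayer still ahead of `now` today, with the number of
// minutes until it, or null when none remain today.
function nextPrayer(timings, now = new Date()) {
  const nowMin = now.getHours() * 60 + now.getMinutes();
  for (const name of PRAYERS) {
    if (timings[name] === undefined) continue;
    const at = toMinutes(timings[name]);
    if (at > nowMin) return { name, inMinutes: at - nowMin };
  }
  return null;
}

// Usage, with made-up times:
// nextPrayer({ Fajr: '05:12', Dhuhr: '13:56', Asr: '17:40', Maghrib: '20:45', Isha: '22:20' });
```

Fetching the timings once and caching them locally means the smart-app only depends on the remote service for that single fetch, rather than for every prayer time.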

meezaan commented 3 years ago

I will close the issue now as things are back to being stable. I will post a full post mortem on the Discord server soon.