kahmali / meteor-restivus

REST APIs for the Best of Us! - A Meteor 0.9+ package for building REST APIs https://atmospherejs.com/nimble/restivus
MIT License

Restivus defaultAuth: true and Meteor.users collection causing MongoDB CPU spike #264

Open alur222 opened 7 years ago

alur222 commented 7 years ago

Hi guys, I don't know if any of you have run into this issue when using the Restivus default auth (/login and /logout). It cost me 3 days to finally find the bottleneck while I was desperately trying to scale our app. This is not really an issue, but I will post it here to pay it forward. :)

Our app launched a month ago, and we built our REST API with Restivus using the default auth. At first, when we had fewer users than we have today, the API responded quickly and everything ran fine. Then, as the number of users grew, I noticed a big CPU spike in Mongo (the app itself was using a lot of CPU too). I desperately tried to scale our app: I optimized code execution, set up a Mongo replica set across 3 servers, enabled oplog tailing, added Mongo indexes, and so on. The optimizations held up when I tested them, but after a day I saw the spike again in production. :(

The next day, I decided to spawn more app processes to use the remaining cores of our server (we have 4 cores), hoping that would solve the issue. I used pm2 to run our app in a cluster. But Mongo's CPU usage spiked again, and surprise! It was now reaching ~600% according to the top command.

I didn't lose hope though. Today, I decided to read the logs. Yes, the big MongoDB log found in the /var/log dir. There I saw the slowest queries I have ever experienced in Meteor: queries against the Meteor.users collection! So why were they slow? Maybe because I hadn't added an index to the users collection? Or maybe something else? To debug, I logged in to our primary MongoDB server and ran `use dbname` followed by `db.users.getIndexes()`. The indexes were already there, so indexes were not the issue.

My next idea was to inspect every document in the users collection. I fetched the users one by one and found 1 user with a huge list under services.resume! I don't know why Meteor doesn't remove the items there, and it's really weird that only this user has such a big list. To dig further, I watched this user's API usage, and it surprised me: the user logs in to the API before every single request. That's bad, because the user makes 500 requests a day, which means 500 new hashed tokens a day in the user document!
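For anyone who wants to check their own data for the same pattern, here is a rough sketch of the "find the user with the oversized token list" step. It assumes the tokens sit in the `services.resume.loginTokens` array, as in Meteor's accounts packages, and it works on plain user documents that have already been fetched. The function name and the sample documents are made up for illustration. Against a real database the same count could probably be done server-side instead, for example with an aggregation that uses `$size`.

```javascript
// Count stored login tokens per user and sort the result, largest first.
// Assumption: tokens live in services.resume.loginTokens.
function usersByTokenCount(users) {
  return users
    .map((user) => {
      const resume = (user.services && user.services.resume) || {};
      const tokens = Array.isArray(resume.loginTokens) ? resume.loginTokens : [];
      return { _id: user._id, tokenCount: tokens.length };
    })
    .sort((a, b) => b.tokenCount - a.tokenCount);
}

// Made-up sample documents, for illustration only.
const sampleUsers = [
  { _id: 'a', services: { resume: { loginTokens: [{ hashedToken: 'x' }] } } },
  {
    _id: 'b',
    services: {
      resume: {
        loginTokens: [{ hashedToken: 'y' }, { hashedToken: 'z' }, { hashedToken: 'w' }],
      },
    },
  },
  { _id: 'c', services: {} }, // a user who never logged in through the API
];

// The first entry is the user with the most stored tokens.
console.log(usersByTokenCount(sampleUsers)[0]);
```

A user whose count is far above everyone else's is a good candidate for the "logs in before every request" behavior described above.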
Once I knew this, I realized I had to make changes and implement a solution.

My solution was to create my own authentication that first searches the db for a valid token and returns it. If a user logs in for the first time or doesn't have a valid token, we create a new token and return it. After deploying these changes, our app and MongoDB now behave normally in terms of CPU.
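The idea can be sketched roughly like this. This is not the actual code from our app and not a Restivus API: it uses an in-memory `Map` as a stand-in for the users collection, and `getOrCreateToken`, `tokenStore` and `TOKEN_TTL_MS` are hypothetical names. It also assumes the reusable token is stored in a form that can be returned to the client. A hashed token cannot be turned back into the raw one, so a real implementation has to decide how to handle that and weigh the security trade-off.

```javascript
const crypto = require('crypto');

// Hypothetical token lifetime; pick whatever fits your app.
const TOKEN_TTL_MS = 24 * 60 * 60 * 1000;

// In-memory stand-in for the users collection:
// userId -> array of { token, expiresAt }.
const tokenStore = new Map();

// Return an existing unexpired token for the user, or create one if none exists.
function getOrCreateToken(userId, now = Date.now()) {
  // Drop expired tokens so the list cannot grow without bound.
  const valid = (tokenStore.get(userId) || []).filter((t) => t.expiresAt > now);

  if (valid.length === 0) {
    // First login, or every stored token has expired: issue a new one.
    valid.push({
      token: crypto.randomBytes(32).toString('hex'),
      expiresAt: now + TOKEN_TTL_MS,
    });
  }

  tokenStore.set(userId, valid);
  return valid[0].token;
}

// Logging in twice in a row reuses the same token instead of adding a new one.
const first = getOrCreateToken('user-1');
const second = getOrCreateToken('user-1');
console.log(first === second, tokenStore.get('user-1').length);
```

With this approach a client that logs in before every request keeps getting the same token back until it expires, so the token list in the user document stays small.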

Do you have any suggestions or comments on this solution? Please comment. Thank you.

Note: The optimizations above are important too.