rjbain / dgreat

MIT License

Top Apps Performance #392

Open nicholas-kebbas opened 5 years ago

nicholas-kebbas commented 5 years ago

We encountered performance issues on our live site when we pushed the most recent Top Apps changes (https://github.com/rjbain/dgreat/commit/6b88e2f4541f2406681ce82c9ece02f8fed8e074 and https://github.com/rjbain/dgreat/commit/aaa30ccb44a382692f812f22a6f269c77f2e9065)

We pushed out the changes on the morning of Oct. 30 and disabled them on the evening of Oct. 31.

[Screenshot: Screen Shot 2018-11-07 at 12.57.59 PM.png]

Additional Info:

labboy0276 commented 5 years ago

@nicholas-kebbas can you give me some CAS users to test login with?

nicholas-kebbas commented 5 years ago

@labboy0276 Sure:

UN: d3roles
UN: d2roles
UN: dfaculty
UN: dstudent
UN: demployee

All passwords are StashBoard1

d3roles will give you the worst case load time

labboy0276 commented 5 years ago

OK @nicholas-kebbas I redid a lot of how the code works for flagging users on login + assigning weights. There is no need for the batch anymore either.

I tested this on a PR multidev and my logins took around 3-4 seconds at most, even with d3roles. I imagine on live it will be much faster.

QA: http://pr-398-dgreat.pantheonsite.io/
PR: https://github.com/rjbain/dgreat/pull/398

nicholas-kebbas commented 5 years ago

Hey @labboy0276 The functionality still seems to be working, but I'm getting 10+ second logins with d3roles on http://pr-398-dgreat.pantheonsite.io/. Odd that your logins are so much faster.

I took a quick look at New Relic and there are some spikes caused by the dgreat_group module whenever I log in. The chart does look a little different than it did previously:

[Screenshot: screen shot 2018-12-03 at 12 04 32 pm]

The dgreat_views spike only seems to happen when I access the /favorite-links page to reorder.

labboy0276 commented 5 years ago

I will double check @nicholas-kebbas, but I was able to log in in around 3-4 seconds each time.

labboy0276 commented 5 years ago

I know why: the tables for flagging and user weights are huge, and I was testing against truncated tables @nicholas-kebbas

Need to stare more tomorrow to see what else I can improve.

nicholas-kebbas commented 5 years ago

@labboy0276 Ah, that makes sense. I'll try to think of some ways to improve it as well.

labboy0276 commented 5 years ago

@nicholas-kebbas OK, I noticed the custom user weights table had no indexes on it. I went ahead and created 2 indexes, which usually helps with performance. I tested it with d3roles and logged in quickly.

Can you double check?
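
To illustrate why indexes on the user weights table can matter at login, here is a minimal sketch. It is not the project's code: the table name `user_weights`, its columns, and the index name are hypothetical, and SQLite is used only so the example is self-contained (the site's actual database and schema are not shown in this thread). It compares the query plan for a per-user lookup with and without an index on the user column.

```python
import sqlite3


def build_db(with_index: bool) -> sqlite3.Connection:
    """Create an in-memory table of per-user link weights (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_weights (uid INTEGER, link_id INTEGER, weight INTEGER)"
    )
    # 500 users x 20 links each; a stand-in for a much larger production table.
    rows = [(uid, link, link) for uid in range(500) for link in range(20)]
    conn.executemany("INSERT INTO user_weights VALUES (?, ?, ?)", rows)
    if with_index:
        # Hypothetical index name; lets lookups by uid avoid a full table scan.
        conn.execute("CREATE INDEX idx_user_weights_uid ON user_weights (uid)")
    conn.commit()
    return conn


def query_plan(conn: sqlite3.Connection) -> str:
    """Return SQLite's plan description for a per-user lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT link_id, weight FROM user_weights WHERE uid = ?",
        (42,),
    ).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " ".join(row[3] for row in rows)


def weights_for(conn: sqlite3.Connection, uid: int) -> list:
    """Fetch one user's weights, ordered so results are comparable."""
    return conn.execute(
        "SELECT link_id, weight FROM user_weights WHERE uid = ? ORDER BY link_id",
        (uid,),
    ).fetchall()


if __name__ == "__main__":
    print("without index:", query_plan(build_db(with_index=False)))
    print("with index:   ", query_plan(build_db(with_index=True)))
```

Without the index the plan is a scan of the whole table; with it, the lookup becomes an index search. The query results are the same either way, so only the cost changes.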

nicholas-kebbas commented 5 years ago

@labboy0276 That seems to have helped. I logged in in about 5 seconds with d3roles. The spikes in New Relic are also better:

[Screenshot: screen shot 2018-12-03 at 3 00 17 pm]

nicholas-kebbas commented 5 years ago

@labboy0276 Did you end up running any of the BlazeMeter tests and notice any improvements there?

labboy0276 commented 5 years ago

@nicholas-kebbas I was going to try to get the JMeter or BlazeMeter stuff working today

labboy0276 commented 5 years ago

OK

So I added another index and did some testing based off of that. It is slightly more performant based on my tests as you can see here: https://docs.google.com/spreadsheets/d/1ALBBw4kCsLt6q6_UFibPJEXLAQL7MrPkMk-NeiXHZfo/edit#gid=0

This is testing the flagging of the default links per login. That is where the huge hangups were. Each cycle is a different group adding its flags. This was all done with the d3roles login.

Also, you can see from the New Relic graph that the whole flow with 3 indexes seems more stable (they are the first 3 peaks):

[Screenshot: screenshot from 2018-12-04 08-53-36]

So I then went ahead and set up a BlazeMeter test that logged in 50 users concurrently via POST requests. I am not 100% sure it was hitting all the same functionality, but I also logged in manually dozens of times while the test was running. Every time I was in within a couple of seconds.
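
For anyone wanting to reproduce the general idea without BlazeMeter, here is a rough sketch of timing many concurrent logins. It is not the test that was run above: `login_fn` is a hypothetical placeholder for whatever actually performs the login POST, and `fake_login` just sleeps so the sketch runs without a server.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def time_call(login_fn: Callable[[str], bool], username: str) -> float:
    """Return how long one login attempt takes, in seconds."""
    start = time.perf_counter()
    login_fn(username)
    return time.perf_counter() - start


def run_concurrent_logins(
    login_fn: Callable[[str], bool], usernames: List[str], max_workers: int
) -> List[float]:
    """Run one login per username on a thread pool and collect the durations."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda name: time_call(login_fn, name), usernames))


def summarize(durations: List[float]) -> Dict[str, float]:
    """Summarize durations with count, median, nearest-rank p95, and max."""
    ordered = sorted(durations)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "count": len(ordered),
        "median": ordered[len(ordered) // 2],
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }


def fake_login(username: str) -> bool:
    """Stand-in for a real login request (assumption: replace with a real POST)."""
    time.sleep(0.01)
    return True


if __name__ == "__main__":
    users = [f"user{i}" for i in range(50)]
    print(summarize(run_concurrent_logins(fake_login, users, max_workers=50)))
```

Looking at the p95 and max rather than only the average is useful here, since the complaint was about worst-case logins such as d3roles.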

You can see on this graph from 6:15 on what was going on. The peaks are where I was also logging in at the same time.

[Screenshot: screenshot from 2018-12-04 09-34-04]

This all seems rock solid from my end @nicholas-kebbas. Going forward, each user's first login after the script changes will be a little slow (around 4 seconds), and after that login time will be almost cut in half. I am sure this will be faster on live as well.

Also, when pushing this to live make sure you update the DB as well.

drmyers1 commented 5 years ago

@labboy0276 thank you for the information. We are reviewing the data and will get back to you with any questions. @nicholas-kebbas @reynoldsalec

nicholas-kebbas commented 5 years ago

@labboy0276 Thanks John. Could you try running the BlazeMeter test with 1000 concurrent users? We want an accurate picture of how the site responds to a high volume of simultaneous users. If that checks out, we'll work on putting this into production this afternoon to see how it performs.