Closed cmcavoy closed 10 years ago
I wouldn't have thought that querying for all the claim codes would show up as a separate query. The claim codes are just an array of subdocuments in badges, so badges.findById(whatever) would pull all of them along with the badge data itself. That being said, yeah, totally, I'm sure there is a ton of wasteful data being returned here. Possibly the time openbadger spends actually creating objects out of the massive amounts of data is what is causing the slowdown? Or some other such thing that doesn't shop up in an obvious way in the profiling.
There was a similar issue with the /earn search returning image data for every single badge it searched on, and cleaning that up sped the search up considerably. So it's very possible that's what is going on here, yeah.
MongoDB Monitoring: https://rpm.newrelic.com/accounts/255689/dashboard/2919614 FromIRC:
mostlygeek jp: i'm just helping debug between meetings right now ;) mostlygeek hi cmcavoy jp told me to join cause i was looking at some of the perf stuff w/ badgets mostlygeek badges mostlygeek jp/cmcavoy i wonder if this is your issue mostlygeek so in your Mongoose models, the Badge has a ref to Program, which has a ref to Issuer, which has a ref to Programs... mostlygeek i wonder if if that circular ref chain is causing an infinite loop of data pulls mostlygeek that eventually causes the universe to implode (or something less dramatic) mostlygeek here: https://github.com/mozilla/openbadger/blob/v2.0/models/badge.js#L90 mostlygeek here: https://github.com/mozilla/openbadger/blob/v2.0/models/program.js#L34 mostlygeek and here https://github.com/mozilla/openbadger/blob/v2.0/models/badge.js#L94 (where it starts) jp <3 <3 <3 cmcavoy mostlygeek, that looks promising cmcavoy we just started looking into the actual models jp cmcavoy, brianloveswords, echristensen ^^
It's totally the claim codes. The /earn page took 20x longer to load locally once I added 50,000 claim codes to a badge. After I had that search exclude claimCodes from the result, it was back to its original speed (actually slightly improved).
CSOL production has seen a gradual slowdown over the last several weeks. The call to https://badge.chicagosummeroflearning.org/v2/badges takes several seconds to load. New Relic is showing it upwards of 15 seconds to return.
There's other API calls with similar slow speeds. No code updates have happened since early August. @jdotpz and I turned on profiling in Mongo to look for slow queries, but haven't found anything that takes longer than 30 ms to complete.
After some discussion with @brianloveswords, I started looking for issues with Mongoose. Any queries that asked for an inordinate number of small queries to compose (lots and lots of 20 ms queries add up). This blog post helped. The badge-instance model links directly to the claim code model. In early August, we added tens of thousands of claim codes in a batch. I believe we have an issue somewhere in the code where a call to gather all badges, or even subsets of badges, are also including all the claim codes. For some badges, that could literally include 100k plus additional records (CPD badges as an example).
The badge-instance / claim code issue is just one possibility, there's probably some other issues along the same line. Performance dropped off after we issued many badges for CPD, so I'd start there.