voat / voat

The code that powers Voat
https://voat.co
Other
1.19k stars 272 forks source link

Perf improvements strategy #464

Closed johnduhart closed 8 years ago

johnduhart commented 9 years ago

Hey guys, I'm got some stuff going locally to improve performance, but I'd like to work together to make sure I'm fixing problems that are currently affecting production the worst.

Right now I'm setting out to tackle the following:

The last item may require introducing a IoC container and dependency injection, but will be benefitial.

In the mean time, would you consider installing something like Glimpse and running it in production and sharing some of the results? That would give me more data to better focus my search.

voat commented 9 years ago

Hi @johnduhart, you are addressing the core of the problem we are facing right now. At the moment IIS is using quite a lot of CPU and this is the primary bottleneck, SQL server is holding up quite well. I will look into installing Glimpse and sharing the results today.

liamlaverty commented 9 years ago

IoC would be great. What we need is a place for people to write in their "low hanging fruit" though, for instance:

SubversesController.SfwSubmissionsFromAllSubversesByViews24Hours().

From what I can see every time someone loads the page you're recalculating this, it doesn't look to be cached anywhere (could be missing it somewhere else though(. Put it in a 5 minute cache (even a 30 second cache) and the whole site will speed up with a few lines of code. There's a number of things I've seen just browsing through that look like big red flags for performance gains.

Unfortunately I can't start the project locally for some reason, but re-evaluating the file Voat.Controllers.SubversesController bring significant performance gains.

voat commented 9 years ago

Thank you @liamlaverty, I'm sure there are many other areas with similar room for improvement and quick and easy fixes. Voat was mostly developed with assumption that it would never be used by so many people. If you notice any more red flags, please feel free to comment here (everyone else is also welcome to do so). It helps a lot.

liamlaverty commented 9 years ago

@voat You've done a great job so far, don't take my (or anyone else's) criticism as anything less than constructive. From the way my inbox has blown up you've got a community of developers willing to help out, the lowest hanging fruit would be to give them direction.

Best of luck to you both

0xn3bs commented 9 years ago

I have a suggestion that I'd be willing to help out with.

What if you used a micro-orm such as Dapper? EF has quite a bit of overhead. We should be able to still utilize the existing EF model classes as essentially POCO objects.

Again I'd be willing to help with this undertaking.

pcwiek commented 9 years ago

@inkadnb EF is fine (after a little bit of tuning) for non-critical paths. The only really annoying thing is first query ever after the AppDomain is created, which sets up all internals of EF and can be slow - the 'bigger' the context is, the longer it takes to set it up.

Otherwise, using something like Dapper, PetaPOCO or Insight.Database for the hot path queries should work just fine.

First priority, IMO, should be measurement, identification of slow queries and tuning/adding indexes.

0xn3bs commented 9 years ago

pcwiek I think it's not necessarily an issue with EF but rather the fact that everything is being lazy loaded, this causes a huge number of queries to the database when a single page loads.

I was focusing on comments yesterday but I'll do more profiling and see what else I can find.

pcwiek commented 9 years ago

@inkadnb Yeah, that sounds like a relatively common pitfall when using heavyweight ORM, it's a textbook SELECT N+1 problem. Fortunately, all it takes to alleviate it is to just eagerly load what you need, i.e. rewrite LINQ queries as if they were SQL queries: you grab only the things you need in the SELECT clause and you grab them up front. That should make the app run much faster.

I might look into it too in the next few days, it depends on my amount of free time (or lack thereof). :)

0xn3bs commented 9 years ago

@pcwiek Right, I'd argue that at that point you might as well write it as a SQL query to begin with and use Dapper or plain old ADO.NET. LINQ to SQL is more expensive than just writing out the SQL yourself.

The biggest draw-back with writing out the SQL yourself is dealing with schema changes. Going back and updating those query strings can be a pain since you lose compile-time type checking that you get with LINQ to SQL but that's a price you pay for performance (although there's ways around this).

pcwiek commented 9 years ago

@inkadnb

Going back and updating those query strings can be a pain since you lose compile-time type checking that you get with LINQ to SQL but that's a price you pay for performance (although there's ways around this)

That's exactly it, and personally I'm a proponent of having type safety and refactoring-friendly solution at a cost of relatively minor performance penalty unless it can be actually empirically shown that any particular action/controller is a hot path or a bottleneck.

Hence my suggestion to just tinker with EF, adjust LINQ queries so that there are no nasty surprises hidden there, adjust indexes in DB and then - if there's a need shown via profiling and measurements - introduce any micro-ORM of choice alongside that would be used in the performance-sensitive areas instead.

As a side note, with EF you can easily use Effort or SQLite driver for fast and easy 'integration' (or 'extended unit') tests. They won't replace end-to-end tests because of LINQ implementation differences between different DB providers, but even then I'd say that this approach would cover a lot of ground with minimal effort (bad pun, I admit).

But that's just my opinion. :)

0xn3bs commented 9 years ago

pcwiek, I think that's fair. The approach I mentioned was specifically targeted for places that are performance critical, in those areas we can re-factor and utilize a micro-ORM. I wasn't expecting EF to be ripped out entirely, at least not anytime soon. We don't need to since we can just treat existing EF objects as POCO.

I see both sides of the coin and from the perspective of "since I'm refactoring this anyways..." it makes sense to me. On the flip-side, keeping type-safety, keeping the scope of changes smaller, it also makes sense.

I'm fine with whatever decision is made, but I do think it needs to be officially addressed. Sticking with EF we'll need to establish some standard on avoiding large amounts of lazy-loading and over-utilization of EFs dot-notation.

I'm going to do more profiling tonight, it's difficult to accurately measure since my data-set is small. It would be nice if we had a larger test data-set or DB to work off of.

pcwiek commented 9 years ago

@inkadnb Agreed.

But this:

(... ) and over-utilization of EFs dot-notation.

is not problem per se, or at least I don't think it is. I think that the core issue here stems from slight misunderstanding of deferred execution and how LINQ-to-SQL providers really work.

Let's look here, for example.

  1. Case-insensitive comparison might be implemented by the provider, but it would be actually (IMHO) easier to rely on database collation to have that done right once and for all, so there's no way of 'slipping up'.
  2. Since none of the previous queries are materialized, lines 140 and 145 execute separate EXIST queries. Also note that there are two .Any() calls in a row, which means that the same functionality could probably be achieved in just one query.
  3. If there are not too many replies this part should work fine, but the loop will execute N updates. Once the reply count grows large that solution will get slow since EF sucks at batch operations, so there are two easy solutions:

    or

  4. Both EF and SQL Server support asynchronous processing - you need to add Asynchronous Processing=true to connection string. The project should take advantage of that by changing controller actions to return Task<ActionResult> and materializing query results at appropriate time by .ToListAsync or, for example, using SaveChangesAsync. It also should help with throughput once the more pressing performance issues are resolved.
  5. The count retrieval is bugging me. If they're all usually retrieved together, maybe it would be actually good to retrieve them all in one quick query. If there was a view specific for count retrieval, that query would be really simple, and views can be mapped in the exact same way as tables in EF. Right now the count retrieval alone in this controller is roughly... 6 counts x 1-2 queries per count retrieval = 6 to 12 queries, give or take, on every request.

Also it's my opinion, but instead of having ViewBag used everywhere, it would be probably easier to maintain it if there was a view model created for the action with all relevant data pieces in it, including counts.

Just some thoughts, take it all with a grain of salt. I might take a look at it over the next few days and start tinkering...

0xn3bs commented 9 years ago

Thank you @pcwiek that was very informative :)

My qualms with dot-notation is with deferred execution itself. As you know when a query expression is executed can vary but it's easy for developers to gloss over that detail, that's my main gripe, it's easy to fall into the trap of being ignorant of query execution. I prefer being more explicit precisely for that reason.

pcwiek commented 9 years ago

@inkadnb @voat @liamlaverty @johnduhart

I actually had a little bit of time today to take a look at what can be done here and here's a small sample: Karma revamp, along with two necessary interfaces: one and two.

I don't want to step on anyone's toes here, is anyone else actively working on this?

Also, I'm not sure if - for example - there's any progress on the 'adding IOC' front. If not, I think it should be done first, although it's up for debate which one to pick.

PuttItOut commented 8 years ago

Closed as this is not a planned change.