mezuro / kalibro_processor

Reimplementation of Kalibro processing
GNU Affero General Public License v3.0

Slow processings #207

Closed: rafamanzo closed this issue 8 years ago

rafamanzo commented 8 years ago

Running the following on the mezuro.org console gives us a hint of why some processings take so long:

=> ProcessTime.all.group('state').average('time').map { |k,v| {k => v.to_f/60.0} }

=> [
  {"PREPARING"=>0.0021892476280869332},
  {"CHECKING"=>2.1039809739626834},
  {"AGGREGATING"=>29.408375646260332},
  {"BUILDING"=>0.919068727626195},
  {"INTERPRETING"=>1.7958658936723333},
  {"DOWNLOADING"=>0.20159135590306834},
  {"CALCULATING"=>3.2441646428687334},
  {"COLLECTING"=>4.56774077964555}
]

This means that, on average, each Processing spends approximately 30 minutes in the aggregating state.

Looking at the database machine's processor, we can see it pinned at 100% the whole time during aggregation.

I suggest:

rafamanzo commented 8 years ago

Some work on producing performance tests has been started on the following branch: https://github.com/mezuro/kalibro_processor/tree/aggregation_performance

rafamanzo commented 8 years ago

I've added tests to Aggregator with complexity similar to the one found at mezuro.org: https://github.com/mezuro/kalibro_processor/commit/ced0eeef8d2a5cfe4d2d5c15c22492b2e4a12157

But the results are still nowhere near that slow, with a mean run time of 0.55s.

Insights are welcome @mezuro/core :smile:

danielkza commented 8 years ago

We'd probably need to test with a high concurrency level if the database itself is the bottleneck. I think profiling the queries through PostgreSQL would be the easiest way to understand the issues: we can look at the query plans and query times using something like pgBadger.
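
For what it's worth, pgBadger works off the PostgreSQL logs, so the server would need something along these lines in postgresql.conf before collecting data (a sketch of commonly recommended settings, not taken from this project's configuration):

# Log every statement together with its duration so pgBadger can rank them.
log_min_duration_statement = 0
# A line prefix pgBadger can parse (timestamp, pid, user, database).
log_line_prefix = '%t [%p]: user=%u,db=%d '
# Extra events that help explain slow periods.
log_checkpoints = on
log_lock_waits = on
log_temp_files = 0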

rafamanzo commented 8 years ago

On mezuro.org there is a single worker generating load on the database, so I don't believe we'll replicate the issue through concurrency. But I've tested with a single MetricResult. Perhaps it is worth increasing its count in the test, since the mezuro.org issues happen with configurations holding at least half a dozen metrics.

As for pgBadger, it looks like a nice tool to use once we can produce a test that replicates the issues found at mezuro.org.

danielkza commented 8 years ago

I thought we had four workers for kalibro_processor, my mistake!

rafamanzo commented 8 years ago

I've fixed the performance test. It was fast only because I had missed proper data initialization.

Now, for a tree with height 10 and 2 children per node (so 512 leaves), I've produced the following results:

| Metric Count | Wall Time (s) | Process Time (s) |
|--------------|---------------|------------------|
| 1 | 8.672761964797974 | 7.54611889 |
| 2 | 14.596535205841064 | 24.3851091414 |
| 4 | 77.55368666648864 | 127.275106022 |
| 8 | 256.8674069881439 | 556.1682938326 |
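
(For scale, from my own arithmetic: a full binary tree with 512 leaves has 1023 nodes, 511 of them internal. Assuming the aggregation writes one tree metric result per internal node per metric, which matches the 1022 insertions mentioned further down for the 2-metrics case, the 8-metrics case aggregates 511 × 8 = 4088 results.)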

Running with 8 metrics still did not produce the 100% PostgreSQL usage seen at mezuro.org. The growth looks quadratic.

rafamanzo commented 8 years ago

Possible cause: https://github.com/mezuro/kalibro_processor/blob/master/lib/processor/aggregator.rb#L23. @danielkza pointed out that the MetricResults table has no indexes, which makes the repeated execution of that query really slow.

diegoamc commented 8 years ago

I've run the same tests as @rafamanzo and have got the following results:

| Metric Count | Wall Time (s) | Process Time (s) |
|--------------|---------------|------------------|
| 1 | 6.992499828338623 | 7.272484427600001 |
| 2 | 13.529412364959716 | 12.9735503228 |
| 4 | 26.20373945236206 | 28.432685047400003 |
| 8 | 62.70585794448853 | 64.16384417580001 |

Then, I've created indexes on the module_result_id and metric_configuration_id columns of the metric_results table and rerun the same tests. These are the results:

| Metric Count | Wall Time (s) | Process Time (s) |
|--------------|---------------|------------------|
| 1 | 6.903397846221924 | 7.131537662599999 |
| 2 | 13.142084169387818 | 13.2359803324 |
| 4 | 28.782897806167604 | 28.8164754306 |
| 8 | 64.86161379814148 | 56.7507790644 |

From these results, I don't think the lack of indexes is the main cause of such sluggish processings. However, I also have some doubts about my tests. First, my results are much faster than @rafamanzo's. What could be the reason for that? Just processing power?

Another thing: could the metrics used affect the results? I've commented out the metrics I wouldn't use in the test, from bottom to top (given the order in the performance script). Is that how you did it too, @rafamanzo?

Also, it would be good for someone else to corroborate these results. I've committed the changes and pushed a new branch called metric_result_indexes. The commit includes the modified database schema, so a single rollback returns the database to its previous state (without the indexes).
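
For reference, the indexes described above could be added with a Rails migration along these lines (a sketch using the table and column names from this comment; the actual commit on the metric_result_indexes branch may differ):

# Sketch only: the two indexes discussed above on metric_results.
class AddIndexesToMetricResults < ActiveRecord::Migration
  def change
    add_index :metric_results, :module_result_id
    add_index :metric_results, :metric_configuration_id
  end
end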

Finally, if the results are corroborated, I think the reason the indexes do not affect performance is that we make too many insertions into the MetricResult table. Note that we are using a B-tree as the index structure. In that structure searches are greatly improved: the search goes from O(n) to O(t·log n), where t is the number of elements in each node of the B-tree. However, insertions become slower, going from O(1) to O(log n). Therefore, making a lot of insertions into the table, which is our case (in the script setup, with 8 metrics, we create 8184 metric results), is not recommended, because the time gained by faster searches may be offset by the slower insertions. In my opinion, that is what the results are showing. We can only conclude that the sluggish performance of the aggregation state is not due to the lack of indexes on the MetricResult table. The reference for B-tree algorithms and time complexities I used is here.
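
As a rough back-of-the-envelope illustration (my own arithmetic, not a measurement): the 8-metrics setup creates 8184 rows, so an indexed lookup touches on the order of log2(8184) ≈ 13 entries instead of scanning all 8184, while each of the 8184 inserts pays a comparable O(log n) cost to keep the index balanced. Whether that trade-off pays depends on how many lookups the aggregation actually performs per inserted row.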

What do you think?

rafamanzo commented 8 years ago

From branch metric_result_indexes I've got the following on the same hardware as the previous tests:

| Metric Count | Wall Time (s) | Process Time (s) |
|--------------|---------------|------------------|
| 1 | 7.591419649124146 | 12.0535053682 |
| 2 | 31.928905773162843 | 39.1385960514 |
| 4 | 74.23858404159546 | 140.50534778839997 |
| 8 | 247.94011964797974 | 535.5620568106 |

Starting at 4 metrics, one CPU becomes pinned at 100% usage, but not by Postgres: the aggregation itself was using 100% of one CPU.

It looks like this test is stressing some other performance issue, besides the missing indexes, that we need to address before setting those up. But I believe the indexes will play an important role afterwards on the large database we have at mezuro.org; they actually seem to have eased some load on my machine starting at 4 metrics.

rafamanzo commented 8 years ago

Adding the following snippet at the end of Performance::Base#print:

File.open('profile_pt.html', 'w') { |file| RubyProf::CallStackPrinter.new(@results['Process Time'].first).print(file) }

I was able to narrow down 51% of the process time to TreeMetricResult#descendant_values, which has the following implementation:

def descendant_values
  module_result.children.map { |child|
    metric_result = child.tree_metric_result_for(self.metric)
    metric_result.value if metric_result
  }.compact
end

This looks quite spaghetti-like and inefficient, so I replaced it with:

def descendant_values
  self.class.
    where(module_result: module_result.children,
          metric_configuration_id: metric_configuration_id).
    select(:value).map(&:value)
end

But the execution time for the test increased :sob:

If someone else gets the same result, I'd say we need to write performance tests for this method as well. Can someone validate these results? What do you think about a test focused on this method?

danielkza commented 8 years ago

@rafamanzo Did you measure wall time also? If we're removing database waits the process time might not change that much, but the wall time might.

rafamanzo commented 8 years ago

The wall time increased as well :disappointed:

rafamanzo commented 8 years ago

Looking at the same report format, but for wall time:

File.open('profile_wt.html', 'w') { |file| RubyProf::CallStackPrinter.new(@results['Wall Time'].first).print(file) }

For the code with the new descendant_values implementation, ActiveRecord::Transactions#save takes 54% of the time. I'm experimenting with ActiveRecord::Base.transaction to see whether we can speed up the saves by using bigger transactions.
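
A minimal sketch of that idea, assuming the aggregation currently saves each record individually (results here is a hypothetical stand-in for the aggregated TreeMetricResult records):

# Hypothetical sketch: wrap the per-result saves in one transaction so
# PostgreSQL commits once instead of once per row.
ActiveRecord::Base.transaction do
  results.each(&:save!)
end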

rafamanzo commented 8 years ago

A little improvement, but still slower than the baseline. I'm stopping here for now, but if anyone is in the mood, I'd give https://github.com/zdennis/activerecord-import a try.

If we can iterate over the module tree by height, we could turn the 1022 insertions (for the 2-metrics case) into 10 insertions, as sketched below.
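
A sketch of that idea using activerecord-import (max_height, module_results_at_height and build_tree_metric_results_for are hypothetical placeholders for whatever the aggregator already does per node):

# Hypothetical sketch: aggregate the tree one level at a time, from the
# deepest internal level up to the root, bulk-inserting each level at once.
(max_height - 1).downto(0) do |height|
  results = module_results_at_height(height).flat_map do |module_result|
    build_tree_metric_results_for(module_result) # unsaved TreeMetricResult records
  end
  # activerecord-import: one multi-row INSERT per level instead of one per record
  TreeMetricResult.import(results)
end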

danielkza commented 8 years ago

Can you try this version of descendant_values? It should avoid the extra query that fetches the children module_results at the start.

def descendant_values
  self.class.
    joins(:module_result).
    where('module_results.parent_id' => module_result_id).
    pluck(:value)
end

danielkza commented 8 years ago

I get some good speedup with my version (and database indexes). Old:

WALL_TIME:
  198.24422693252563
  205.54687118530273
  223.26766705513
  Average: 209.01958839098612

New:

WALL_TIME:
  93.48759293556213
  98.39741396903992
  101.10500907897949
  Average: 97.66333866119385

Edit: I got it down to 12s now by walking the tree in whole "levels" (instead of just the children of one parent) and doing bulk inserts.

rafamanzo commented 8 years ago

@danielkza, I've tested your version of descendant_values here (with no indexes) on the same hardware as before, and I've got the following for the 2-metrics case:

Performance::Aggregation
----------
* Process Time: 17.7611858672
* Wall Time: 17.437276744842528

That is a slight slowdown from the previous 14s (although with no other load on the machine running the tests I now get 12s). I believe you are running the 8-metrics case, right? It is a small setback for the smaller cases, but a huge improvement in run time for large metric sets, and in code readability as well.

Thanks for the help. If you are up for turning those changes into a PR, I'd ask you in advance to split them into one PR per modification and to keep the commits small. Otherwise, I'll probably get to it on Friday afternoon, and I'd be happy if you'd review my PR then :smile:

Indeed a nice job. Congratulations.

danielkza commented 8 years ago

I'll probably make 2 or 3 PRs. I have some improvements for the perf script, the indexes, refactoring of the aggregation, and possibly one for the compound metrics calculation (I noticed it was walking the metrics tree regardless of whether there are any actual compound metrics being used).

rafamanzo commented 8 years ago

For those following this issue, have a look at https://github.com/mezuro/kalibro_processor/pull/214 and https://github.com/mezuro/kalibro_processor/pull/215.

rafamanzo commented 8 years ago

Closing, as the aggregation performance has been successfully addressed and further investigation of other performance improvements has been extracted into separate issues.