Some work for producing performance tests has been started at the following branch: https://github.com/mezuro/kalibro_processor/tree/aggregation_performance
I've added tests to `Aggregator` with similar complexity to the one found at mezuro.org: https://github.com/mezuro/kalibro_processor/commit/ced0eeef8d2a5cfe4d2d5c15c22492b2e4a12157
But the results are still nowhere near that slow, with a mean run time of 0.55s.
Insights are welcome @mezuro/core :smile:
We'd probably need to test with a high concurrency level if the database itself is being the bottleneck. I think profiling the queries through PostgreSQL would be the easier way to understand the issues - we can look at the query plans and query times using something like pgBadger.
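For example, a minimal sketch of checking the plan of the suspect lookup from the Rails console (the filter values are placeholders and the column names are assumptions, just to show the mechanism):

```ruby
# Ask PostgreSQL for the plan of the per-node metric_results lookup.
# 42 and 1 are placeholder IDs, not real data.
puts MetricResult.where(module_result_id: 42, metric_configuration_id: 1).explain
```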
On mezuro.org there is a single worker generating workload for the database, so I don't believe we'll replicate the issue with concurrency. But I've tested with a single `MetricResult`. Perhaps it is worth increasing its count in the test, since the mezuro.org issues happen with configurations holding at least half a dozen metrics.
pgBadger looks like a nice tool to use once we can produce a test that replicates the issues found at mezuro.org.
I thought we had four workers for kalibro_processor, my mistake!
I've fixed the performance test. It was fast because I missed proper data initialization.
Now I'm using a tree with height 10 and 2 children per node, so 512 leaves, and I've produced the following:
Metric Count | Wall Time (s) | Process Time (s) |
---|---|---|
1 | 8.672761964797974 | 7.54611889 |
2 | 14.596535205841064 | 24.3851091414 |
4 | 77.55368666648864 | 127.275106022 |
8 | 256.8674069881439 | 556.1682938326 |
Running with 8 metrics still did not produce the 100% PostgreSQL usage seen at mezuro.org. The plot looks quadratic.
Possible cause: https://github.com/mezuro/kalibro_processor/blob/master/lib/processor/aggregator.rb#L23. @danielkza pointed out that the `MetricResult`s table has no indexes, which makes repeating that query really slow.
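For reference, adding those indexes would be a plain Rails migration along these lines (a sketch only; the migration class name is made up, and the columns are the ones discussed later in this thread):

```ruby
# Sketch: add indexes for the two columns the aggregation query filters on.
class AddIndexesToMetricResults < ActiveRecord::Migration
  def change
    add_index :metric_results, :module_result_id
    add_index :metric_results, :metric_configuration_id
  end
end
```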
I've run the same tests as @rafamanzo and have got the following results:
Metric Count | Wall Time (s) | Process Time (s) |
---|---|---|
1 | 6.992499828338623 | 7.272484427600001 |
2 | 13.529412364959716 | 12.9735503228 |
4 | 26.20373945236206 | 28.432685047400003 |
8 | 62.70585794448853 | 64.16384417580001 |
Then, I've created indexes on the `module_result_id` and `metric_configuration_id` columns of the `metric_results` table and reran the same tests. The results were:
Metric Count | Wall Time (s) | Process Time (s) |
---|---|---|
1 | 6.903397846221924 | 7.131537662599999 |
2 | 13.142084169387818 | 13.2359803324 |
4 | 28.782897806167604 | 28.8164754306 |
8 | 64.86161379814148 | 56.7507790644 |
From these results, I don't think the lack of indexes is the main problem behind such sluggish processings. However, I also have some doubts about my tests. First, my results are much faster than @rafamanzo's. What could be the reason for that? Just processing power?
Another thing: could the metrics used affect the results? I commented out the metrics I wasn't using in the test from bottom to top (given the order in the performance script). Is that how you did it too, @rafamanzo?
Also, it would be good for someone else to corroborate these results. I've committed the changes and pushed a new branch called `metric_result_indexes`. The commit includes the modified database schema, so a single rollback is enough to return the database to its previous state (without the indexes).
Finally, if the results are corroborated, I think the reason the indexes do not affect the performance is that we make too many insertions into the `MetricResult`s table. Note that we are using a b-tree as the index structure. On those structures searches are greatly improved: the search goes from O(n) to O(t*log(n)), where t is the number of elements in each node of the b-tree. However, insertions become slower, going from O(1) to O(log(n)). Therefore, making a lot of insertions into the table, which is our case (in the script setup, with 8 metrics, we create 8184 metric results), is not recommended, because the time gained by faster searches may be offset by the slower insertions. In my opinion, that is what the results are showing, and we can only conclude that the sluggish performance of the aggregation state is not due to the lack of indexes on the `MetricResult`s table. The reference I used for b-tree algorithms and time complexities is here.
What do you think?
From branch `metric_result_indexes` I've got the following on the same hardware as the previous tests:
Metric Count | Wall Time (s) | Process Time (s) |
---|---|---|
1 | 7.591419649124146 | 12.0535053682 |
2 | 31.928905773162843 | 39.1385960514 |
4 | 74.23858404159546 | 140.50534778839997 |
8 | 247.94011964797974 | 535.5620568106 |
Starting from 4 metrics, one CPU became pinned at 100% usage, but not by Postgres: the aggregation itself was using 100% of one CPU.
Looks like this test is stressing some other performance issue, beyond the missing indexes, that we need to address before setting those up. But I believe the indexes will play an important role afterwards on the large database we have at mezuro.org. They actually seem to have eased some load on my machine starting at 4 metrics.
Adding the following snippet at the end of `Performance::Base#print`:

```ruby
File.open('profile_pt.html', 'w') { |file| RubyProf::CallStackPrinter.new(@results['Process Time'].first).print(file) }
```
I was able to narrow 51% of the process time down to `TreeMetricResult#descendant_values`, which has the following implementation:
```ruby
def descendant_values
  module_result.children.map { |child|
    metric_result = child.tree_metric_result_for(self.metric)
    metric_result.value if metric_result
  }.compact
end
```
This looks quite convoluted and inefficient, so I replaced it with:
```ruby
def descendant_values
  self.class.
    where(module_result: module_result.children,
          metric_configuration_id: metric_configuration_id).
    select(:value).map(&:value)
end
```
But the execution time for the test increased :sob:
If someone else can get to the same result, I'd say we have to write some performance tests for this method as well. Can someone validate those results? What do you think about a test focusing on this?
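A focused performance test for the method could be as small as the sketch below (`tree_metric_result` is an assumed fixture backed by a populated tree; the point is just to time repeated calls in isolation):

```ruby
# Minimal sketch: time descendant_values alone, away from the full aggregation run.
require 'benchmark'

runs = 100
elapsed = Benchmark.realtime do
  runs.times { tree_metric_result.descendant_values }
end
puts "descendant_values: #{(elapsed / runs) * 1000} ms per call"
```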
@rafamanzo Did you measure wall time also? If we're removing database waits the process time might not change that much, but the wall time might.
The wall time increased as well :disappointed:
Looking at the same report format for wall time:

```ruby
File.open('profile_wt.html', 'w') { |file| RubyProf::CallStackPrinter.new(@results['Wall Time'].first).print(file) }
```
For the code with the new `descendant_values` implementation, `ActiveRecord::Transactions#save` takes 54% of the time. I'm experimenting with `ActiveRecord::Base.transaction` to see if we can speed up the saves by making bigger transactions.
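The experiment amounts to something like the sketch below (an assumed shape, not the actual patch): grouping the saves so a whole batch shares one commit instead of one transaction per record.

```ruby
# Sketch: save all aggregated results of a batch inside a single transaction,
# so PostgreSQL commits once per batch instead of once per MetricResult.
# `aggregated_results` is an assumed collection of unsaved records.
ActiveRecord::Base.transaction do
  aggregated_results.each(&:save!)
end
```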
Little improvement, but still slower than the baseline. I'm stepping away for now, but if anyone gets in the mood I'd give https://github.com/zdennis/activerecord-import a try.
If we can iterate through the module tree by levels, we could turn the 1022 insertions (for the 2 metrics case) into 10 insertions.
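With activerecord-import that could look roughly like the sketch below (hypothetical: `results_for_level` and `metric_configuration_id` are assumed to be available in scope, not part of the current code base):

```ruby
# Sketch: build all TreeMetricResults for one tree level in memory and insert
# them with a single multi-row INSERT via the activerecord-import gem.
require 'activerecord-import'

rows = results_for_level.map do |module_result, value|
  TreeMetricResult.new(module_result_id: module_result.id,
                       metric_configuration_id: metric_configuration_id,
                       value: value)
end
TreeMetricResult.import(rows) # one INSERT for the whole level
```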
Can you try this version of `descendant_values`? It should avoid the extra query to fetch the children module_results at the start.
```ruby
def descendant_values
  self.class.
    joins(:module_result).
    where('module_results.parent_id' => module_result_id).
    pluck(:value)
end
```
I get some good speedup with my version (and database indexes). Old:

```
WALL_TIME:
198.24422693252563
205.54687118530273
223.26766705513
Average: 209.01958839098612
```

New:

```
WALL_TIME:
93.48759293556213
98.39741396903992
101.10500907897949
Average: 97.66333866119385
```
edit: I got it down to 12s now by walking the tree in whole "levels" (instead of just the children of one parent) and doing bulk inserts.
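Roughly, the level-by-level walk can be sketched as below (hypothetical helper and association names): collect each whole level of the tree, then aggregate from the deepest level upwards so every level becomes a single bulk insert.

```ruby
# Sketch: gather the module tree in whole levels, then process deepest-first.
levels = []
current = [processing.root_module_result] # assumed association
until current.empty?
  levels << current
  current = current.flat_map(&:children)
end
levels.reverse_each { |level| aggregate_level(level) } # assumed bulk-insert helper
```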
@danielkza I've tested your version of `descendant_values` (with no indexes) on the same hardware as before, and I've got the following for the 2 metrics case:
```
Performance::Aggregation
----------
* Process Time: 17.7611858672
* Wall Time: 17.437276744842528
```
That is a slight slowdown from the previous 14s (although now, with no load on the machine running the tests, I get 12s). I believe you are running the 8 metrics case, right? It is a small setback for the smaller cases and a huge improvement both in run time for large metric sets and in code readability.
Thanks for the help. If you are up to turning those changes into PRs, I'd ask you in advance to split them into one per modification and be careful with the commit sizes. Otherwise I'll get to it, probably on Friday afternoon, and I'd be happy if you'd review my PR then :smile:
Indeed a nice job. Congratulations.
I'll probably make 2 or 3 PRs. I have some improvements for the perf script, the indexes, refactoring of the aggregation, and possibly one for the compound metrics calculation (I noticed it was walking the metrics tree regardless of whether there are any actual compound metrics being used).
Those following this issue, have a look at https://github.com/mezuro/kalibro_processor/pull/214 and https://github.com/mezuro/kalibro_processor/pull/215.
Closing, as the aggregation performance has been successfully addressed and further investigation into other performance improvements has been extracted into separate issues.
Running the following on the mezuro.org console gives us a hint of why some processings take so long.
It means that on average each Processing spends approximately 30 min aggregating.
Looking at the database machine's processor, we can see it at 100% the whole time during aggregation.
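(The original console snippet is not reproduced above; a hypothetical equivalent, assuming a `ProcessTime` model that records the duration of each processing state, could look like this.)

```ruby
# Hypothetical sketch, not the original snippet: average time (in seconds)
# spent in the aggregating state per Processing. Model and column names are assumptions.
puts ProcessTime.where(state: 'aggregating').average(:time)
```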
I suggest:

- Profiling with `RubyProf::CallStackPrinter`
- Walking the `TreeMetricResult`s bottom-up
- Indexes on the `metric_results` table may be worth the extra insertion cost. We need to investigate that once the bulk insertions are up.
- `TreeMetricResult#descendant_values` performance