Open lipzhu opened 2 years ago
Wow, amazing @lipzhu, welcome to the community!
Would you mind sharing the change via PR, or explaining specific changes in this thread?
@wey-gu, sorry for the late response. We just reproduced the performance increase on master by enabling AutoFDO. The best result is in the FindShortestPath scenario, which improves by 10+%. In general it has 2 major steps:
References: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45290.pdf
I can draft a document describing in detail how to enable AutoFDO for Nebula.
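For readers unfamiliar with AutoFDO, here is a rough sketch of the generic cycle as described in the GCC and AutoFDO tool documentation: sample branches with `perf`, convert the samples with `create_gcov`, then rebuild with `-fauto-profile`. It is not necessarily the exact procedure used for Nebula. The binary path, the workload command, the output file names, and the idea of passing the flag through `CMAKE_CXX_FLAGS` are all assumptions for illustration, and the right branch event name depends on the CPU. The helper only builds the command lines; it does not run anything.

```python
import shlex


def autofdo_commands(binary, workload_cmd, perf_data="perf.data", gcov="nebula.afdo"):
    """Build the command lines of a generic AutoFDO cycle (nothing is executed).

    All paths and file names are illustrative placeholders.
    """
    # Step 1: sample taken branches (-b enables branch-stack sampling) while a
    # representative workload runs against an unstripped binary.
    # The event name is CPU-specific; this is the one used in the AutoFDO README.
    record = ["perf", "record", "-b",
              "-e", "br_inst_retired.near_taken:pp",
              "-o", perf_data, "--", *workload_cmd]

    # Step 2a: convert perf samples into the profile format GCC understands.
    convert = ["create_gcov",
               f"--binary={binary}",
               f"--profile={perf_data}",
               f"--gcov={gcov}"]

    # Step 2b: rebuild with the profile. Passing the flag through
    # CMAKE_CXX_FLAGS is an assumption about the build setup; an absolute
    # path to the profile is safer in a real build.
    rebuild = ["cmake", "-DCMAKE_BUILD_TYPE=Release",
               f"-DCMAKE_CXX_FLAGS=-fauto-profile={gcov}", ".."]

    return [record, convert, rebuild]


if __name__ == "__main__":
    # Hypothetical paths; adjust to the actual installation.
    cmds = autofdo_commands(
        "bin/nebula-storaged",
        ["bin/nebula-storaged", "--flagfile", "etc/nebula-storaged.conf"],
    )
    for cmd in cmds:
        print(shlex.join(cmd))
```

The profile is only as representative as the workload run in step 1, so the benchmark scenarios driven during sampling largely determine which code paths the compiler optimizes.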
Dear @lipzhu ,
I am impressed by AutoFDO's performance gain, and especially, by your efforts/work on this.
It's really awesome. Please allow a silly question regarding the perf data (gcov file), which I think is crucial to prepare (and depends on which scenarios are profiled): is it possible that certain other scenarios could suffer a performance regression from this optimization? Or is it considered a general optimization (that is, we should expect no perf regression for any type of query)?
Also, looking forward to your documentation. Could we publish it on the NebulaGraph blog when it's done? (If possible, we would also like to invite you to the community meeting to share this work.)
Thanks!
Hi @wey-gu ,
> is it possible that certain other scenarios could suffer from performance regression on this optimization?
Maybe, but I have rarely seen it; some scenarios simply didn't benefit from the AutoFDO binary. Google's paper (linked in my previous comment) describes situations where they saw regressions (e.g. when profile data is collected from an AutoFDO-optimized binary), and the impact is small compared with the performance gain. This may be why they use AutoFDO so widely in their products.
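One way to check for per-scenario regressions is to compare latencies from a baseline build and an AutoFDO build and flag any scenario that got slower beyond a noise threshold. The following is a minimal sketch, not part of nebula bench: the function names, the 2% tolerance, the third scenario name, and all latency numbers are made-up placeholders, not measured results.

```python
def relative_change(baseline_ms, optimized_ms):
    """Fractional latency change; positive means the optimized build is slower."""
    return (optimized_ms - baseline_ms) / baseline_ms


def find_regressions(baseline, optimized, tolerance=0.02):
    """Return {scenario: change} for scenarios slower by more than `tolerance`.

    `baseline` and `optimized` map scenario name -> latency in milliseconds.
    Scenarios missing from either run are skipped.
    """
    regressions = {}
    for name, base_ms in baseline.items():
        if name not in optimized:
            continue
        change = relative_change(base_ms, optimized[name])
        if change > tolerance:
            regressions[name] = change
    return regressions


if __name__ == "__main__":
    # Placeholder numbers for illustration only (NOT real measurements).
    baseline = {"Go2Step": 100.0, "FindShortestPath": 200.0, "SomeOtherScenario": 50.0}
    autofdo = {"Go2Step": 93.0, "FindShortestPath": 180.0, "SomeOtherScenario": 52.0}
    for name, change in find_regressions(baseline, autofdo).items():
        print(f"{name}: {change:+.1%} latency")
```

Running the same comparison on both average and P95 latency helps separate a real regression from run-to-run noise.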
@lipzhu, would you mind reaching us via mail(with an address that we could send to) so that we could share with you some contributor souvenirs and the certificate?
wey.gu(at)vesoft.com cc @lisahui
Thanks!
Introduction

Hi Nebula Graph dev team,
We are evaluating the performance of Nebula Graph on Intel's ICX server and found that the front-end bound metric is high (LIB). For example, in the Go2Step scenario, the front-end bound of the storaged process is ~32% or even higher. We then ran some experiments on the front-end bound issue, such as enabling AutoFDO, and found that Go2Step benchmark performance improves by ~7% (both average and P95 latency). Next, we are going to apply AutoFDO to all scenarios in nebula bench. Do you have any comments or suggestions?

P.S. The test results are based on the release-3.1 branch of Nebula Graph, running on an Intel(R) Xeon(R) Platinum 8380 CPU.