heavyai / heavydb

HeavyDB (formerly OmniSciDB)
https://heavy.ai
Apache License 2.0

Evaluate using Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) #818

Open zamazan4ik opened 12 months ago

zamazan4ik commented 12 months ago

Hi!

I recently ran many Profile-Guided Optimization (PGO) benchmarks on multiple projects (including databases such as PostgreSQL, ClickHouse, Redis, and MongoDB). The results are available here, and the database-related results can be checked here. Based on those results, I think it's worth trying PGO on HeavyDB to improve the database's performance.

I can suggest the following things to do:

Here are some examples of how PGO is already integrated into other projects' build scripts:

Here are some examples of what PGO-related documentation could look like in a project:

After PGO, I can suggest evaluating PLO with LLVM BOLT as an additional optimization step after PGO.
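To make the order of steps concrete, here is a rough sketch of an instrumented PGO build followed by BOLT, written as a Python helper that only assembles the command lines and does not run anything. The single-file `clang++` invocation, the file names, and the `profiles` directory are hypothetical placeholders; HeavyDB builds with CMake, so in practice the same flags would have to be passed through its build configuration, which has not been verified here.

```python
from typing import List


def pgo_bolt_commands(source: str, binary: str,
                      workload_args: List[str]) -> List[List[str]]:
    """Return the command lines for a Clang PGO build followed by LLVM BOLT.

    All paths are illustrative assumptions, not taken from HeavyDB's build.
    """
    instrumented = binary + ".instr"
    optimized = binary + ".pgo"
    bolted = binary + ".bolt"
    return [
        # 1. Instrumented build: running it writes .profraw files to ./profiles.
        ["clang++", "-O2", "-fprofile-generate=profiles",
         source, "-o", instrumented],
        # 2. Training run on a representative workload (e.g. a benchmark).
        ["./" + instrumented, *workload_args],
        # 3. Merge the raw profiles into one indexed profile.
        ["llvm-profdata", "merge", "-o", "merged.profdata", "profiles"],
        # 4. Optimized rebuild. --emit-relocs keeps relocations in the binary,
        #    which BOLT needs in order to reorder functions.
        ["clang++", "-O2", "-fprofile-use=merged.profdata",
         "-Wl,--emit-relocs", source, "-o", optimized],
        # 5. Sample the PGO binary with perf (-j needs branch-record support,
        #    such as LBR on Intel CPUs).
        ["perf", "record", "-e", "cycles:u", "-j", "any,u",
         "-o", "perf.data", "--", "./" + optimized, *workload_args],
        # 6. Convert the perf samples into BOLT's profile format.
        ["perf2bolt", "-p", "perf.data", "-o", "perf.fdata", "./" + optimized],
        # 7. Post-link optimization of the PGO binary.
        ["llvm-bolt", "./" + optimized, "-o", "./" + bolted,
         "-data=perf.fdata", "-reorder-blocks=ext-tsp",
         "-reorder-functions=hfsort"],
    ]


if __name__ == "__main__":
    for cmd in pgo_bolt_commands("main.cpp", "app", ["--run-benchmark"]):
        print(" ".join(cmd))
```

The point of the sketch is the ordering: BOLT is applied to the binary that was already rebuilt with the PGO profile, so the two optimizations stack instead of replacing each other.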

Some BOLT results are listed below:

I am not familiar with HeavyDB (yet), but as a first step I think we can train PGO on the HeavyDB benchmarks and then compare HeavyDB's performance before and after PGO.
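For the before/after comparison, something as simple as the following Python sketch would do. It assumes each benchmark query was timed several times on both builds; the function name, the query labels, and the timings are placeholders for illustration, not measurements from HeavyDB.

```python
from statistics import median
from typing import Dict, List


def pgo_speedups(baseline: Dict[str, List[float]],
                 pgo: Dict[str, List[float]]) -> Dict[str, float]:
    """Per-query speedup: median baseline time divided by median PGO time.

    Values above 1.0 mean the PGO build was faster. Queries missing from
    either run are skipped.
    """
    return {
        query: median(baseline[query]) / median(pgo[query])
        for query in baseline
        if query in pgo
    }


if __name__ == "__main__":
    # Placeholder timings in seconds, NOT real HeavyDB results.
    baseline_times = {"q1": [4.0, 4.2, 3.9], "q2": [1.0, 1.0, 1.0]}
    pgo_times = {"q1": [2.0, 2.1, 1.9], "q2": [1.0, 1.0, 1.0]}
    for query, speedup in pgo_speedups(baseline_times, pgo_times).items():
        print(f"{query}: {speedup:.2f}x")
```

Using the median of several runs keeps a single noisy run from dominating the comparison, which matters when the expected gain is in the range of a few percent.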

cdessanti commented 11 months ago

Hi @zamazan4ik,

I'm sorry for the late response. Each time I tried to reply to this message, I got distracted by something else.

I have carefully reviewed your work, in particular the project you completed using ClickHouse, and I found it very interesting. I just wanted to let you know that our database uses the GPU as an accelerator for operations that require high bandwidth and parallelism. Specifically, the GPU handles data aggregations, filters, and joins, and the database generates LLVM code to optimize these operations. The same code generation can also be used for CPU execution. As a result, the ahead-of-time compiled CPU code is mainly responsible for coordination and memory management.

I haven't had the chance to use PGO on the project yet due to time constraints. However, if you're interested in running benchmarks with HeavyDB, I'd be happy to guide you through the process of setting up a development environment, including DDLs and data for standard or internal benchmarking purposes.

Let me know how I can assist you. Meanwhile, have a nice weekend.

Candido