trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.18k stars 2.94k forks source link

Histogram remove inner classes #1040

Open rash67 opened 5 years ago

rash67 commented 5 years ago

Removes nested inner classes and replaces with instance method calls or static method calls where possible.

Other minor cleanup of stale comments was done, and a minor optimization in rehash was found avoiding an array lookup).

Overall, one should find that inner class methods were moved to instance/static methods with any members as params. The pattern was repeated, but the heart of the add() flow is the same, and can be followed down processAdd() instead of the old flow with bucketNode.

kokosing commented 5 years ago

What is the motivation behind this? Is related to performance or design (clean code)?

rash67 commented 4 years ago

hi, sorry I've been off for a long while and have just recently returned to this.

The primary motivation is it creates unnecessary objects which increases gc pressure. Previous profiling of object allocation showed as much as 4%. I re-ran some toy queries on tpch.customer such as:

select custkey, histogram(orderkey) from orders group by custkey;

and see 2% of objects allocated are ValueNode and BucketNode. It's not massive, and depends heavily on the query workload, but in the end, it's just unnecessary GC pressure.

Cleaner code could be a secondary one. imo, they made the code "cleaner" (just how I think) as they "wrapped" locations into parallel arrays with an ephemeral object just for abstraction. It could just as well be argued it's too much abstraction and cleaner code without the objects.

It just turns out the jvm couldn't elide them. I can see it either way, though.

I'm ready with a PR anytime if this is still deemed worthwhile