Precompute hash values for connector provided distributions

trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

https://trino.io

Apache License 2.0

10.22k stars 2.95k forks source link

Precompute hash values for connector provided distributions #14184

Open sopel39 opened 2 years ago

sopel39 commented 2 years ago

Currently, HashGenerationOptimizer.Rewriter#visitExchange has a constraint:

// Currently, precomputed hash values are only supported for system hash distributions without constants

It would be beneficial to precompute hash values for connector distributions too (e.g. buckets)

sopel39 commented 2 years ago

cc @gaurav8297 @raunaqmorarka

findepi commented 2 years ago

That would be beneficial if we use these hashes more than once. Indeed we do -- for distributing data between the nodes and then within the connector page sink. We maybe also distribute between page sinks on a node, if task writer count > 1?

Anyway, for reuse within connector page sink, what would be a proposed SPI to represent that?

sopel39 commented 2 years ago

We maybe also distribute between page sinks on a node, if task writer count > 1?

That's the idea

Anyway, for reuse within connector page sink, what would be a proposed SPI to represent that?

I don't think a new SPI is needed. I think we just need to use connector bucket function to compute hash (and possibly mangle the hash at remote exchange and local exchange level)

findepi commented 2 years ago

Without a SPI change we still will have hash/partitioning inside the connector. See example in Iceberg https://github.com/trinodb/trino/blob/6310e5e1415fe0bc89444016fc516640c3a3fcbb/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSink.java#L266 This should re-use the precomputed hash values.

sopel39 commented 2 years ago

Without a SPI change we still will have hash/partitioning inside the connector. See example in Iceberg

page indexer is a different matter as page indexer is local to sink (it enumerates "seen" partitions). It might consume hash, but the biggest cost in page indexer are hash lookups. There is another issue to improve page indexer, see: https://github.com/trinodb/trino/issues/14183