prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0
15.85k stars 5.32k forks source link

TDigest internal check on double equal needs improvements #23139

Open lengliu opened 1 month ago

lengliu commented 1 month ago

In TDigest.java: https://github.com/prestodb/presto/blob/543e46c356dc22625b0d244ca1c3b5a9748c4b0b/presto-main/src/main/java/com/facebook/presto/tdigest/TDigest.java#L357

The double comparison only compares the absolute difference, which could be too restrictive for large values.

In general, we cannot use a simple abs(diff) < epsilon to compare two doubles because the absolute error in large number case could be large and yet the relative error is still small. See this SOF thread for correct way to compare. Especially Knuth’s method: https://stackoverflow.com/questions/17333/how-do-you-compare-float-and-double-while-accounting-for-precision-loss

Repro case would be when TDigest to be merged both containing large weights like 1E14.

kaikalur commented 1 month ago

CC: @tdcmeehan, @rschlussel

kaikalur commented 1 month ago

We are seeing some issues due to this check failure in our production workloads so it will be good to fix it.

kaikalur commented 1 month ago

CC: @feilong-liu