[GPU Logic Bug] SELECT <agg_func> FROM <table> JOIN <table> WHERE <condition> Brings GPU Logic Bug

Describe:

SELECT \ FROM \

JOIN \

WHERE \ brings different results, when using CPU-only configurations and GPU-used configurations.

SQL with CPU-only Config:

CREATE TABLE t2(c0 REAL);
CREATE TABLE t4(LIKE t2);
INSERT INTO t2(c0) VALUES(2.5E8);
INSERT INTO t2(c0) VALUES(-1.1E9);
INSERT INTO t2(c0) VALUES(0.3), (0.12848);
INSERT INTO t4(c0) VALUES(7.1E8);
CREATE SCHEMA extensions;
CREATE EXTENSION pg_strom WITH SCHEMA extensions;
SET pg_strom.enabled=off;
SELECT AVG(t2.c0) FROM t2 CROSS JOIN t4 WHERE ((t2.c0)IS DISTINCT FROM(t4.c0));

Result:

         avg         
---------------------
 -212499999.89288002
(1 row)

SQL with GPU-used Config:

BEGIN;
SET LOCAL pg_strom.enabled=on; 
SET LOCAL pg_strom.enable_gpuscan=on;
SET LOCAL pg_strom.enable_gpuhashjoin=on; 
SET LOCAL pg_strom.enable_gpujoin=on; 
SET LOCAL pg_strom.enable_gpupreagg=on; 
SELECT AVG(t2.c0) FROM t2 CROSS JOIN t4 WHERE ((t2.c0)IS DISTINCT FROM(t4.c0));
COMMIT;

Result:

       avg        
------------------
 -212499999.89288
(1 row)

Environment:

Pg-strom Version: commit 9765660eba4fab7e4e9378c667e6bd565da83ffd

PostgreSQL Version: 15.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-20), 64-bit

CUDA Version: 12.2

NVIDIA Driver Version: 535.171.04

kaigai commented 2 weeks ago

This is calculation error in floating-point values, and PG-Strom returns more precise value.

See, the following results. The AVG(c0) is sum of the values divided by 4.

hoge=# SELECT t2.c0 FROM t2 CROSS JOIN t4 WHERE ((t2.c0)IS DISTINCT FROM(t4.c0));
    c0
----------
  2.5e+08
 -1.1e+09
      0.3
  0.12848
(4 rows)

Try to calculate them with explicit cast to numeric (that will never have calculation errors in floating-point):

hoge=# select (250000000::numeric - 1100000000::numeric + 0.3::numeric + 0.12848::numeric ) / 4::numeric;
      ?column?
---------------------
 -212499999.89288000
(1 row)

As you may know, FP value may cause calculation errors when we operate a big value and a little value. In this case, when float4_accum() handled the third value, it adds 0.3 on the intermediation value -850000000.

On the other hands, PG-Strom's logic makes this problem a bit mild, because __update_nogroups__psum_fp() calculates value[0] + value[1] and value[2] + value[3] first in parallel, then calculates the both results and store the result->sum field.

This often happens in calculation of very large number of items.

heterodb / pg-strom