Closed df7cb closed 2 years ago
Interesting. I'm using an rpi3 running 32-bit Ubuntu, and I don't see such issues with float digits. But setting extra_float_digits seems like a good idea, so I'll get that committed and hope it fixes the issue.
As for the plan changes related to Memoize on PG14, I guess the best solution will be adding an alternative expected file. I'll take care of that. I wonder why I hadn't noticed it earlier; I guess I've been using some older branch for development, or something like that.
The last bit at the end of the report was a bit puzzling:
-- mismatching count
SELECT 'flags 0 count 21 compression 10 centroids 8 (1000.000000, 1) (2000.000000, 1) (7000.000000, 2) (26000.000000, 4) (84000.000000, 7) (51000.000000, 3) (19000.000000, 1) (20000.000000, 1)'::tdigest;
-ERROR: total count does not match the data (20 != 21)
+ERROR: total count does not match the data (20 != 0)
LINE 1: SELECT 'flags 0 count 21 compression 10 centroids 8 (1000.00...
^
That seems like a bug in parsing the input, but in fact it seems to be caused by an incorrect format string, using %ld instead of %lld. I'll get that fixed too.
I've pushed fixes addressing (hopefully) all those issues, it's the 1.3.0 release.
Thanks for the quick response!
Most of the regression diff is now gone on i386, but there's differences left:
diff -U3 /home/myon/projects/postgresql/tdigest/tdigest.git/test/expected/tdigest_1.out /home/myon/projects/postgresql/tdigest/tdigest.git/results/tdigest.out
--- /home/myon/projects/postgresql/tdigest/tdigest.git/test/expected/tdigest_1.out 2021-11-08 21:20:44.957202024 +0100
+++ /home/myon/projects/postgresql/tdigest/tdigest.git/results/tdigest.out 2021-11-08 21:23:52.339584820 +0100
@@ -1318,21 +1318,21 @@
-- test casting to double precision array
SELECT cast(tdigest(i / 1000.0, 10) as double precision[]) from generate_series(1,1000) s(i);
- tdigest
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- {1,1000,10,13,0.001,1,0.002,1,0.0045,4,0.013000000000000001,13,0.0405,42,0.13499999999999998,147,0.46400000000000013,511,0.7929999999999997,147,0.9159999999999999,99,0.9795,28,0.9959999999999999,5,0.999,1,1,1}
+ tdigest
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ {1,1000,10,13,0.001,1,0.002,1,0.0045000000000000005,4,0.013000000000000001,13,0.040500000000000015,42,0.13499999999999998,147,0.4639999999999998,511,0.7929999999999998,147,0.9159999999999999,99,0.9795,28,0.9960000000000001,5,0.999,1,1,1}
(1 row)
SELECT cast(tdigest(i / 1000.0, 25) as double precision[]) from generate_series(1,1000) s(i);
- tdigest
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- {1,1000,25,18,0.001,1,0.002,1,0.003,1,0.0055000000000000005,4,0.011999999999999999,9,0.026500000000000013,20,0.05749999999999998,42,0.11500000000000002,73,0.23199999999999993,161,0.472,319,0.7270000000000004,191,0.8774999999999998,110,0.949,33,0.9764999999999998,22,0.9915,8,0.997,3,0.999,1,1,1}
+ tdigest
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ {1,1000,25,18,0.001,1,0.002,1,0.003,1,0.0055000000000000005,4,0.011999999999999999,9,0.026500000000000003,20,0.05749999999999998,42,0.11499999999999999,73,0.232,161,0.47200000000000003,319,0.7269999999999992,191,0.8774999999999996,110,0.9490000000000001,33,0.9765000000000004,22,0.9915,8,0.997,3,0.999,1,1,1}
(1 row)
SELECT cast(tdigest(i / 1000.0, 100) as double precision[]) from generate_series(1,1000) s(i);
- tdigest
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- {1,1000,100,40,0.001,1,0.002,1,0.003,1,0.004,1,0.005,1,0.006,1,0.0075,2,0.01,3,0.013499999999999998,4,0.018,5,0.024499999999999997,8,0.03400000000000001,11,0.04700000000000001,15,0.065,21,0.09000000000000001,29,0.12450000000000001,40,0.17099999999999999,53,0.23149999999999998,68,0.3074999999999999,84,0.3984999999999999,98,0.5009999999999999,107,0.6035,98,0.6944999999999999,84,0.7705000000000001,68,0.831,53,0.8774999999999998,40,0.912,29,0.9369999999999999,21,0.955,15,0.968,11,0.9775,8,0.984,5,0.9885,4,0.992,3,0.9944999999999999,2,0.996,1,0.997,1,0.998,1,0.999,1,1,1}
+ tdigest
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ {1,1000,100,40,0.001,1,0.002,1,0.003,1,0.004,1,0.005,1,0.006,1,0.0075,2,0.01,3,0.013499999999999998,4,0.018,5,0.024500000000000004,8,0.033999999999999996,11,0.04700000000000001,15,0.065,21,0.08999999999999998,29,0.12449999999999997,40,0.17099999999999996,53,0.23149999999999998,68,0.3074999999999999,84,0.3985,98,0.5009999999999999,107,0.6034999999999998,98,0.6945000000000001,84,0.7705,68,0.8309999999999998,53,0.8774999999999998,40,0.9120000000000003,29,0.9369999999999999,21,0.955,15,0.968,11,0.9775,8,0.984,5,0.9884999999999999,4,0.992,3,0.9944999999999999,2,0.996,1,0.997,1,0.998,1,0.999,1,1,1}
(1 row)
-- <value,count> API
Unfortunately setting extra_float_digits = 0 for the whole file (and using the result as tdigest_2.out) doesn't work; that leaves one line of diff between amd64 and i386:
diff -U3 /home/myon/projects/postgresql/tdigest/tdigest.git/test/expected/tdigest_2.out /home/myon/projects/postgresql/tdigest/tdigest.git/results/tdigest.out
--- /home/myon/projects/postgresql/tdigest/tdigest.git/test/expected/tdigest_2.out 2021-11-08 21:31:12.345180020 +0100
+++ /home/myon/projects/postgresql/tdigest/tdigest.git/results/tdigest.out 2021-11-08 21:33:11.894708082 +0100
@@ -1325,9 +1325,9 @@
(1 row)
SELECT cast(tdigest(i / 1000.0, 25) as double precision[]) from generate_series(1,1000) s(i);
- tdigest
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- {1,1000,25,18,0.001,1,0.002,1,0.003,1,0.0055,4,0.012,9,0.0265,20,0.0575,42,0.115,73,0.232,161,0.472,319,0.727,191,0.8775,110,0.949,33,0.9765,22,0.9915,8,0.997,3,0.999,1,1,1}
+ tdigest
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ {1,1000,25,18,0.001,1,0.002,1,0.003,1,0.0055,4,0.012,9,0.0265,20,0.0575,42,0.115,73,0.232,161,0.472,319,0.726999999999999,191,0.8775,110,0.949,33,0.9765,22,0.9915,8,0.997,3,0.999,1,1,1}
(1 row)
SELECT cast(tdigest(i / 1000.0, 100) as double precision[]) from generate_series(1,1000) s(i);
Possibly avoiding problematic floating-point numbers (like 0.727) would fix the tests?
Ah, I hadn't realized the patch sets extra_float_digits just for one query; setting it for the whole script seems like a good idea.
It's rather bizarre that all the values are just fine except for 0.726999..., and that the casts to json a couple lines earlier work just fine. How is that possible? I wonder if that's some quirk in pg_strfromd or the sprintf on that system.
I've pushed two fixes, hopefully resolving this. The first one simply sets extra_float_digits for the whole file, the second rounds all the double precision explicitly.
I still think it's suspicious that some of the values are printed differently on i386. I guess it's either due to some bug in sprintf, or due to differences in rounding errors on some systems.
Things are unfortunately not looking good. On i386 I still had to add more _2 _3 files to catch the output variation, but that wasn't enough -- arm64 and ppc64el produce yet different output: https://pgdgbuild.dus.dg-i.net/job/tdigest-binaries/14/
And the tests only cover PG 12+, the EXPLAIN tests fail on earlier versions.
I would suggest putting the precision/output tests into a different tests/sql/* file than the EXPLAIN tests - that way, varying *.out files don't have to cover the full tests, only the interesting bits.
On top of that, the PG10 test was even segfaulting on amd64... (I don't have logs anymore.)
Thanks for the testing. I've pushed a couple commits fixing issues on older releases (I hope), but 12 still needs a bit of work to fix the explain output. The way the tests are written now, it gets broken by a number of Postgres improvements - CTE materialization, memoize, and so on. I'll think about simplifying the tests to minimize this - either by moving the explains to a separate file, and/or eliminating some of them (there seem to be quite a few of them).
Not sure about the arm64/ppc64el results, because I have no way to test those (it seems fine on my rpi with both armv7l/aarch64). Can you attach the output from these machines? I tried locating them in the jenkins output, but can't find them.
PG14 arm64:
18:59:43 diff -U3 /<<PKGBUILDDIR>>/test/expected/tdigest_1.out /<<PKGBUILDDIR>>/results/tdigest.out
18:59:43 --- /<<PKGBUILDDIR>>/test/expected/tdigest_1.out 2021-11-08 19:49:23.000000000 +0000
18:59:43 +++ /<<PKGBUILDDIR>>/results/tdigest.out 2022-01-05 17:59:43.399344342 +0000
18:59:43 @@ -1318,15 +1318,15 @@
18:59:43
18:59:43 -- test casting to double precision array
18:59:43 SELECT cast(tdigest(i / 1000.0, 10) as double precision[]) from generate_series(1,1000) s(i);
18:59:43 - tdigest
18:59:43 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
18:59:43 - {1,1000,10,13,0.001,1,0.002,1,0.0045,4,0.013000000000000001,13,0.0405,42,0.13499999999999998,147,0.46400000000000013,511,0.7929999999999997,147,0.9159999999999999,99,0.9795,28,0.9959999999999999,5,0.999,1,1,1}
18:59:43 + tdigest
18:59:43 +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
18:59:43 + {1,1000,10,13,0.001,1,0.002,1,0.0045,4,0.013000000000000001,13,0.0405,42,0.13499999999999998,147,0.4640000000000001,511,0.7929999999999997,147,0.9159999999999999,99,0.9795,28,0.9959999999999999,5,0.999,1,1,1}
18:59:43 (1 row)
18:59:43
18:59:43 SELECT cast(tdigest(i / 1000.0, 25) as double precision[]) from generate_series(1,1000) s(i);
18:59:43 - tdigest
18:59:43 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
18:59:43 - {1,1000,25,18,0.001,1,0.002,1,0.003,1,0.0055000000000000005,4,0.011999999999999999,9,0.026500000000000013,20,0.05749999999999998,42,0.11500000000000002,73,0.23199999999999993,161,0.472,319,0.7270000000000004,191,0.8774999999999998,110,0.949,33,0.9764999999999998,22,0.9915,8,0.997,3,0.999,1,1,1}
18:59:43 + tdigest
18:59:43 +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
18:59:43 + {1,1000,25,18,0.001,1,0.002,1,0.003,1,0.0055000000000000005,4,0.011999999999999999,9,0.026500000000000013,20,0.05749999999999998,42,0.11500000000000002,73,0.23199999999999993,161,0.47200000000000003,319,0.7270000000000004,191,0.8774999999999998,110,0.949,33,0.9764999999999998,22,0.9915,8,0.997,3,0.999,1,1,1}
18:59:43 (1 row)
18:59:43
18:59:43 SELECT cast(tdigest(i / 1000.0, 100) as double precision[]) from generate_series(1,1000) s(i);
PG 14 ppc64el:
19:00:31 diff -U3 /<<PKGBUILDDIR>>/test/expected/tdigest_1.out /<<PKGBUILDDIR>>/results/tdigest.out
19:00:31 --- /<<PKGBUILDDIR>>/test/expected/tdigest_1.out 2021-11-08 19:49:23.000000000 +0000
19:00:31 +++ /<<PKGBUILDDIR>>/results/tdigest.out 2022-01-05 18:00:31.661852483 +0000
19:00:31 @@ -1318,15 +1318,15 @@
19:00:31
19:00:31 -- test casting to double precision array
19:00:31 SELECT cast(tdigest(i / 1000.0, 10) as double precision[]) from generate_series(1,1000) s(i);
19:00:31 - tdigest
19:00:31 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
19:00:31 - {1,1000,10,13,0.001,1,0.002,1,0.0045,4,0.013000000000000001,13,0.0405,42,0.13499999999999998,147,0.46400000000000013,511,0.7929999999999997,147,0.9159999999999999,99,0.9795,28,0.9959999999999999,5,0.999,1,1,1}
19:00:31 + tdigest
19:00:31 +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
19:00:31 + {1,1000,10,13,0.001,1,0.002,1,0.0045,4,0.013000000000000001,13,0.0405,42,0.13499999999999998,147,0.4640000000000001,511,0.7929999999999997,147,0.9159999999999999,99,0.9795,28,0.9959999999999999,5,0.999,1,1,1}
19:00:31 (1 row)
19:00:31
19:00:31 SELECT cast(tdigest(i / 1000.0, 25) as double precision[]) from generate_series(1,1000) s(i);
19:00:31 - tdigest
19:00:31 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
19:00:31 - {1,1000,25,18,0.001,1,0.002,1,0.003,1,0.0055000000000000005,4,0.011999999999999999,9,0.026500000000000013,20,0.05749999999999998,42,0.11500000000000002,73,0.23199999999999993,161,0.472,319,0.7270000000000004,191,0.8774999999999998,110,0.949,33,0.9764999999999998,22,0.9915,8,0.997,3,0.999,1,1,1}
19:00:31 + tdigest
19:00:31 +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
19:00:31 + {1,1000,25,18,0.001,1,0.002,1,0.003,1,0.0055000000000000005,4,0.011999999999999999,9,0.026500000000000013,20,0.05749999999999998,42,0.11500000000000002,73,0.23199999999999993,161,0.47200000000000003,319,0.7270000000000004,191,0.8774999999999998,110,0.949,33,0.9764999999999998,22,0.9915,8,0.997,3,0.999,1,1,1}
19:00:31 (1 row)
19:00:31
19:00:31 SELECT cast(tdigest(i / 1000.0, 100) as double precision[]) from generate_series(1,1000) s(i);
The PG 12+13 outputs look the same (at least from scrolling over them).
Just trying the tests on arm64 on HEAD with PG 9.6+:
9.6:
--- /home/myon/tdigest/tdigest.git/test/expected/tdigest_2.out 2022-01-07 11:42:58.246497160 +0100
+++ /home/myon/tdigest/tdigest.git/results/tdigest.out 2022-01-07 11:57:52.455930501 +0100
@@ -1554,6 +1554,7 @@
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
SET min_parallel_table_scan_size = '1kB';
+ERROR: unrecognized configuration parameter "min_parallel_table_scan_size"
-- individual values
EXPLAIN (COSTS OFF)
WITH x AS (SELECT percentile_disc(0.95) WITHIN GROUP (ORDER BY v) AS p FROM t)
@@ -1565,21 +1566,17 @@
(SELECT p FROM x) AS a,
tdigest_percentile(v, 100, 0.95) AS b
FROM t) foo;
- QUERY PLAN
-------------------------------------------------
+ QUERY PLAN
+---------------------------------
Subquery Scan on foo
CTE x
-> Aggregate
- -> Gather
- Workers Planned: 2
- -> Parallel Seq Scan on t t_1
+ -> Seq Scan on t t_1
-> Aggregate
- InitPlan 2 (returns $2)
+ InitPlan 2 (returns $1)
-> CTE Scan on x
- -> Gather
- Workers Planned: 2
- -> Parallel Seq Scan on t
-(12 rows)
+ -> Seq Scan on t
+(8 rows)
WITH x AS (SELECT percentile_disc(0.95) WITHIN GROUP (ORDER BY v) AS p FROM t)
SELECT
@@ -1604,15 +1601,12 @@
0.95 AS a,
tdigest_percentile_of(v, 100, 950) AS b
FROM t) foo;
- QUERY PLAN
-------------------------------------------------
+ QUERY PLAN
+---------------------------
Subquery Scan on foo
- -> Finalize Aggregate
- -> Gather
- Workers Planned: 2
- -> Partial Aggregate
- -> Parallel Seq Scan on t
-(6 rows)
+ -> Aggregate
+ -> Seq Scan on t
+(3 rows)
SELECT
950,
@@ -1637,21 +1631,17 @@
(SELECT p FROM x) AS a,
tdigest_percentile(d, 0.95) AS b
FROM t2) foo;
- QUERY PLAN
---------------------------------------------
+ QUERY PLAN
+---------------------------------
Subquery Scan on foo
CTE x
-> Aggregate
- -> Gather
- Workers Planned: 2
- -> Parallel Seq Scan on t
+ -> Seq Scan on t
-> Aggregate
- InitPlan 2 (returns $2)
+ InitPlan 2 (returns $1)
-> CTE Scan on x
- -> Gather
- Workers Planned: 2
- -> Parallel Seq Scan on t2
-(12 rows)
+ -> Seq Scan on t2
+(8 rows)
WITH x AS (SELECT percentile_disc(0.95) WITHIN GROUP (ORDER BY v) AS p FROM t)
SELECT
@@ -1676,15 +1666,12 @@
0.95 AS a,
tdigest_percentile_of(d, 950) AS b
FROM t2) foo;
- QUERY PLAN
--------------------------------------------------
+ QUERY PLAN
+----------------------------
Subquery Scan on foo
- -> Finalize Aggregate
- -> Gather
- Workers Planned: 2
- -> Partial Aggregate
- -> Parallel Seq Scan on t2
-(6 rows)
+ -> Aggregate
+ -> Seq Scan on t2
+(3 rows)
SELECT
950,
@@ -1711,23 +1698,17 @@
unnest((SELECT p FROM x)) AS a,
unnest(tdigest_percentile(v, 100, ARRAY[0.0, 0.95, 0.99, 1.0])) AS b
FROM t) foo;
- QUERY PLAN
-------------------------------------------------------
+ QUERY PLAN
+---------------------------------
Subquery Scan on foo
CTE x
-> Aggregate
- -> Gather
- Workers Planned: 2
- -> Parallel Seq Scan on t t_1
- -> ProjectSet
- InitPlan 2 (returns $2)
+ -> Seq Scan on t t_1
+ -> Aggregate
+ InitPlan 2 (returns $1)
-> CTE Scan on x
- -> Finalize Aggregate
- -> Gather
- Workers Planned: 2
- -> Partial Aggregate
- -> Parallel Seq Scan on t
-(14 rows)
+ -> Seq Scan on t
+(8 rows)
WITH x AS (SELECT percentile_disc(ARRAY[0.0, 0.95, 0.99, 1.0]) WITHIN GROUP (ORDER BY v) AS p FROM t)
SELECT
@@ -1758,26 +1739,20 @@
unnest((SELECT p FROM x)) AS a,
unnest(tdigest_percentile_of(v, 100, ARRAY[950, 990])) AS b
FROM t) foo;
- QUERY PLAN
---------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------
Subquery Scan on foo
CTE x
-> Aggregate
-> Function Scan on unnest f
SubPlan 1
-> Aggregate
- -> Gather
- Workers Planned: 2
- -> Parallel Seq Scan on t t_1
- -> ProjectSet
- InitPlan 3 (returns $3)
+ -> Seq Scan on t t_1
+ -> Aggregate
+ InitPlan 3 (returns $2)
-> CTE Scan on x
- -> Finalize Aggregate
- -> Gather
- Workers Planned: 2
- -> Partial Aggregate
- -> Parallel Seq Scan on t
-(17 rows)
+ -> Seq Scan on t
+(11 rows)
WITH x AS (SELECT array_agg((SELECT percent_rank(f) WITHIN GROUP (ORDER BY v) FROM t)) AS p FROM unnest(ARRAY[950, 990]) f)
SELECT
@@ -1806,23 +1781,17 @@
unnest((SELECT p FROM x)) AS a,
unnest(tdigest_percentile(d, ARRAY[0.0, 0.95, 0.99, 1.0])) AS b
FROM t2) foo;
- QUERY PLAN
--------------------------------------------------------
+ QUERY PLAN
+---------------------------------
Subquery Scan on foo
CTE x
-> Aggregate
- -> Gather
- Workers Planned: 2
- -> Parallel Seq Scan on t
- -> ProjectSet
- InitPlan 2 (returns $2)
+ -> Seq Scan on t
+ -> Aggregate
+ InitPlan 2 (returns $1)
-> CTE Scan on x
- -> Finalize Aggregate
- -> Gather
- Workers Planned: 2
- -> Partial Aggregate
- -> Parallel Seq Scan on t2
-(14 rows)
+ -> Seq Scan on t2
+(8 rows)
WITH x AS (SELECT percentile_disc(ARRAY[0.0, 0.95, 0.99, 1.0]) WITHIN GROUP (ORDER BY v) AS p FROM t)
SELECT
@@ -1853,26 +1822,20 @@
unnest((SELECT p FROM x)) AS a,
unnest(tdigest_percentile_of(d, ARRAY[950, 990])) AS b
FROM t2) foo;
- QUERY PLAN
--------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------
Subquery Scan on foo
CTE x
-> Aggregate
-> Function Scan on unnest f
SubPlan 1
-> Aggregate
- -> Gather
- Workers Planned: 2
- -> Parallel Seq Scan on t
- -> ProjectSet
- InitPlan 3 (returns $3)
+ -> Seq Scan on t
+ -> Aggregate
+ InitPlan 3 (returns $2)
-> CTE Scan on x
- -> Finalize Aggregate
- -> Gather
- Workers Planned: 2
- -> Partial Aggregate
- -> Parallel Seq Scan on t2
-(17 rows)
+ -> Seq Scan on t2
+(11 rows)
WITH x AS (SELECT array_agg((SELECT percent_rank(f) WITHIN GROUP (ORDER BY v) FROM t)) AS p FROM unnest(ARRAY[950, 990]) f)
SELECT
@@ -1903,25 +1866,21 @@
(SELECT p FROM x) AS a,
tdigest_percentile(v, c, 100, 0.95) AS b
FROM t) foo;
- QUERY PLAN
-------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------
Subquery Scan on foo
CTE d
- -> Gather
- Workers Planned: 2
- -> Nested Loop
- -> Parallel Seq Scan on t t_1
- -> Function Scan on generate_series
+ -> Nested Loop
+ -> Seq Scan on t t_1
+ -> Function Scan on generate_series
CTE x
-> Aggregate
-> CTE Scan on d
-> Aggregate
- InitPlan 3 (returns $4)
+ InitPlan 3 (returns $3)
-> CTE Scan on x
- -> Gather
- Workers Planned: 2
- -> Parallel Seq Scan on t
-(16 rows)
+ -> Seq Scan on t
+(12 rows)
WITH
d AS (SELECT t.* FROM t, LATERAL generate_series(1,t.c)),
@@ -1951,25 +1910,21 @@
(SELECT p FROM x) AS a,
tdigest_percentile_of(v, c, 100, 950) AS b
FROM t) foo;
- QUERY PLAN
-------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------
Subquery Scan on foo
CTE d
- -> Gather
- Workers Planned: 2
- -> Nested Loop
- -> Parallel Seq Scan on t t_1
- -> Function Scan on generate_series
+ -> Nested Loop
+ -> Seq Scan on t t_1
+ -> Function Scan on generate_series
CTE x
-> Aggregate
-> CTE Scan on d
-> Aggregate
- InitPlan 3 (returns $4)
+ InitPlan 3 (returns $3)
-> CTE Scan on x
- -> Gather
- Workers Planned: 2
- -> Parallel Seq Scan on t
-(16 rows)
+ -> Seq Scan on t
+(12 rows)
WITH
d AS (SELECT t.* FROM t, LATERAL generate_series(1,t.c)),
@@ -2001,27 +1956,21 @@
unnest((SELECT p FROM x)) AS a,
unnest(tdigest_percentile(v, c, 100, ARRAY[0.0, 0.95, 0.99, 1.0])) AS b
FROM t) foo;
- QUERY PLAN
-------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------
Subquery Scan on foo
CTE d
- -> Gather
- Workers Planned: 2
- -> Nested Loop
- -> Parallel Seq Scan on t t_1
- -> Function Scan on generate_series
+ -> Nested Loop
+ -> Seq Scan on t t_1
+ -> Function Scan on generate_series
CTE x
-> Aggregate
-> CTE Scan on d
- -> ProjectSet
- InitPlan 3 (returns $4)
+ -> Aggregate
+ InitPlan 3 (returns $3)
-> CTE Scan on x
- -> Finalize Aggregate
- -> Gather
- Workers Planned: 2
- -> Partial Aggregate
- -> Parallel Seq Scan on t
-(18 rows)
+ -> Seq Scan on t
+(12 rows)
WITH
d AS (SELECT t.* FROM t, LATERAL generate_series(1,t.c)),
@@ -2056,30 +2005,24 @@
unnest((select x.p from x)) AS a,
unnest(tdigest_percentile_of(v, c, 100, ARRAY[950, 990])) AS b
FROM t) foo;
- QUERY PLAN
-------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------
Subquery Scan on foo
CTE d
- -> Gather
- Workers Planned: 2
- -> Nested Loop
- -> Parallel Seq Scan on t t_1
- -> Function Scan on generate_series
+ -> Nested Loop
+ -> Seq Scan on t t_1
+ -> Function Scan on generate_series
CTE x
-> Aggregate
-> Function Scan on unnest f
SubPlan 2
-> Aggregate
-> CTE Scan on d
- -> ProjectSet
- InitPlan 4 (returns $5)
+ -> Aggregate
+ InitPlan 4 (returns $4)
-> CTE Scan on x
- -> Finalize Aggregate
- -> Gather
- Workers Planned: 2
- -> Partial Aggregate
- -> Parallel Seq Scan on t
-(21 rows)
+ -> Seq Scan on t
+(15 rows)
WITH
d AS (SELECT t.* FROM t, LATERAL generate_series(1,t.c)),
======================================================================
PG 10: OK
PG 11:
--- /home/myon/tdigest/tdigest.git/test/expected/tdigest_2.out 2022-01-07 11:42:58.246497160 +0100
+++ /home/myon/tdigest/tdigest.git/results/tdigest.out 2022-01-07 11:58:59.676937525 +0100
@@ -1573,13 +1573,14 @@
-> Gather
Workers Planned: 2
-> Parallel Seq Scan on t t_1
- -> Aggregate
+ -> Finalize Aggregate
InitPlan 2 (returns $2)
-> CTE Scan on x
-> Gather
Workers Planned: 2
- -> Parallel Seq Scan on t
-(12 rows)
+ -> Partial Aggregate
+ -> Parallel Seq Scan on t
+(13 rows)
WITH x AS (SELECT percentile_disc(0.95) WITHIN GROUP (ORDER BY v) AS p FROM t)
SELECT
@@ -1637,21 +1638,22 @@
(SELECT p FROM x) AS a,
tdigest_percentile(d, 0.95) AS b
FROM t2) foo;
- QUERY PLAN
---------------------------------------------
+ QUERY PLAN
+-------------------------------------------------
Subquery Scan on foo
CTE x
-> Aggregate
-> Gather
Workers Planned: 2
-> Parallel Seq Scan on t
- -> Aggregate
+ -> Finalize Aggregate
InitPlan 2 (returns $2)
-> CTE Scan on x
-> Gather
Workers Planned: 2
- -> Parallel Seq Scan on t2
-(12 rows)
+ -> Partial Aggregate
+ -> Parallel Seq Scan on t2
+(13 rows)
WITH x AS (SELECT percentile_disc(0.95) WITHIN GROUP (ORDER BY v) AS p FROM t)
SELECT
@@ -1915,13 +1917,14 @@
CTE x
-> Aggregate
-> CTE Scan on d
- -> Aggregate
+ -> Finalize Aggregate
InitPlan 3 (returns $4)
-> CTE Scan on x
-> Gather
Workers Planned: 2
- -> Parallel Seq Scan on t
-(16 rows)
+ -> Partial Aggregate
+ -> Parallel Seq Scan on t
+(17 rows)
WITH
d AS (SELECT t.* FROM t, LATERAL generate_series(1,t.c)),
@@ -1963,13 +1966,14 @@
CTE x
-> Aggregate
-> CTE Scan on d
- -> Aggregate
+ -> Finalize Aggregate
InitPlan 3 (returns $4)
-> CTE Scan on x
-> Gather
Workers Planned: 2
- -> Parallel Seq Scan on t
-(16 rows)
+ -> Partial Aggregate
+ -> Parallel Seq Scan on t
+(17 rows)
WITH
d AS (SELECT t.* FROM t, LATERAL generate_series(1,t.c)),
======================================================================
PG 12 13: OK
PG 14:
--- /home/myon/tdigest/tdigest.git/test/expected/tdigest_1.out 2022-01-07 11:40:38.928398940 +0100
+++ /home/myon/tdigest/tdigest.git/results/tdigest.out 2022-01-07 12:00:17.318100651 +0100
@@ -1905,13 +1905,12 @@
-> Parallel Seq Scan on t t_1
-> Memoize
Cache Key: t_1.c
- Cache Mode: binary
-> Function Scan on generate_series
-> Gather
Workers Planned: 2
-> Partial Aggregate
-> Parallel Seq Scan on t
-(16 rows)
+(15 rows)
WITH
d AS (SELECT t.* FROM t, LATERAL generate_series(1,t.c)),
@@ -1953,13 +1952,12 @@
-> Parallel Seq Scan on t t_1
-> Memoize
Cache Key: t_1.c
- Cache Mode: binary
-> Function Scan on generate_series
-> Gather
Workers Planned: 2
-> Partial Aggregate
-> Parallel Seq Scan on t
-(16 rows)
+(15 rows)
WITH
d AS (SELECT t.* FROM t, LATERAL generate_series(1,t.c)),
@@ -2003,14 +2001,13 @@
-> Parallel Seq Scan on t t_1
-> Memoize
Cache Key: t_1.c
- Cache Mode: binary
-> Function Scan on generate_series
-> Finalize Aggregate
-> Gather
Workers Planned: 2
-> Partial Aggregate
-> Parallel Seq Scan on t
-(17 rows)
+(16 rows)
WITH
d AS (SELECT t.* FROM t, LATERAL generate_series(1,t.c)),
@@ -2060,14 +2057,13 @@
-> Parallel Seq Scan on t t_1
-> Memoize
Cache Key: t_1.c
- Cache Mode: binary
-> Function Scan on generate_series
-> Finalize Aggregate
-> Gather
Workers Planned: 2
-> Partial Aggregate
-> Parallel Seq Scan on t
-(20 rows)
+(19 rows)
WITH
d AS (SELECT t.* FROM t, LATERAL generate_series(1,t.c)),
... the "cache mode" change is for PG15 only I think.
The output on ppc64el looks the same, so there are no precision differences left, only plan changes.
Fwiw since it's not really possible to put comments into .out files, one idea might be to put this at the beginning:
# select case
when setting::int between 120000 and 149999 then 'PG 12/13/14'
when setting::int >= 150000 then 'PG 15+'
else (setting::int / 10000)::text
end as "This file is for PG version"
from pg_settings where name = 'server_version_num';
This file is for PG version
─────────────────────────────
PG 12/13/14
Thanks. I don't understand why PG14 fails - the "Cache Mode: binary" should be there too, it was backpatched (see commit 6c32c0977783fae217b5eaa1d22d26c96e5b0085).
Not sure about 9.6 - isn't that already EOL? Or do we want to make packages available anyway?
So, I've pushed c08d21cc95f383afdde9a2582a3b908f27480b23 which splits the regression tests into smaller parts, and eliminates some of the differences by tweaking GUCs etc. But it's impossible to eliminate all of that, so it also adds alternative output for some of those parts, back to pg9.6. I've tested this on x86_64 and 32/64-bit rpi. I'll see if I can test the i386/ppc on qemu, or something.
As for the errors with floating point precision in output, I've realized the report you sent does this:
SELECT cast(tdigest(i / 1000.0, 10) as double precision[]) from generate_series(1,1000) s(i);
But that's not what the current HEAD does - it does
SELECT array_agg(round(v::numeric,3)) FROM (
SELECT unnest(cast(tdigest(i / 1000.0, 10) as double precision[])) AS v from generate_series(1,1000) s(i)
) foo;
exactly to make the output more stable. So it seems these results are from some older version, not from HEAD (and indeed, the logs you linked say tdigest_1.3.0-1~13.gitacfd74a.pgdg+1, not 1.4.0-dev). Should I do a release, or can you test a particular commit?
Hi Tomas,
"cache mode binary" - that was backpatched, but not released yet, we are still on 14.1.
9.6 - I hadn't yet told apt.postgresql.org that 9.6 is EOL, since we have not had new releases yet that did not include 9.6. I've done so now, so 9.x is history. \o/
https://github.com/tvondra/tdigest/issues/20#issuecomment-1007307796 was still on 1.3.0, not on head as my next posting after that. Sorry for the confusion.
I just tested HEAD (c08d21cc95), and the regression tests are now passing on amd64/i386/arm64/ppc64el, including 9.6.
A release with the fixes would be appreciated. Thanks!
Ah, right - I hadn't realized the "cache mode" fix was not released yet. Anyway, I've disabled the "memoize" node entirely, so that's not an issue anymore. If 9.6 is now EOL, I could delete the _3.out files, but I'll keep them for now.
Thanks for the info the tests now pass on the other architectures too, I'll do a bit more testing and create a proper release.
The tests are now passing on all archs on apt.pg.o with all PG versions. Thanks!
tdigest v1.2.0 is failing the regression tests with PG13 on architectures such as arm64, ppc64el, i386, and others: https://buildd.debian.org/status/logs.php?pkg=tdigest&ver=1.2.0-1
It's also failing on amd64 (aka x86_64) for PG11 and earlier: https://pgdgbuild.dus.dg-i.net/job/tdigest-binaries/12/architecture=amd64,distribution=sid/console
In both cases, it's a single offender that is fixed by this patch:
https://salsa.debian.org/postgresql/tdigest/-/blob/master/debian/patches/float-precision
I'd have submitted this as a pull request, but HEAD's new features surface a lot more problems on i386 (32-bit Intel):
(The quoted patch still fixes the same problem here.)