tvondra / tdigest

PostgreSQL extension for estimating percentiles using t-digest
PostgreSQL License
88 stars 13 forks source link

Assertion failure for `tdigest_percentile` #19

Closed agedemenli closed 3 years ago

agedemenli commented 3 years ago

I was testing tdigest with Citus extension to prepare Citus for PG14 release. I hit an assertion fail and noticed that it isn't related to Citus. This assertion might have problems when this is run on a production environment. This should either

Postgres version: 14beta3 -- (then tried with 13.0, it fails as well) The failure is happening in tdigest_add_double_count Exaclty here: https://github.com/tvondra/tdigest/blob/d14642eb573ddd80623b1dc8392587cf3ee2bb43/tdigest.c#L1172

Example:

create extension tdigest;
CREATE TABLE t (a int, b int, c double precision);
INSERT INTO t values(0, 0, 0.0);
SELECT tdigest_percentile(t.c, t.a, 100, 0.95) FROM t;

Here is the backtrace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007fab130e6921 in __GI_abort () at abort.c:79
#2  0x000055a03540314f in ExceptionalCondition (conditionName=conditionName@entry=0x7fab076b7c5b "count > 0",
    errorType=errorType@entry=0x7fab076b7c07 "FailedAssertion", fileName=fileName@entry=0x7fab076b7bfd "tdigest.c",
    lineNumber=lineNumber@entry=1172) at assert.c:69
#3  0x00007fab076b57d0 in tdigest_add_double_count (fcinfo=0x55a036eec708) at tdigest.c:1172
#4  0x000055a0350eae97 in ExecAggPlainTransByVal (setno=<optimized out>, aggcontext=<optimized out>,
    pergroup=0x55a036ee04d0, pertrans=<optimized out>, aggstate=<optimized out>) at execExprInterp.c:4315
#5  ExecInterpExpr (state=0x55a036eec7f8, econtext=0x55a036eded80, isnull=0x7fffceca85d7) at execExprInterp.c:1713
#6  0x000055a0350e608e in ExecInterpExprStillValid (state=0x55a036eec7f8, econtext=0x55a036eded80,
    isNull=0x7fffceca85d7) at execExprInterp.c:1824
#7  0x000055a035103287 in ExecEvalExprSwitchContext (isNull=0x7fffceca85d7, econtext=<optimized out>,
    state=<optimized out>) at ../../../src/include/executor/executor.h:339
#8  advance_aggregates (aggstate=aggstate@entry=0x55a036ede948) at nodeAgg.c:842
#9  0x000055a0351074f3 in agg_retrieve_direct (aggstate=aggstate@entry=0x55a036ede948) at nodeAgg.c:2454
#10 0x000055a035107752 in ExecAgg (pstate=0x55a036ede948) at nodeAgg.c:2179
#11 0x000055a0350f7118 in ExecProcNodeFirst (node=0x55a036ede948) at execProcnode.c:463
#12 0x000055a0350eeffc in ExecProcNode (node=0x55a036ede948) at ../../../src/include/executor/executor.h:257
#13 ExecutePlan (estate=estate@entry=0x55a036ede710, planstate=0x55a036ede948, use_parallel_mode=<optimized out>,
    operation=operation@entry=CMD_SELECT, sendTuples=sendTuples@entry=true, numberTuples=numberTuples@entry=0,
    direction=ForwardScanDirection, dest=0x55a036ee1840, execute_once=true) at execMain.c:1551
#14 0x000055a0350efc97 in standard_ExecutorRun (queryDesc=0x55a036e3b670, direction=ForwardScanDirection, count=0,
    execute_once=<optimized out>) at execMain.c:361
#15 0x000055a0350efd62 in ExecutorRun (queryDesc=queryDesc@entry=0x55a036e3b670,
    direction=direction@entry=ForwardScanDirection, count=count@entry=0, execute_once=<optimized out>)
    at execMain.c:305
#16 0x000055a0352bb8b4 in PortalRunSelect (portal=portal@entry=0x55a036e80df0, forward=forward@entry=true, count=0,
    count@entry=9223372036854775807, dest=dest@entry=0x55a036ee1840) at pquery.c:919
---Type <return> to continue, or q <return> to quit---
#17 0x000055a0352bd304 in PortalRun (portal=portal@entry=0x55a036e80df0, count=count@entry=9223372036854775807,
    isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x55a036ee1840,
    altdest=altdest@entry=0x55a036ee1840, qc=0x7fffceca8990) at pquery.c:763
#18 0x000055a0352b91d2 in exec_simple_query (
    query_string=query_string@entry=0x55a036e1a0c0 "SELECT tdigest_percentile(t.c, t.a, 100, 0.95) FROM t;")
    at postgres.c:1214
#19 0x000055a0352bb2fe in PostgresMain (argc=argc@entry=1, argv=argv@entry=0x7fffceca8b90, dbname=<optimized out>,
    username=<optimized out>) at postgres.c:4486
#20 0x000055a035213a94 in BackendRun (port=port@entry=0x55a036e3e790) at postmaster.c:4506
#21 0x000055a035216d25 in BackendStartup (port=port@entry=0x55a036e3e790) at postmaster.c:4228
#22 0x000055a035216f6a in ServerLoop () at postmaster.c:1745
#23 0x000055a035218515 in PostmasterMain (argc=3, argv=<optimized out>) at postmaster.c:1417
#24 0x000055a035153e02 in main (argc=3, argv=0x55a036e11be0) at main.c:209
tvondra commented 3 years ago

Thanks for the report. Well, the assert is triggered because the table contains rows like this:

 a | b |         c          
---+---+--------------------
 0 | 4 | 0.5139344756720483
 0 | 5 | 0.9749701220118325
 0 | 4 | 0.2685794689591283
(3 rows)

And we simply don't allow adding values with 0 (or negative) counts - because what would that do?

But it certainly seems a bit bogus to handle that using an assert - it's a user-supplier value, not some internal parameter, so it needs to be checked even on production builds without assert. I'll change that to an elog(ERROR).

An alternative would be to just ignore those values, but that seems rather dangerous and might easily mask issues.

tvondra commented 3 years ago

Should be fixed by a75271b943561418bb4ce9c06008601c56cb21e7 - instead of crashing it should report ERROR.

agedemenli commented 3 years ago

Thank you!