pllua / pllua-deprecated

[DEPRECATED] This repository is no longer maintained. Please follow https://github.com/pllua/pllua
197 stars 16 forks source link

ERROR: invalid memory alloc request size 18446744073709551613 #60

Open bjne opened 6 years ago

bjne commented 6 years ago

When using prepared statements in _U, I experience weird behavior. I am not able to create a testcase that reproduces, but what I am seeing is:

Any idea what this is or where I can look to start debugging this? Running on pllua/pllua master and postgresql 9.5

eugwne commented 6 years ago

postgres compiled with --enable-debug ?

bjne commented 6 years ago

No, but I can try that?

eugwne commented 6 years ago

Yes, building postgres with debugging support might help

golgote commented 6 years ago

I had the same problem with Lua 5.1 installed with apt on Debian 9 and Postgres 10.

So I tried with luajit and pllua manually compiled and I didn't have this problem anymore, on the same platform.

Le 19 nov. 2017 15:25, "Eugene Sergeev" notifications@github.com a écrit :

Yes, building postgres with debugging support might help

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/pllua/pllua/issues/60#issuecomment-345520593, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAwSKVoTEA41urTZ37g2Y3pdeWvrV-Fks5s4DpZgaJpZM4QjYzH .

bjne commented 6 years ago

Yeah, this is luajit. I tried with lua, and it gives the same problem, but at a slightly different time. But still, when adding 6 lines of commented out code at the top of the function causes a big difference, something is wrong here.

bjne commented 6 years ago

What I do know is that the rows() iterator accesses the first perhaps 20 rows just fine, but suddenly I get a row that when I access one of its columns (a text column), it throws the error, so this might have something to do with __tostring (but why the difference between rows vs cursor)? Let me try to pin it down some more..

bjne commented 6 years ago

So it seems to me that only accessing the text columns of the given row gives the error. I tried accessing booleans and integers in the same row by printing them, and that is just fine, but printing the text column causes: invalid memory alloc request size 18446744073709551613

bjne commented 6 years ago
2017-11-19 19:06:34.192 GMT [1858] DEBUG:  reaping dead processes
2017-11-19 19:06:34.192 GMT [1858] DEBUG:  server process (PID 1948) was terminated by signal 11: Segmentation fault
2017-11-19 19:06:34.192 GMT [1858] DETAIL:  Failed process was running: select module_schema('foo', 'bar')
2017-11-19 19:06:34.192 GMT [1858] LOG:  server process (PID 1948) was terminated by signal 11: Segmentation fault
2017-11-19 19:06:34.192 GMT [1858] DETAIL:  Failed process was running: select module_schema('foo', 'bar')
2017-11-19 19:06:34.192 GMT [1858] LOG:  terminating any other active server processes
2017-11-19 19:06:34.192 GMT [1858] DEBUG:  sending SIGQUIT to process 1865
2017-11-19 19:06:34.192 GMT [1858] DEBUG:  sending SIGQUIT to process 1861
2017-11-19 19:06:34.192 GMT [1858] DEBUG:  sending SIGQUIT to process 1860
2017-11-19 19:06:34.192 GMT [1858] DEBUG:  sending SIGQUIT to process 1862
2017-11-19 19:06:34.192 GMT [1858] DEBUG:  sending SIGQUIT to process 1863
2017-11-19 19:06:34.192 GMT [1858] DEBUG:  sending SIGQUIT to process 1864
2017-11-19 19:06:34.192 GMT [1864] DEBUG:  writing stats file "pg_stat/global.stat"
2017-11-19 19:06:34.192 GMT [1861] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2017-11-19 19:06:34.192 GMT [1861] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2017-11-19 19:06:34.192 GMT [1861] DEBUG:  proc_exit(-1): 0 callbacks to make
2017-11-19 19:06:34.192 GMT [1864] DEBUG:  writing stats file "pg_stat/db_12328.stat"
2017-11-19 19:06:34.192 GMT [1864] DEBUG:  removing temporary stats file "pg_stat_tmp/db_12328.stat"
2017-11-19 19:06:34.192 GMT [1864] DEBUG:  writing stats file "pg_stat/db_17203.stat"
2017-11-19 19:06:34.193 GMT [1864] DEBUG:  removing temporary stats file "pg_stat_tmp/db_17203.stat"
2017-11-19 19:06:34.193 GMT [1864] DEBUG:  writing stats file "pg_stat/db_0.stat"
2017-11-19 19:06:34.193 GMT [1864] DEBUG:  removing temporary stats file "pg_stat_tmp/db_0.stat"
2017-11-19 19:06:34.193 GMT [1863] WARNING:  terminating connection because of crash of another server process
2017-11-19 19:06:34.193 GMT [1863] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-11-19 19:06:34.193 GMT [1863] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2017-11-19 19:06:34.193 GMT [1863] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2017-11-19 19:06:34.193 GMT [1863] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2017-11-19 19:06:34.193 GMT [1863] DEBUG:  proc_exit(-1): 0 callbacks to make
2017-11-19 19:06:34.193 GMT [1864] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2017-11-19 19:06:34.193 GMT [1864] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2017-11-19 19:06:34.193 GMT [1864] DEBUG:  proc_exit(-1): 0 callbacks to make
2017-11-19 19:06:34.193 GMT [1858] DEBUG:  reaping dead processes
2017-11-19 19:06:34.193 GMT [1862] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2017-11-19 19:06:34.193 GMT [1862] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2017-11-19 19:06:34.193 GMT [1862] DEBUG:  proc_exit(-1): 0 callbacks to make
2017-11-19 19:06:34.193 GMT [1858] DEBUG:  reaping dead processes
2017-11-19 19:06:34.193 GMT [1865] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2017-11-19 19:06:34.194 GMT [1865] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2017-11-19 19:06:34.194 GMT [1865] DEBUG:  proc_exit(-1): 0 callbacks to make
2017-11-19 19:06:34.193 GMT [1860] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2017-11-19 19:06:34.194 GMT [1860] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2017-11-19 19:06:34.194 GMT [1860] DEBUG:  proc_exit(-1): 0 callbacks to make
2017-11-19 19:06:34.194 GMT [1858] DEBUG:  reaping dead processes
2017-11-19 19:06:34.194 GMT [1858] DEBUG:  reaping dead processes
2017-11-19 19:06:34.194 GMT [1858] LOG:  all server processes terminated; reinitializing
2017-11-19 19:06:34.194 GMT [1858] DEBUG:  shmem_exit(1): 0 before_shmem_exit callbacks to make
2017-11-19 19:06:34.194 GMT [1858] DEBUG:  shmem_exit(1): 5 on_shmem_exit callbacks to make
2017-11-19 19:06:34.194 GMT [1858] DEBUG:  cleaning up dynamic shared memory control segment with ID 877356333
2017-11-19 19:06:34.196 GMT [1858] DEBUG:  invoking IpcMemoryCreate(size=15237120)
2017-11-19 19:06:34.196 GMT [1858] DEBUG:  mmap(16777216) with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
2017-11-19 19:06:34.201 GMT [1858] DEBUG:  SlruScanDirectory invoking callback on pg_notify/0000
2017-11-19 19:06:34.201 GMT [1858] DEBUG:  removing file "pg_notify/0000"
bjne commented 6 years ago
(gdb) b mcxt.c:858
Breakpoint 1 at 0x82dfb9: file mcxt.c, line 858.
(gdb) continue
Continuing.

Breakpoint 1, palloc (size=18446744073709551613) at mcxt.c:858
warning: Source file is more recent than executable.
858                     elog(ERROR, "palloc invalid memory alloc request size %zu", size);
(gdb) bt
#0  palloc (size=18446744073709551613) at mcxt.c:858
#1  0x00000000007e8e76 in text_to_cstring (t=0x1aacad0) at varlena.c:189
#2  0x000000000080d7ed in DirectFunctionCall1Coll (func=0x7e95a0 <textout>, collation=collation@entry=0, arg1=arg1@entry=27970256) at fmgr.c:717
#3  0x00007fc300e65c5b in luaP_pushdatum (L=L@entry=0x19876b0, dat=27970256, type=25) at plluaapi.c:891
#4  0x00007fc300e68cf5 in luaP_p_tupleindex (L=0x19876b0) at plluaspi.c:471
#5  0x00007fc300c12320 in ?? () from /usr/lib/x86_64-linux-gnu/liblua5.1.so.0
#6  0x00007fc300c12734 in ?? () from /usr/lib/x86_64-linux-gnu/liblua5.1.so.0
#7  0x00007fc300c1b5d6 in ?? () from /usr/lib/x86_64-linux-gnu/liblua5.1.so.0
#8  0x00007fc300c1c94e in ?? () from /usr/lib/x86_64-linux-gnu/liblua5.1.so.0
#9  0x00007fc300c1277d in ?? () from /usr/lib/x86_64-linux-gnu/liblua5.1.so.0
#10 0x00007fc300c11a5e in ?? () from /usr/lib/x86_64-linux-gnu/liblua5.1.so.0
#11 0x00007fc300c128eb in ?? () from /usr/lib/x86_64-linux-gnu/liblua5.1.so.0
#12 0x00007fc300c0e3a8 in lua_pcall () from /usr/lib/x86_64-linux-gnu/liblua5.1.so.0
#13 0x00007fc300e67a45 in luaP_callhandler (L=0x19876b0, fcinfo=0x7fff0e1fcac0) at plluaapi.c:1396
#14 0x00000000005ef599 in ExecMakeTableFunctionResult (setexpr=0x197e5d8, econtext=0x197e318, argContext=<optimized out>, expectedDesc=0x197f798, randomAccess=0 '\000') at execSRF.c:231
#15 0x00000000005fae13 in FunctionNext (node=node@entry=0x197e208) at nodeFunctionscan.c:94
#16 0x00000000005eea79 in ExecScanFetch (recheckMtd=0x5fab40 <FunctionRecheck>, accessMtd=0x5fab60 <FunctionNext>, node=0x197e208) at execScan.c:97
#17 ExecScan (node=0x197e208, accessMtd=0x5fab60 <FunctionNext>, recheckMtd=0x5fab40 <FunctionRecheck>) at execScan.c:147
#18 0x00000000005e8f7a in ExecProcNode (node=0x197e208) at ../../../src/include/executor/executor.h:250
#19 ExecutePlan (execute_once=<optimized out>, dest=0x19de918, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x197e208, estate=0x197dff8)
    at execMain.c:1722
#20 standard_ExecutorRun (queryDesc=0x1985218, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:363
#21 0x000000000070e67c in PortalRunSelect (portal=portal@entry=0x1a16618, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x19de918) at pquery.c:932
#22 0x000000000070fa5e in PortalRun (portal=portal@entry=0x1a16618, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', run_once=run_once@entry=1 '\001', dest=dest@entry=0x19de918,
    altdest=altdest@entry=0x19de918, completionTag=0x7fff0e1fd1b0 "") at pquery.c:773
#23 0x000000000070b929 in exec_simple_query (query_string=0x19e2768 "select * from module_schema('foo','bar');") at postgres.c:1099
#24 0x000000000070cdbd in PostgresMain (argc=<optimized out>, argv=argv@entry=0x198f060, dbname=0x198c208 "webcfg", username=<optimized out>) at postgres.c:4088
#25 0x0000000000477d30 in BackendRun (port=0x1981f90) at postmaster.c:4357
#26 BackendStartup (port=0x1981f90) at postmaster.c:4029
#27 ServerLoop () at postmaster.c:1753
#28 0x00000000006a375c in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x195e9f0) at postmaster.c:1361
#29 0x0000000000478cc1 in main (argc=3, argv=0x195e9f0) at main.c:228
bjne commented 6 years ago

This seems to reproduce the bug I am seeing

create database pllua_bug;
\c pllua_bug

create extension if not exists pllua;

drop table test;
create table test as select s, md5(random()::text) from generate_Series(1,500) s;

create or replace function f1()
    returns void as $$
        for t in _U.get_test:rows() do
                print(t.md5)
        end
end
do
    _U = {
        get_test = server.prepare([[
            SELECT *
              FROM test
        ]]):save()
    }
$$ language plluau;

create or replace function f2()
    returns void as $$
        for t in _U.get_test:rows() do
                print(t.md5)
        end
end
do
    _U = {
        get_test = server.prepare([[
            SELECT *
              FROM test
        ]]):save()
    }
$$ language plluau;

select * from f1();
select * from f2();
...
INFO:  deaba1fc7c1890103fa3734a7a21cb2c
INFO:  effa1c3f13e@
INFO:  3ed5d52d6061147693064061091@
INFO:  12b83526047
ERROR:  palloc invalid memory alloc request size 1647392101
bjne commented 6 years ago

Above test run on postgres compiled with --enable-debug --enable-cassert

Thread 1 "postgres" hit Breakpoint 1, palloc (size=18446744073709551613) at mcxt.c:858
858                     elog(ERROR, "palloc invalid memory alloc request size %zu", size);
(gdb) bt
#0  palloc (size=18446744073709551613) at mcxt.c:858
#1  0x0000000000840806 in text_to_cstring (t=0xfc015c) at varlena.c:189
#2  0x0000000000867e3d in DirectFunctionCall1Coll (func=0x841010 <textout>, collation=collation@entry=0, arg1=arg1@entry=16515420) at fmgr.c:717
#3  0x00007f0902538e6b in luaP_pushdatum (L=L@entry=0x40f83378, dat=16515420, type=25) at plluaapi.c:880
#4  0x00007f090253bf35 in luaP_p_tupleindex (L=0x40f83378) at plluaspi.c:471
#5  0x00007f0902296a77 in ?? () from /usr/local/lib/libluajit-5.1.so.2
#6  0x00007f09022a700d in lua_pcall () from /usr/local/lib/libluajit-5.1.so.2
#7  0x00007f090253abe5 in luaP_callhandler (L=0x40f83378, fcinfo=0x7ffea9cb6cd0) at plluaapi.c:1385
#8  0x0000000000622579 in ExecMakeTableFunctionResult (setexpr=0xf73068, econtext=0xf72d60, argContext=<optimized out>, expectedDesc=0xf73880, randomAccess=0 '\000') at execSRF.c:231
#9  0x000000000062f143 in FunctionNext (node=node@entry=0xf72c48) at nodeFunctionscan.c:94
#10 0x0000000000621869 in ExecScanFetch (recheckMtd=0x62ee80 <FunctionRecheck>, accessMtd=0x62ee90 <FunctionNext>, node=0xf72c48) at execScan.c:97
#11 ExecScan (node=0xf72c48, accessMtd=0x62ee90 <FunctionNext>, recheckMtd=0x62ee80 <FunctionRecheck>) at execScan.c:147
#12 0x000000000061b5ea in ExecProcNode (node=0xf72c48) at ../../../src/include/executor/executor.h:250
#13 ExecutePlan (execute_once=<optimized out>, dest=0xf93668, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0xf72c48, estate=0xf72a30)
    at execMain.c:1722
#14 standard_ExecutorRun (queryDesc=0xecfac0, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:363
#15 0x000000000075caac in PortalRunSelect (portal=portal@entry=0xf6ea10, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0xf93668) at pquery.c:932
#16 0x000000000075e066 in PortalRun (portal=portal@entry=0xf6ea10, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', run_once=run_once@entry=1 '\001', dest=dest@entry=0xf93668, altdest=altdest@entry=0xf93668,
    completionTag=0x7ffea9cb73c0 "") at pquery.c:773
#17 0x0000000000759b99 in exec_simple_query (query_string=0xf2af10 "select * from f2();") at postgres.c:1099
#18 0x000000000075afde in PostgresMain (argc=<optimized out>, argv=argv@entry=0xed7f78, dbname=0xed7df8 "pllua_bug", username=<optimized out>) at postgres.c:4088
#19 0x0000000000478819 in BackendRun (port=0xeccfa0) at postmaster.c:4357
#20 BackendStartup (port=0xeccfa0) at postmaster.c:4029
#21 ServerLoop () at postmaster.c:1753
#22 0x00000000006e4025 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0xea99f0) at postmaster.c:1361
#23 0x0000000000479c41 in main (argc=3, argv=0xea99f0) at main.c:228
(gdb) up 4
#4  0x00007f090253bf35 in luaP_p_tupleindex (L=0x40f83378) at plluaspi.c:471
471                     luaP_pushdatum(L, t->value[i], tupleDesc->attrs[i]->atttypid);
(gdb) print t
$1 = (luaP_Tuple *) 0xfa5460
(gdb) print t->value
$2 = (Datum *) 0xfa5490
(gdb) print *t
$3 = {changed = -1, relid = 0, tuple = 0xfc0128, tupdesc = 0x0, value = 0xfa5490, null = 0xfa54a0 "", rtupdesc = 0xf96998}
(gdb) print t->value[0]
$4 = 296
(gdb) print t->value[1]
$5 = 16515420
(gdb) print *(t->tuple)
$6 = {t_len = 912680504, t_self = {ip_blkid = {bi_hi = 12646, bi_lo = 13154}, ip_posid = 14643}, t_tableOid = 929391921, t_data = 0xfc7e00}
(gdb)
RhodiumToad commented 6 years ago

I discussed this in the #postgresql irc channel with bjne. It looks to me like much of pllua's memory management is wrong - this particular bug seems to be the result of shallow-copying Datum values without care for whether the target of a by-reference value is being freed. In particular you can't safely SPI_freetuptable a result set until you no longer have any Datum values that were fetched from the tuples and not copied.

bjne commented 6 years ago

...and for the record, this was the start of rhodiumtoad/pllua-ng

anthony-hz commented 6 years ago

Many thanks for Pllua-ng

ghost commented 4 years ago

Hello everyone! This strange and terrifying number showed up on my trainee MySQL too, but in different place. mysqli_get_client_stats(); activeconnections: 18446744073709551613 ... he he (in my language it is sardonic laugh). And after every new mysqli - even mysqli_close($con) it is getting smaller... Sometimes it appears properly as 0 or 1, for one moment and again is going vicious, nothing helps. Database is working, data's aren't go freak. I did small test of maths - I put some small values unsigned, negatives etc - everything is good, numbers keep values and types. Right now, I decided to ignore this - this db is only for exercises and playing, no relevant data - and I have no idea what from it comes... Perhaps someone smarter than me, will someday discover what devil speeds this number, I just leave this massage for investigators. MySQL 5.7, utf-8, utf9mb4, general or unicode; well, Swedish collation and latin1 on server (default); This is good working hosting since years, recently nothing changed, what else can I say? I did nothing weird to my db. Whole database it is 5-7 tables, each 0 - 20 rows, I am the only user of this db; Wow, active_connections: 1, and right after (2 queries) 18446744073709551615, after few minutes: 0, than after 2 queries 18446744073709551614 active_connections... So, I will ignore this, what else can I do?

RhodiumToad commented 4 years ago

2^64 is 18446744073709551616, so 2^64 - 1 is 18446744073709551615, etc.

Therefore these numbers come up very frequently as a result of underflowing a 64-bit unsigned counter.

ghost commented 4 years ago

Thank you, but what could that mean in case mentioned? The number is somehow understandable for x64 server, but still - mysqli_get_client_stats() [active_connections] is kind of "check the list and tell me how many" it should be 0 or 1 (int) Where does this ... freak come from and why? Is there a cure for it? Of course, the most welcome answer would be (working) function like this: function spell(){ wild x64 counters underflows GO HOME!} ....and they lived happily ever after.

Meanwhile - it still bothers me - is it somehow dangerous for server or for database? Should I keep digging, testing and asking or just getting used to it?

RhodiumToad commented 4 years ago

This is not the place to ask these questions. Take it somewhere more appropriate.