Closed df7cb closed 2 years ago
It's also hitting amd64 with PG12 on Ubuntu Jammy: https://pgdgbuild.dus.dg-i.net/view/Snapshot/job/pgpointcloud-binaries-snapshot/lastFailedBuild/architecture=amd64,distribution=jammy/console
Hello @df7cb,
First thank you for reporting this issue.
By taking a look with Valgrind on a pc_explode
request, I noticed that fishy message:
==154307== 4 errors in context 3 of 4:
==154307== Conditional jump or move depends on uninitialised value(s)
==154307== at 0x123D6708: pc_schema_from_pcid (pc_pgsql.c:389)
==154307== by 0x123D4C63: pcpatch_unnest (pc_access.c:584)
==154307== by 0x3C84E7: ExecMakeFunctionResultSet (execSRF.c:614)
==154307== by 0x3EA717: ExecProjectSRF (nodeProjectSet.c:175)
==154307== by 0x3EA804: ExecProjectSet (nodeProjectSet.c:105)
==154307== by 0x3E42D8: UnknownInlinedFun (executor.h:248)
==154307== by 0x3E42D8: ExecLimit (nodeLimit.c:96)
==154307== by 0x3BF8D3: ExecProcNode (executor.h:248)
==154307== by 0x3BF8D3: ExecutePlan (execMain.c:1632)
==154307== by 0x3BF8D3: standard_ExecutorRun (execMain.c:350)
==154307== by 0x5271BA: PortalRunSelect (pquery.c:921)
==154307== by 0x5285CF: PortalRun (pquery.c:765)
==154307== by 0x524146: exec_simple_query (postgres.c:1238)
==154307== by 0x525DE9: PostgresMain (postgres.c:4347)
==154307== by 0x234E5E: main (main.c:206)
==154307== Uninitialised value was created by a heap allocation
==154307== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==154307== by 0x65A78A: AllocSetContextCreateInternal (aset.c:468)
==154307== by 0x640BE6: init_MultiFuncCall (funcapi.c:85)
==154307== by 0x123D4C34: pcpatch_unnest (pc_access.c:568)
==154307== by 0x3C84E7: ExecMakeFunctionResultSet (execSRF.c:614)
==154307== by 0x3EA717: ExecProjectSRF (nodeProjectSet.c:175)
==154307== by 0x3EA804: ExecProjectSet (nodeProjectSet.c:105)
==154307== by 0x3E42D8: UnknownInlinedFun (executor.h:248)
==154307== by 0x3E42D8: ExecLimit (nodeLimit.c:96)
==154307== by 0x3BF8D3: ExecProcNode (executor.h:248)
==154307== by 0x3BF8D3: ExecutePlan (execMain.c:1632)
==154307== by 0x3BF8D3: standard_ExecutorRun (execMain.c:350)
==154307== by 0x5271BA: PortalRunSelect (pquery.c:921)
==154307== by 0x5285CF: PortalRun (pquery.c:765)
==154307== by 0x524146: exec_simple_query (postgres.c:1238)
Then, when I run that specific request a lot, this leads to a crash sometimes.
Actually, it's due to the fact that we check if fcinfo->flinfo->fn_extra
is NULL
on the first call (after SRF_IS_FIRSTCALL()
) for intializing the internal cache of schemas. However, it's not true due to SRF_FIRSTCALL_INIT()
so it leads to some unstable behavior because the cache is not always initialised (thus the Conditional jump or move depends on uninitialised value(s)
message returned by Valgrind).
I'm currently working on a PR for fixing that.
Glad to see progress here, thanks!
Hello @df7cb,
I merged some bugfixes and made a new 1.2.3 release. Let me know if you still have issues.
Thanks Paul!
The architectures are fine now, modulo some test differences on big-endian machines that I muted in the Debian package, but that isn't a problem.
The architectures are fine now
Good news :tada:
modulo some test differences on big-endian machines
Yes I didn't forget your previous ticket https://github.com/pgpointcloud/pointcloud/issues/278. I'm going to take a look asap.
Thanks a lot for your work @df7cb :+1:
Seen on Debian on ppc64el with pointcloud 1.2.2:
Build log: https://pgdgbuild.dus.dg-i.net/job/pgpointcloud-binaries/27/architecture=ppc64el,distribution=sid/console Build status: https://pgdgbuild.dus.dg-i.net/job/pgpointcloud-binaries/