pgsql-io / multicorn2

http://multicorn2.org
Other
73 stars 16 forks source link

SEGFAULT with delete on PG14 #15

Closed ShaheedHaque closed 2 months ago

ShaheedHaque commented 2 years ago

I have a Multicorn FDW table and some test code which does a delete() on a row in it which worked on PG12 and PG13, but is SEGFAULTing using stock PG14 on Ubuntu 22. The FDW table has columns like this:

    columns = {
        'id': 'char(36) NOT NULL',
        'queue': 'varchar(40) NOT NULL',
        'args': 'JSONB NOT NULL',
        'kwargs': 'JSONB NOT NULL',
        'type': 'varchar(80) NOT NULL',
        'state': 'varchar(20) NOT NULL',
        'timestamp': 'timestamp NOT NULL',
        'worker': 'varchar(20)',
        'details': 'JSONB',
    }

and a typical row looks like this:

import json
from dateutil.parser import isoparse

row = {
    'id': 'e7414f38-43b3-49e6-bf8b-ea94e76e853f',
    'queue': string_queue_name,
    'args': json.dumps(...),
    'kwargs': json.dumps(...),
    'type': 'paiyroll.tasks.test_timer',
    'state': 'queued',
    'timestamp': isoparse('2018-09-07T21:51:32.385891+00:00'),
    'worker': None,
    'details': None,
}

The delete looks like this DELETE FROM "jobs" WHERE "jobs"."id" IN ('5aa940d6-f16a-49f1-9bed-26420798adff') and the SEGFAULT log - augmented with some prints - looks like this:

2022-06-15 11:25:14.818 BST [980965] dbcoreuser@foo WARNING:  ==== called 100  value=0
2022-06-15 11:25:14.818 BST [980965] dbcoreuser@foo WARNING:  ==== called 100  cinfo=0x55fc7470ecb8
2022-06-15 11:25:14.818 BST [980965] dbcoreuser@foo WARNING:  ==== called 100  cinfo->atttypoid=1042
2022-06-15 11:25:14.818 BST [980965] dbcoreuser@foo WARNING:  datumUnknownToPython datum 0
2022-06-15 11:25:14.818 BST [980965] dbcoreuser@foo WARNING:  datumUnknownToPython cinfo 0x55fc7470ecb8
2022-06-15 11:25:14.818 BST [980965] dbcoreuser@foo WARNING:  datumUnknownToPython type 1042
2022-06-15 11:25:14.818 BST [980965] dbcoreuser@foo WARNING:  outfuncoid 1045
2022-06-15 11:25:14.818 BST [980965] dbcoreuser@foo WARNING:  isvarlena 1
2022-06-15 11:25:14.818 BST [980965] dbcoreuser@foo WARNING:  fmout 0x55fc7470f428
2022-06-15 11:25:14.872 BST [980327] LOG:  server process (PID 980965) was terminated by signal 11: Segmentation fault
2022-06-15 11:25:14.872 BST [980327] DETAIL:  Failed process was running: DELETE FROM "jobs" WHERE "jobs"."id" IN ('64dc86c2-16e6-4a53-82f1-89fd3ed5d7de')

The stack trace looks like this:

== multicorn.c ==
multicornExecForeignDelete(EState *estate, ResultRelInfo *resultRelInfo,...
     ConversionInfo *cinfo = modstate->rowidCinfo;
     Datum        value = ExecGetJunkAttribute(planSlot, modstate->rowidAttno, &is_null);

     elog(WARNING, "==== called 100  value=%lu", value);
     elog(WARNING, "==== called 100  cinfo=%p", cinfo);
     elog(WARNING, "==== called 100  cinfo->atttypoid=%u", cinfo->atttypoid);
     p_row_id = datumToPython(value, cinfo->atttypoid, cinfo); <<<<<<<<<<<<<<< SEGFAULT HERE
==  python.c ===============
datumToPython()
==  python.c =======
PyObject *
datumUnknownToPython(Datum datum, ConversionInfo * cinfo, Oid type)
{
       char       *temp;
       ssize_t         size;
       PyObject   *result;
       Oid                     outfuncoid;
       bool            isvarlena;
       FmgrInfo   *fmout = palloc0(sizeof(FmgrInfo));

       elog(WARNING, "datumUnknownToPython datum %lu", datum);
       elog(WARNING, "datumUnknownToPython cinfo %p", cinfo);
       elog(WARNING, "datumUnknownToPython type %u", type);
       getTypeOutputInfo(type, &outfuncoid, &isvarlena);
       elog(WARNING, "outfuncoid %u", outfuncoid);
       elog(WARNING, "isvarlena %u", isvarlena);
       fmgr_info(outfuncoid, fmout);
       elog(WARNING, "fmout %p", fmout);
       temp = OutputFunctionCall(fmout, datum); <<<<<<<<<<<<<<<<<<<<<<<<< SEGFAULT HERE
       elog(WARNING, "temp %s", temp);
=========

The "value = 0" seems suspicious because none of the other calls to this function have a zero there.

luss commented 2 years ago

Thanks for the info to reproduce Shaeed. Seems this likely has something to do with the code I have changed to make Multicorn2 work with pg14. We'll figure it out, and we'll know now to carefully test that deleting and pushing down predicates works when we port to support pg15.

--Denis

On Wed, Jun 15, 2022 at 8:50 AM ShaheedHaque @.***> wrote:

I have a Multicorn FDW table and some test code which does a delete() on a row in it which worked on PG12 and PG13, but is SEGFAULTing using stock PG14 on Ubuntu 22. The FDW table has columns like this:

columns = {
    'id': 'char(36) NOT NULL',
    'queue': 'varchar(40) NOT NULL',
    'args': 'JSONB NOT NULL',
    'kwargs': 'JSONB NOT NULL',
    'type': 'varchar(80) NOT NULL',
    'state': 'varchar(20) NOT NULL',
    'timestamp': 'timestamp NOT NULL',
    'worker': 'varchar(20)',
    'details': 'JSONB',
}

and a typical row looks like this:

import json from dateutil.parser import isoparse

row = { 'id': 'e7414f38-43b3-49e6-bf8b-ea94e76e853f', 'queue': string_queue_name, 'args': json.dumps(...), 'kwargs': json.dumps(...), 'type': 'paiyroll.tasks.test_timer', 'state': 'queued', 'timestamp': isoparse('2018-09-07T21:51:32.385891+00:00'), 'worker': None, 'details': None, }

The delete looks like this DELETE FROM "jobs" WHERE "jobs"."id" IN ('5aa940d6-f16a-49f1-9bed-26420798adff') and the SEGFAULT log - augmented with some prints - looks like this:

2022-06-15 11:25:14.818 BST [980965] @. WARNING: ==== called 100 value=0 2022-06-15 11:25:14.818 BST [980965] @. WARNING: ==== called 100 cinfo=0x55fc7470ecb8 2022-06-15 11:25:14.818 BST [980965] @. WARNING: ==== called 100 cinfo->atttypoid=1042 2022-06-15 11:25:14.818 BST [980965] @. WARNING: datumUnknownToPython datum 0 2022-06-15 11:25:14.818 BST [980965] @. WARNING: datumUnknownToPython cinfo 0x55fc7470ecb8 2022-06-15 11:25:14.818 BST [980965] @. WARNING: datumUnknownToPython type 1042 2022-06-15 11:25:14.818 BST [980965] @. WARNING: outfuncoid 1045 2022-06-15 11:25:14.818 BST [980965] @. WARNING: isvarlena 1 2022-06-15 11:25:14.818 BST [980965] @.*** WARNING: fmout 0x55fc7470f428 2022-06-15 11:25:14.872 BST [980327] LOG: server process (PID 980965) was terminated by signal 11: Segmentation fault 2022-06-15 11:25:14.872 BST [980327] DETAIL: Failed process was running: DELETE FROM "jobs" WHERE "jobs"."id" IN ('64dc86c2-16e6-4a53-82f1-89fd3ed5d7de')

The stack trace looks like this:

== multicorn.c == multicornExecForeignDelete(EState estate, ResultRelInfo resultRelInfo,... ConversionInfo *cinfo = modstate->rowidCinfo; Datum value = ExecGetJunkAttribute(planSlot, modstate->rowidAttno, &is_null);

 elog(WARNING, "==== called 100  value=%lu", value);
 elog(WARNING, "==== called 100  cinfo=%p", cinfo);
 elog(WARNING, "==== called 100  cinfo->atttypoid=%u", cinfo->atttypoid);
 p_row_id = datumToPython(value, cinfo->atttypoid, cinfo); <<<<<<<<<<<<<<< SEGFAULT HERE

== python.c =============== datumToPython() == python.c ======= PyObject datumUnknownToPython(Datum datum, ConversionInfo cinfo, Oid type) { char temp; ssize_t size; PyObject result; Oid outfuncoid; bool isvarlena; FmgrInfo *fmout = palloc0(sizeof(FmgrInfo));

   elog(WARNING, "datumUnknownToPython datum %lu", datum);
   elog(WARNING, "datumUnknownToPython cinfo %p", cinfo);
   elog(WARNING, "datumUnknownToPython type %u", type);
   getTypeOutputInfo(type, &outfuncoid, &isvarlena);
   elog(WARNING, "outfuncoid %u", outfuncoid);
   elog(WARNING, "isvarlena %u", isvarlena);
   fmgr_info(outfuncoid, fmout);
   elog(WARNING, "fmout %p", fmout);
   temp = OutputFunctionCall(fmout, datum); <<<<<<<<<<<<<<<<<<<<<<<<< SEGFAULT HERE
   elog(WARNING, "temp %s", temp);

=========

The "value = 0" seems suspicious because none of the other calls to this function have a zero there.

— Reply to this email directly, view it on GitHub https://github.com/pgsql-io/multicorn2/issues/15, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMWOHW57ZHPMBMMBJD4OT3VPHGSJANCNFSM5Y3GBGGQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>

luss commented 2 years ago

I added a blurb to the projects main README saying pg14 support is still experimental.

ShaheedHaque commented 1 year ago

@luss Any thoughts on a timeline for this?

ShaheedHaque commented 1 year ago

I note that if #16 is applied, the proximate cause of the crash, i.e. the SEGFAULT in datumUnknownToPython() is avoided. AFAIK however, the actual cause of the SEGFAULT is still present.

(Aside: the "delete" code in my FDW is being invoked, but somehow some data somewhere is being corrupted. I will see if I can get any more clues).