postgrespro / pgsphere

PgSphere provides spherical data types, functions, operators, and indexing for PostgreSQL.
https://pgsphere.org
BSD 3-Clause "New" or "Revised" License

Bus error on sparc64 in smoc code #109

Closed: df7cb closed 7 months ago

df7cb commented 7 months ago

I'm only filing this for reference since I was curious and poked around a bit with it. I don't expect any fixes, just writing it down in case I get curious again in the future. :)

On Debian's unofficial sparc64 architecture, pgsphere is failing the moc regression tests:

2023-11-16 14:17:18.560 UTC [658036] LOG:  server process (PID 660738) was terminated by signal 10: Bus error
2023-11-16 14:17:18.560 UTC [658036] DETAIL:  Failed process was running: select '1/1'::smoc;

(gdb) bt
#0  order_break (outputs=std::vector of length 2, capacity 2 = {...}, x=..., max_order=1) at src/process_moc.cpp:697
#1  0xfff8000113b33f98 in ascii_out (m_s="", s=0x7fefff73c98 "", moc=0x10000ac1128, begin=72, end=88, entry_size=16)
    at src/process_moc.cpp:749
#2  0xfff8000113b344d0 in create_moc_out_context (moc=0x10000ac1128, end=88,
    error_out=0xfff8000113b0ec14 <moc_error_out>) at src/process_moc.cpp:791

On sparc64, SIGBUS here points to an unaligned memory access:

(gdb) p x
$1 = (const moc_interval &) @0x10000ac1174: {first = 72057594037927936, second = 144115188075855872}

(gdb) l
692     order_break(output_map & outputs, const moc_interval & x, int max_order)
693     {
694             int order;
695             hpint64 mask = 0;
696             mask = ~mask ^ 3;
697             hpint64 first   = x.first >> 2 * (29 - max_order);
698             hpint64 second = x.second >> 2 * (29 - max_order);
699             for (order = max_order; order > 0; --order, first >>= 2, second >>= 2)
700             {
701                     if (second == first)

(gdb) f 1
#1  0xfff8000113b33f98 in ascii_out (m_s="", s=0x7fefff73c98 "", moc=0x10000ac1128, begin=72, end=88, entry_size=16)
    at src/process_moc.cpp:749
749                     order_break(outputs, *interval_ptr(moc, j), order);
(gdb) l
744             {
745                     // page bumps
746                     int32 mod = (j + entry_size) % PG_TOAST_PAGE_FRAGMENT;
747                     if (mod > 0 && mod < entry_size)
748                             j += entry_size - mod;
749                     order_break(outputs, *interval_ptr(moc, j), order);
750             }
751             for (int k = 0; k <= order; ++k)
752             {
753                     const moc_map & output = outputs[k];

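As seen above, the address of x (0x10000ac1174) is only 4-byte aligned, not 8-byte aligned, which the 64-bit loads of x.first and x.second require on sparc64.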
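
The alignment claim can be checked by hand from the address gdb printed. A minimal sketch, assuming the fields of moc_interval are 64-bit integers that want 8-byte alignment; is_aligned and x_addr are names made up for this illustration, not part of pgsphere:

```cpp
#include <cstdint>

// Hypothetical helper: true if addr is a multiple of alignment.
constexpr bool is_aligned(std::uint64_t addr, std::uint64_t alignment)
{
    return addr % alignment == 0;
}

// Address of x as printed by gdb in frame #0.
constexpr std::uint64_t x_addr = 0x10000ac1174ULL;

// Checked at compile time, so a successful build confirms the arithmetic.
static_assert(is_aligned(x_addr, 4), "x is 4-byte aligned");
static_assert(!is_aligned(x_addr, 8), "x is not 8-byte aligned");
static_assert(x_addr % 8 == 4, "x sits 4 bytes past an 8-byte boundary");
```

On x86-64 such a load is merely tolerated by the hardware; sparc64 traps on it, which is why the test only fails there.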

The reason is somewhere in *interval_ptr(moc, j) and how the offsets are computed.

static
moc_interval* interval_ptr(Smoc* moc, int32 offset)
{
    return data_as<moc_interval>(detoasted_offset(moc, offset));
}

static
char* detoasted_offset(Smoc* moc, size_t offset = 0)
{
    return offset + reinterpret_cast<char*>(moc) + offsetof(Smoc, version);
}

/*
 * this particular layout should prevent the compiler from introducing unwanted
 * padding
 */
typedef struct
{
    char        vl_len_[4]; /* size of PostgreSQL variable-length data */
    uint16      version;    /* version of the 'toasty' MOC data structure */
    uint8       order;      /* actual MOC order */
    uint8       depth;      /* depth of B+-tree */
    hpint64     first;      /* first Healpix index in set */
    hpint64     last;       /* 1 + (last Healpix index in set) */
    hpint64     area;       /* number of covered Healpix cells */
    int32       tree_begin; /* start of B+ tree, past the options block */
    int32       data_begin; /* start of Healpix intervals, bypassing the tree */
    int32       data[1];    /* no need to optimise for empty MOCs */
} Smoc;
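
Plugging the gdb values into this layout reproduces the faulting address. A minimal sketch, assuming hpint64 is a 64-bit integer and that the other aliases match PostgreSQL's fixed-width types; SmocMirror and the constant names are made up for this illustration:

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: hpint64 is a 64-bit signed integer; the other aliases
// stand in for PostgreSQL's fixed-width types.
typedef std::int64_t  hpint64;
typedef std::int32_t  int32;
typedef std::uint16_t uint16;
typedef std::uint8_t  uint8;

// Copy of the Smoc layout quoted above, renamed to mark it as a mirror.
struct SmocMirror
{
    char    vl_len_[4];
    uint16  version;
    uint8   order;
    uint8   depth;
    hpint64 first;
    hpint64 last;
    hpint64 area;
    int32   tree_begin;
    int32   data_begin;
    int32   data[1];
};

constexpr std::size_t version_offset = offsetof(SmocMirror, version);
constexpr std::size_t data_offset    = offsetof(SmocMirror, data);

// Values from the gdb session: the moc pointer, and j == begin in frame #1.
constexpr std::uint64_t moc_addr = 0x10000ac1128ULL;
constexpr std::uint64_t j        = 72;

// Same arithmetic as detoasted_offset(): base + offsetof(version) + offset.
constexpr std::uint64_t interval_addr = moc_addr + version_offset + j;

static_assert(version_offset == 4, "version follows the 4-byte length word");
static_assert(data_offset == 40, "data starts after the fixed header");
static_assert(moc_addr % 8 == 0, "the moc pointer itself is 8-byte aligned");
static_assert(interval_addr == 0x10000ac1174ULL, "matches the address of x");
static_assert(interval_addr % 8 == 4, "4 bytes off an 8-byte boundary");

// For comparison only: the same j counted from data would be 8-byte aligned.
static_assert((moc_addr + data_offset + j) % 8 == 0, "aligned if anchored on data");
```

Since offsetof(Smoc, version) is 4 and the moc pointer is 8-byte aligned, any offset that is a multiple of 8 lands 4 bytes past an 8-byte boundary. The last assertion only illustrates the arithmetic; the stored offsets would mean something different if the anchor were changed.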

My suspicion is that the offsetof should be anchored on data rather than version, and that the data field should be of type hpint64 so that it is 8-byte aligned.

Since I don't want to redesign the Smoc struct, I'm stopping here.
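
Short of redesigning the struct, a common way to sidestep this kind of fault is to copy the bytes into a properly aligned local instead of dereferencing the misaligned pointer. This is only a sketch of that general technique, not something pgsphere does: interval_copy is a stand-in for moc_interval (gdb shows it as {first, second}), load_interval is a hypothetical helper, and hpint64 is assumed to be a 64-bit integer.

```cpp
#include <cstdint>
#include <cstring>

typedef std::int64_t hpint64;  // assumption: 64-bit signed integer

// Stand-in for moc_interval; a plain struct so that memcpy into it is valid.
struct interval_copy
{
    hpint64 first;
    hpint64 second;
};

// Hypothetical helper: read an interval from a possibly misaligned address.
// memcpy makes no alignment assumptions about its source, so the compiler
// must emit code that is safe on strict-alignment targets such as sparc64.
static interval_copy load_interval(const char* p)
{
    interval_copy out;
    std::memcpy(&out, p, sizeof out);
    return out;
}
```

A caller would pass the char* from detoasted_offset() to such a helper and hand the resulting local to order_break(), at the cost of one 16-byte copy per interval.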

df7cb commented 7 months ago

As said above, I don't expect any fixes - sparc64 is an old architecture only barely kept alive, so I'll close this immediately again.

esabol commented 7 months ago

I'd leave the issue open at least, but, yeah, I doubt there's much interest in fixing this issue.