robotastic / trunk-recorder

Records calls from a Trunked Radio System (P25 & SmartNet)
GNU General Public License v3.0
827 stars 191 forks source link

Seemingly random core dumps. #950

Closed penguinzephyr closed 2 months ago

penguinzephyr commented 2 months ago

As per the title, TR seems to randomly crash out. I rebuilt TR from the 5.0 branch with debug symbols turned on and captured the below. Have attached the output of bt full.

bt_full.txt

System is Ubuntu 22.04.4 LTS running as a VM with a single hackRF passed in as its source.

Happens indeterminately - this time was after 820 transmissions, earlier i got to ~90 before it crashed.

The TR error log doesnt display anything wrong - Simply restarts and throws the initialization sequence logs.

taclane commented 2 months ago

I've been trying to track this down too. Proximate cause appears to be: lib/op25_repeater/lib/p25p1_fdma.cc

                if ((sap == 61) && ((fmt == 0x17) || (fmt == 0x15))) { // Multi Block Trunking messages
                    if (blks > deinterleave_buf.size()
                        return; // insufficient blocks available

                    uint32_t crc1 = crc32(deinterleave_buf[1].data(), ((blks * 12) - 4) * 8); 

The attempt to access deinterleave_buf[1] if the zero-indexed array only has one item is throwing the assert error. The core dumps I saw were identical to this.

An edit to the effect of for that second line might prevent it in the future.

if ((blks > deinterleave_buf.size()) || (deinterleave_buf.size() == 1))
robotastic commented 2 months ago

Nice work!! I will work on getting this upstreamed into OP25 too

taclane commented 2 months ago

Reception on one of my control channels has been flakier than usual, so my going theory that it is probably just a receive issue and fragments of the MBT occasionally get lost.

But the assertion error does seem to be caused by an assumption that when a MBT header block is validated, there must always be additional data in the deinterleave_buf to decode.

If it weren't throwing this assertion error, it would probably end up ignoring the MBT anyway, as it is very unlikely that the random data in deinterleave_buf[1] would pass the crc checks.

penguinzephyr commented 2 months ago

Confirming the fixes applied in #951 appear to have fixed this issue and it appears to now be stable.Thanks!