Crashes Related to Connections

tsawyer commented 7 years ago

My nodes crash far too often. A couple of weeks of uptime is about the best I generally get, and a random (non-scientific) survey of the stats page shows similar uptimes for many other nodes. I've long suspected that poor IAX connections have something to do with it, but that was hard to prove until I found out you can simulate poor network connectivity with the *CLI> iax2 test losspct command.

I set up a test bench with three nodes, one acting as a hub for the other two. After linking them with permanent connections, I introduced packet loss using the iax2 CLI command above. This would generally crash one of the nodes fairly quickly, which suggests there is an issue with reconnects.
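The packet-loss part of the test can be scripted so it is repeatable between runs. Below is a minimal Python sketch, not the actual test bench used here: it assumes it runs on the node under test, that `asterisk -rx` can reach the running Asterisk instance, and that `iax2 test losspct` takes an integer percentage. The loss levels and dwell time are hypothetical values you would pick yourself, and the script always sets loss back to 0 when it finishes or is interrupted.

```python
import subprocess
import time


def losspct_command(percent):
    """Build the chan_iax2 test command that simulates `percent`% packet loss.

    Assumption: the CLI accepts a whole-number percentage from 0 to 100.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return f"iax2 test losspct {int(percent)}"


def run_cli(command):
    """Send a single command to the running Asterisk via its remote console."""
    subprocess.run(["asterisk", "-rx", command], check=True)


def ramp_loss(levels, dwell_seconds, runner=run_cli, sleep=time.sleep):
    """Step through loss levels, holding each one, then restore 0% loss."""
    try:
        for percent in levels:
            runner(losspct_command(percent))
            sleep(dwell_seconds)  # keep this loss level while links are up
    finally:
        # Always return the link to normal, even if interrupted with Ctrl-C.
        runner(losspct_command(0))
```

For example, `ramp_loss([5, 10, 20, 30], dwell_seconds=300)` would hold each level for five minutes while the permanent links stay up. The `runner` and `sleep` parameters only exist so the sequence of commands can be checked without a live node.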

To test that, I set up a couple of scripts that repeatedly perform regular (not permanent) connects and disconnects. This too crashes a node in rather short order, typically within 30 minutes or so, and it does so even without any introduced packet loss.
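The original connect/disconnect scripts were not posted, so here is a minimal Python sketch of what such a loop might look like. It assumes the app_rpt `rpt fun` CLI command is available and that the node uses the common default function table in rpt.conf, where `*3<node>` connects in transceive mode and `*1<node>` disconnects; those codes are configurable, so check your [functions] stanza. The node numbers and hold time are hypothetical.

```python
import subprocess
import time


def build_commands(local_node, remote_node):
    """Return the (connect, disconnect) CLI strings for one link.

    Assumes the default rpt.conf [functions] mapping:
    *3<node> = connect (transceive), *1<node> = disconnect.
    """
    connect = f"rpt fun {local_node} *3{remote_node}"
    disconnect = f"rpt fun {local_node} *1{remote_node}"
    return connect, disconnect


def run_cli(command):
    """Send a single command to the running Asterisk via its remote console."""
    subprocess.run(["asterisk", "-rx", command], check=True)


def cycle(local_node, remote_node, iterations, hold_seconds,
          runner=run_cli, sleep=time.sleep):
    """Repeatedly connect and disconnect a regular (non-permanent) link."""
    connect, disconnect = build_commands(local_node, remote_node)
    for _ in range(iterations):
        runner(connect)
        sleep(hold_seconds)  # let the link come up and carry traffic
        runner(disconnect)
        sleep(hold_seconds)  # let the link tear down before reconnecting
```

For example, `cycle("1999", "1998", iterations=1000, hold_seconds=10)` would run 1000 connect/disconnect cycles between two hypothetical nodes. Running one copy on each leaf node of the three-node bench exercises the hub with overlapping reconnects.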

I am fairly convinced that something, perhaps a connection routine that does not clean up properly, is making app_rpt much less stable than Asterisk itself. I hope this helps you smart programmers find and solve this uptime issue.

KG7QIN commented 7 years ago

The IAX2 module used in the forked version of Asterisk has been modified slightly from the original. There was an attack against the 1.4 branch's IAX2 channel driver that could cause resource exhaustion, and the fix was to add call tokens (if memory serves me correctly).

While working on my port of this to 1.8 (now in alpha), I had problems getting the 1.8 node to negotiate connections with 1.4 nodes. I was talking with Jim Dixon about it at the time, and he suggested removing the call token feature from the IAX2 channel driver for backwards compatibility with the Allstar version.

The IAX2 driver in Allstar has had some patches applied to it, but what it really needs is to be brought back up to spec with the Asterisk security advisories/patches for Asterisk 1.4. This will probably fix it.

KG7QIN commented 7 years ago

For the heck of it, I pulled down the original tarballs for 1.4.32 (the original, .1 and .2) along with the last 1.4 release, 1.4.44, and diffed Allstar's chan_iax2.c against the one contained in each. The Allstar IAX2 driver has the call token patches from later in the 1.4.x line.

One thing that is definitely different between Allstar's chan_iax2.c and the one from 1.4.44 is the locking code, which appears to have been overhauled in the 1.4.44 release.

The DAHDI timing calls also differ: Allstar's iax2 driver opens the /dev/zap/timer file, while 1.4.44's iax2 driver makes an ioctl call instead.

I have attached the unified diff I did in case anyone is interested in the changes between the versions.

iax2.1.44.diff.txt

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

tsawyer commented 4 years ago

That will fix it. Ignore the problem long enough and it will go away... stale bot magic!

KG7QIN commented 4 years ago

Did you not read what the status was changed to? Instead of leaving snide remarks in the issues section I suggest you take a look at what the labels mean first.

This was changed from being auto-tagged as wontfix to pinned. And since you've proven unable to check the labels section to see what the pinned label means:

Pinned - Keeps stale issues from being auto closed

Please refrain from submitting further comments of this nature in the issues section here as they do nothing to further development.

I'm hiding your comment as it does not provide a meaningful contribution to this issue.

Thank you.

tsawyer commented 4 years ago

I think it was pinned after my comments. Thanks for keeping it active.

On Sun, May 10, 2020 at 2:58 AM Stacy Olivas notifications@github.com wrote:



-- Tim WD6AWP

pttlink / Asterisk

Crashes Related to Connections #17