paulscherrerinstitute / StreamDevice

EPICS Driver for message based I/O
GNU General Public License v3.0

IOC hangs on exit #41

Closed dirk-zimoch closed 3 years ago

dirk-zimoch commented 5 years ago

Mail from Mark Rivers:

It looks like the problem is these 2 threads:

Thread 5 (Thread 0x7fffedfe9700 (LWP 30806)):
#0 0x00007ffff5be38ed in connect () from /lib64/libc.so.6
#1 0x00007ffff76dddb4 in connectIt (pasynUser=0x713308, drvPvt=0x71def0) at ../../asyn/drvAsynSerial/drvAsynIPPort.c:476
#2 asynCommonConnect (drvPvt=0x71def0, pasynUser=0x713308) at ../../asyn/drvAsynSerial/drvAsynIPPort.c:520
#3 0x00007ffff76d08fb in portConnectProcessCallback (pasynUser=0x713308) at ../../asyn/asynDriver/asynManager.c:3076
#4 0x00007ffff76d39c7 in portThread (pport=0x711ed0) at ../../asyn/asynDriver/asynManager.c:820
#5 0x00007ffff67128ec in start_routine (arg=0x712f10) at ../../../src/libCom/osi/os/posix/osdThread.c:403
#6 0x00007ffff5698e25 in start_thread () from /lib64/libpthread.so.0
#7 0x00007ffff5be2bad in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7ffff7fd9740 (LWP 30798)):
#0 0x00007ffff569f51d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007ffff569ae36 in _L_lock_870 () from /lib64/libpthread.so.0
#2 0x00007ffff569ad2f in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007ffff6714b66 in mutexLock (id=0x712150) at ../../../src/libCom/osi/os/posix/osdMutex.c:46
#4 epicsMutexOsdLock (pmutex=0x712150) at ../../../src/libCom/osi/os/posix/osdMutex.c:130
#5 0x00007ffff76cf38b in lockPort (pasynUser=0x714d08) at ../../asyn/asynDriver/asynManager.c:1741
#6 0x00007ffff76dcd66 in cleanup (arg=0x71def0) at ../../asyn/drvAsynSerial/drvAsynIPPort.c:246
#7 0x00007ffff6708dd3 in epicsExitCallAtExitsPvt (pep=) at ../../../src/libCom/misc/epicsExit.c:95
#8 epicsExitCallAtExits () at ../../../src/libCom/misc/epicsExit.c:113
#9 0x00007ffff6709178 in epicsExit (status=0) at ../../../src/libCom/misc/epicsExit.c:181
#10 0x0000000000405d0d in main (argc=, argv=) at ../srMain.cpp:21

Thread 1 is the drvAsynIPPort exit handler (cleanup) trying to close the socket. It is hanging in the call to pasynManager->lockPort(), which tries to lock pport->synchronousLock. The reason is that thread 5, the asyn port thread, is trying to connect to the device and holds pport->synchronousLock while it does so. I suspect that thread 5 is trying to connect frequently because many records are talking to the device, and the device has Autoconnect=Yes. If that thread never releases the mutex for long enough, the exit handler cannot acquire it and the IOC hangs on exit.

I need to try to reproduce this and see if there is a solution.

Next time it happens, please type "epicsMutexShowAll 1" before you type "exit".

Mark
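The lock contention described in the mail can be reproduced in miniature with plain pthreads. The following is only a sketch, not asyn code: port_lock stands in for pport->synchronousLock, port_thread for the asyn port thread blocked in connect(), and try_cleanup for the drvAsynIPPort exit handler. All of these names are hypothetical. The exit path uses pthread_mutex_trylock so the demo reports the contention as EBUSY instead of hanging the way lockPort() does.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for asyn's pport->synchronousLock. */
static pthread_mutex_t port_lock = PTHREAD_MUTEX_INITIALIZER;

/* Bookkeeping so the demo is deterministic (not part of the real problem). */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t state_cond = PTHREAD_COND_INITIALIZER;
static int connecting = 0;
static int release_connect = 0;

/* Plays the port thread: holds port_lock for the whole simulated connect(). */
static void *port_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&port_lock);

    pthread_mutex_lock(&state_lock);
    connecting = 1;
    pthread_cond_broadcast(&state_cond);
    /* Simulated blocking connect(): wait until the demo lets us return. */
    while (!release_connect)
        pthread_cond_wait(&state_cond, &state_lock);
    pthread_mutex_unlock(&state_lock);

    pthread_mutex_unlock(&port_lock);
    return NULL;
}

/* Plays the exit handler. Returns 0 if the lock was obtained (the socket
 * could be closed), or EBUSY where a blocking lock would have hung. */
static int try_cleanup(void)
{
    int rc = pthread_mutex_trylock(&port_lock);
    if (rc == 0) {
        /* A real handler would close the socket here. */
        pthread_mutex_unlock(&port_lock);
    }
    return rc;
}

/* Runs the scenario. Returns the cleanup result while the "connect" is in
 * progress and stores the result after it has finished in *after. */
static int demo(int *after)
{
    pthread_t tid;
    int during;

    connecting = 0;
    release_connect = 0;
    if (pthread_create(&tid, NULL, port_thread, NULL) != 0)
        return -1;

    /* Wait until the port thread holds port_lock. */
    pthread_mutex_lock(&state_lock);
    while (!connecting)
        pthread_cond_wait(&state_cond, &state_lock);
    pthread_mutex_unlock(&state_lock);

    during = try_cleanup();

    /* Let the simulated connect() return, then try again. */
    pthread_mutex_lock(&state_lock);
    release_connect = 1;
    pthread_cond_broadcast(&state_cond);
    pthread_mutex_unlock(&state_lock);
    pthread_join(tid, NULL);

    *after = try_cleanup();
    return during;
}
```

Calling demo(&after) should give EBUSY while the simulated connect is in progress and 0 in after once the port thread has released the lock. Whether a non-blocking or timed lock would be an acceptable change in drvAsynIPPort itself is not something this thread settles.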

dirk-zimoch commented 5 years ago

I think I may need to stop I/O Intr polling when the IOC is about to exit.

MarkRivers commented 5 years ago

Abdalla said that the problem with exit only occurs when one of the devices is offline, i.e. drvAsynIPPort cannot connect. I have reproduced this. So I think the main (only?) problem is in asyn.

In debugging yesterday it seemed that what was happening was that the underlying Linux connect() call was:

dirk-zimoch commented 3 years ago

Cannot reproduce. In release 2.8.20, I stop I/O Intr polling when the IOC is exiting (and in iocStop). Maybe that helps. Please re-open if the problem returns.
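The thread does not show the actual 2.8.20 change, so the following is only a rough illustration of the general idea of "stop polling once exit has begun". It assumes a hypothetical poll_once() step and uses the standard C atexit() hook. In an IOC the hook would presumably be registered through the EPICS exit mechanism seen in the backtrace (epicsExitCallAtExits) rather than atexit().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Set once the process has started to exit. */
static atomic_bool exiting = false;

/* Counts how many polls were actually started (for illustration only). */
static int polls_started = 0;

/* Exit hook: from now on no new poll may be started. */
static void stop_polling(void)
{
    atomic_store(&exiting, true);
}

/* Registers the hook with the C runtime. Returns 0 on success. */
static int install_exit_hook(void)
{
    return atexit(stop_polling);
}

/* Hypothetical polling step. Returns true if a new poll was started,
 * false once exit has begun, so no new I/O request is queued. */
static bool poll_once(void)
{
    if (atomic_load(&exiting))
        return false;
    polls_started++;   /* a real driver would queue an I/O request here */
    return true;
}
```

With fewer requests queued during shutdown, the port thread has less reason to keep retrying the connection. This alone does not remove the lock contention inside asyn described earlier in the thread.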