transientskp / aartfaac-control

AARTFAAC control scripts

Pelican server startup issues with cmdclient #21

Closed hsuyeep closed 9 years ago

hsuyeep commented 9 years ago

Pelican server observed to fail on startup after being given a STOP command earlier. Could be related to a stale socket from the previous run. Example output (from the nohup file of a server run):

[INFO] [aartfaac-server-21609] Stream 'Stream0' 22:17:30-22:17:31 0.45 Gb/s
[INFO] [aartfaac-server-21609] Stream 'Stream0' 22:17:31-22:17:32 0.42 Gb/s
[INFO] [aartfaac-server-21609] Stream 'Stream0' 22:17:33-22:17:34 0.44 Gb/s
Unhandled exception in thread started by <bound method pelicanServerCmdClient.threadhdlr of <__main__.pelicanServerCmdClient object at 0x1a80390>>
--> [2015-07-06 22:19:40] Received connection from: ('10.149.96.3', 39067).
--> [2015-07-06 22:19:40] Received: 0 STOP, len: 6. cmd: STOP
<-- [2015-07-06 22:19:40] Terminating pid 21608.
--> [2015-07-06 22:19:40] Received: 0 START --buffer-max-size 34359738368 --stream 63 57617187.5 3051.757812 0-62, len: 77. cmd: START
<-- [2015-07-06 22:19:40] Running cmdstr: ['start_server.py', '--buffer-max-size', '34359738368', '--stream', '63', '57617187.5', '3051.757812', '0-62'].
<-- [2015-07-06 22:19:40] Successfully started pid 22059 for cmd execution, status: OK.
--> [2015-07-06 22:19:40] Received: , len: 0.

Invalid command protocol! Try again.

--> [2015-07-06 22:19:40] Received: , len: 0.

Invalid command protocol! Try again.

--> [2015-07-06 22:19:40] Received: , len: 0.

Invalid command protocol! Try again.

--> [2015-07-06 22:19:40] Received: , len: 0.

Invalid command protocol! Try again.

--> [2015-07-06 22:19:40] Received: , len: 0.

Invalid command protocol! Try again.

Traceback (most recent call last):
  File "/home/prasad/aartfaac-control/clients/cmdclient.py", line 99, in threadhdlr
    self._servsock.send(self._status);
socket.error: [Errno 32] Broken pipe
aartfaac (Release) ve5c483b (May 29 2015)
[INFO] [aartfaac-server-22060] ----- Stream (4100) ------
[INFO] [aartfaac-server-22060] Subbands (1):
[INFO] aartfaac-server-22060 (63) chunksize 41948960 bytes
[INFO] [aartfaac-server-22060] Channels in stream(63):
[INFO] [aartfaac-server-22060] Frequency ref: 57617187.500000
[INFO] [aartfaac-server-22060] Channel width: 3051.757812
[WARNING] [aartfaac-server-22060] [virtual QIODevice* StreamChunker::newDevice()] Created new connection 0.0.0.0:4100
PelicanServer caught an error: Cannot run PelicanServer on port 2000
Unhandled exception in thread started by <bound method pelicanServerCmdClient.threadhdlr of <__main__.pelicanServerCmdClient object at 0x1a80390>>
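One way to test the stale-socket hypothesis would be to check whether the port is still bound before launching a new server. The sketch below is only an illustration in Python 3: `port_is_free` is a hypothetical helper, not part of cmdclient.py, and port 2000 is taken from the error message above.

```python
import socket

def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port).

    Hypothetical helper for illustration; not taken from cmdclient.py.
    """
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
    except OSError:
        # Something (possibly the previous server run) still holds the port.
        return False
    finally:
        probe.close()
    return True

if __name__ == "__main__":
    # 2000 is the port named in "Cannot run PelicanServer on port 2000".
    print("port 2000 free:", port_is_free(2000))
```

If this reports the port as busy right after a STOP, the previous server process has most likely not exited yet, rather than the socket itself being stale.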

hsuyeep commented 9 years ago

Noted that the broken pipe error was caused by service.py close()ing the socket after receiving an 'OK' from cmdclient, while cmdclient was still sending further statuses for the remaining commands in the cmdlist. Modified the code to send a single status after the full block of commands has been executed. Currently untested.
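A minimal sketch of the "single status per block" idea, in Python 3 and not the actual change made to cmdclient.py. The names `run_command_block` and `execute`, and the "NOK" failure string, are assumptions made for illustration; only "OK" appears in the log above.

```python
import socket

def run_command_block(conn, commands, execute):
    """Run every command in the block, then send one combined status.

    `execute` is a hypothetical callable returning True on success.
    """
    results = [execute(cmd) for cmd in commands]
    status = "OK" if all(results) else "NOK"  # "NOK" is an assumed failure reply
    try:
        conn.sendall(status.encode())
    except (BrokenPipeError, ConnectionResetError):
        # The peer closed the connection early; there is nobody left to report to.
        return None
    return status

def demo():
    # socketpair() stands in for the service.py <-> cmdclient TCP connection.
    client_end, service_end = socket.socketpair()
    with client_end, service_end:
        run_command_block(client_end, ["0 STOP", "0 START"], lambda cmd: True)
        return service_end.recv(16).decode()
```

Because only one send happens per block, the peer closing the socket after the first 'OK' no longer races with further status writes.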

hsuyeep commented 9 years ago

A new thread is created every time a TCP connection is made to cmdclient. However, service.py close()es the connection after issuing a START. When it connects again to issue a STOP, another thread is created to handle the new socket, and this thread kills the subprocesses started by the previous thread, causing the unhandled thread error. A workaround might be to have a state machine in a single thread that maintains state across socket connections. This depends on whether the new connection can be used by the old thread (which is already doing a receive on an older socket).
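The state-machine idea could look roughly like the sketch below: one object outlives the individual connections, so a STOP arriving on a new socket still knows what an earlier START launched. This is an assumption-laden illustration in Python 3. `PipelineState`, its `start`/`stop` callables and the "NOK ..." replies are hypothetical and not taken from cmdclient.py.

```python
import threading

class PipelineState:
    """State shared across connections (hypothetical, for illustration)."""

    def __init__(self, start, stop):
        self._lock = threading.Lock()  # serialises commands from any thread
        self._start = start            # callable(args) -> process handle
        self._stop = stop              # callable(handle) -> None
        self._handle = None
        self.state = "IDLE"

    def handle(self, cmd, args=()):
        with self._lock:
            if cmd == "START":
                if self.state == "RUNNING":
                    return "NOK already running"
                self._handle = self._start(args)
                self.state = "RUNNING"
                return "OK"
            if cmd == "STOP":
                if self.state == "RUNNING":
                    self._stop(self._handle)
                    self._handle = None
                    self.state = "IDLE"
                return "OK"
            return "NOK unknown command"
```

Each connection handler would then call `handle()` on the same instance instead of owning its subprocesses itself.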

hsuyeep commented 9 years ago

Since a new socket object is created with every connection, the old thread would not be able to access the new socket connection. The new thread can then carry out a STOP by killing the subprocesses, and should also signal the old thread to exit so that it can be join()ed.
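A sketch of how the new thread might ask the old one to finish and then join() it, using `threading.Event` from the standard library. The `Worker` class and the 10 ms poll interval are assumptions for illustration; the actual fix in commit 2f0d9c08 may differ.

```python
import threading

class Worker:
    """Stands in for the thread created by the earlier START connection."""

    def __init__(self):
        self._stop_requested = threading.Event()
        self._thread = threading.Thread(target=self._run)

    def start(self):
        self._thread.start()

    def _run(self):
        # Placeholder for watching the subprocess; wakes up every 10 ms
        # to check whether another thread has asked it to stop.
        while not self._stop_requested.wait(timeout=0.01):
            pass

    def stop(self, timeout=1.0):
        """Called from the thread handling STOP. Returns True if joined."""
        self._stop_requested.set()
        self._thread.join(timeout)
        return not self._thread.is_alive()
```

The thread handling STOP would kill the subprocesses first and then call `stop()`, so the old thread exits cleanly instead of failing on a dead subprocess or socket.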

hsuyeep commented 9 years ago

Fixed by commit 2f0d9c08