org-arl / unet-contrib

Unet user contributions
BSD 3-Clause "New" or "Revised" License

Python API can hang on send #58

Closed: impala454 closed this issue 3 years ago

impala454 commented 3 years ago

Hi, this is Chuck (from Houston Mechatronics).

I'm using the Python API (unetpy) and having an issue where the send() function can hang on an exception that I can't catch. My use case is a service loop that sends a message; I'd like the loop to monitor the state of the connection to the modem and reconnect if necessary. The setup is Ubuntu 20.04 with unet-3.2.0. Steps to recreate:

Simple Python test script with a service loop:

import unetpy
import time
unet_sock = None
reconnect = True

while True:
    print('start of loop')
    if not unet_sock:
        reconnect = True
    elif unet_sock.isClosed():
        reconnect = True
    if reconnect:
        try:
            unet_sock = unetpy.UnetSocket('127.0.0.1', 1101)
        except Exception as ex:
            print('failed to connect to modem')
            # Don't destroy the CPU.
            time.sleep(0.1)
            continue
        reconnect = False
        print('connected to modem')
    msg = unetpy.DatagramReq()
    msg.data = b'hello'
    msg.to = 31
    msg.reliability = False
    print('this send will hang')
    if unet_sock.send(data=msg):
        print('message sent')
    else:
        print('failed to send')
    time.sleep(1)

Then run the example 2 node network (from /opt/unet-3.2.0):

bin/unet samples/2-node-network.groovy

If I start the sim, then start my script, it connects and begins sending, as expected. If I then stop the sim, the unet_sock.send() call hangs along with an exception:

Exception: [Errno 107] Transport endpoint is not connected

If, while it is hung there, I start the sim back up, it continues as normal, which is great and means something (fjagepy?) is handling it, but I would like some way for my service loop to continue. If that exception were bubbled up, I could catch it and handle it, or if send() would at least go ahead and return False, the loop could continue. I fully accept that I may be holding it wrong and am open to suggestions on better ways to achieve this if they exist.

NOTE: I have not tested this on the physical modem hardware (yet).
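One possible stopgap for a send() that blocks is to run the call in a daemon thread and give up after a timeout, so the service loop keeps running. This is only a sketch: call_with_timeout is a hypothetical helper, not part of unetpy or fjagepy, and it assumes it is acceptable to abandon the blocked call and discard the socket afterwards. Whether a UnetSocket can safely be used from another thread is also an assumption that has not been verified here.

```python
import threading


def call_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run fn(*args, **kwargs) in a daemon thread (hypothetical helper).

    Returns (True, result) if fn finishes within timeout_s seconds and
    (False, None) if it is still running. Exceptions raised by fn are
    re-raised in the caller.
    """
    outcome = {}

    def worker():
        try:
            outcome['result'] = fn(*args, **kwargs)
        except Exception as ex:
            outcome['error'] = ex

    # A daemon thread does not keep the process alive if fn never returns.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        # fn is still blocked; the thread is abandoned, not killed.
        return False, None
    if 'error' in outcome:
        raise outcome['error']
    return True, outcome['result']
```

In the loop above this might be used as finished, sent = call_with_timeout(unet_sock.send, 2.0, msg), treating finished == False as a lost connection and setting unet_sock = None so the next iteration reconnects. The abandoned thread stays blocked, so if the sim comes back the stalled send may still complete later.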

prasadtiru commented 3 years ago

Thanks, Chuck, for bringing this up. The code you have written makes sense for what you are trying to do.

fjagepy does try to reconnect to the master container when it goes offline, and this is intended behavior. However, the reconnection does not need to happen inside the send() method. We have therefore fixed this in the following PR, so that send() returns False when the master container is unavailable. This change will be part of the next release of fjagepy.
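With send() returning False when the master container is unavailable, one iteration of the service loop could be sketched as below. send_or_reconnect and the connect argument are hypothetical names used for illustration. Dropping the socket on every failed send is a design assumption, since send() may also return False for reasons other than a lost connection.

```python
def send_or_reconnect(sock, connect, msg):
    """Attempt one send (hypothetical helper); return (sock, sent).

    sock    -- current socket, or None if not connected
    connect -- zero-argument callable returning a new socket (assumption)
    msg     -- payload or datagram request passed to sock.send()
    """
    if sock is None or sock.isClosed():
        try:
            sock = connect()
        except Exception:
            # Could not connect; the caller should sleep and retry.
            return None, False
    if sock.send(msg):
        return sock, True
    # send() reported failure: close and drop the socket so that the
    # next call opens a fresh connection.
    try:
        sock.close()
    except Exception:
        pass
    return None, False
```

In the original script, connect would be something like lambda: unetpy.UnetSocket('127.0.0.1', 1101), and the loop would call unet_sock, sent = send_or_reconnect(unet_sock, connect, msg) followed by a short sleep.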

impala454 commented 3 years ago

Awesome, thanks!