tiiuae / mesh_com

ROS node for Mesh Network configuration
BSD 3-Clause "New" or "Revised" License

CBMA MDM Multiprocessing #377

Closed: pentestiing closed this pull request 9 months ago

pentestiing commented 9 months ago

Behold, fellow reader, the promised PR!

This PR includes some of the previously problematic changes discussed in #337 (now fixed), together with the multiprocessing additions to the old way of calling CBMA and the fixes made along the way. All of this is squashed into the oldest commit for brevity, but the individual changes are still accessible at https://github.com/tiiuae/mesh_com/tree/cbma-mdm-integration/. A follow-up PR will add the error-handling functionality.

This PR also includes individual commits for the MDM agent side, which are more relevant to the current state of development.

Have fun! :star2:

saauvine commented 9 months ago

Why are we using 60 s socket timeouts? I'm worried that they keep the CBMA processes alive for up to that long after we try to terminate them, because long timeouts in blocking calls limit how often the shutdown_event is checked.
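To illustrate the concern, here is a minimal Python sketch. It is not CBMA's actual code. It assumes a worker that blocks in `recv()` and checks a shutdown flag only between calls. `recv_loop` and `shutdown_latency` are hypothetical names. A `threading.Event` stands in for `shutdown_event`; a `multiprocessing.Event` exposes the same `set()`/`is_set()` methods. The point is that the socket timeout is an upper bound on how long the worker takes to notice a shutdown request.

```python
import socket
import threading
import time


def recv_loop(sock: socket.socket, shutdown_event: threading.Event, poll_timeout: float) -> None:
    """Hypothetical worker: receive until shutdown_event is set.

    The flag is only re-checked when recv() returns or times out, so
    poll_timeout bounds how long a shutdown request can go unnoticed.
    """
    sock.settimeout(poll_timeout)
    while not shutdown_event.is_set():
        try:
            data = sock.recv(4096)
        except socket.timeout:
            continue  # nothing arrived; loop back and re-check the flag
        if not data:
            break  # peer closed the connection
        # ... handle data here ...


def shutdown_latency(poll_timeout: float) -> float:
    """Measure how long recv_loop takes to exit after shutdown is requested."""
    a, b = socket.socketpair()
    event = threading.Event()
    worker = threading.Thread(target=recv_loop, args=(a, event, poll_timeout))
    worker.start()
    time.sleep(0.05)  # let the worker block inside recv()
    start = time.monotonic()
    event.set()
    worker.join()
    elapsed = time.monotonic() - start
    a.close()
    b.close()
    return elapsed
```

With `poll_timeout=0.2` the worker exits well under half a second after the flag is set. With `poll_timeout=1.0` it takes close to a second. With a 60 s timeout, the same loop could lag by up to a minute.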

I tried to test that theory and ran into problems when I stopped the MDM agent after starting it with the CBMA feature enabled: not all of the processes/threads were stopped. The log points to a problem in /multicast.py, line 34: "OSError: no interface with this name". I monitored the processes/threads with htop over an SSH connection (via usb0).

I suspect this OSError is a consequence of running the __cleanup_cbma method on the MDM agent side right after trying to terminate the CBMA processes. I therefore tried process.kill() instead of process.terminate() in the MDM agent's stop_cbma() method, and that seemed to fix this particular problem, but I think it would be better to use shorter timeouts instead.
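The terminate-then-kill escalation could be sketched as follows, with cleanup deferred until the children have actually exited. This assumes the CBMA workers are `multiprocessing.Process` objects. `stop_processes`, its `grace` parameter (default chosen arbitrarily), and the `cleanup` callback are hypothetical stand-ins for `stop_cbma()` and `__cleanup_cbma`, not the real implementation.

```python
import multiprocessing
import time
from typing import Callable, Iterable, List, Optional


def stop_processes(
    processes: Iterable[multiprocessing.Process],
    cleanup: Callable[[], None],
    grace: float = 2.0,  # hypothetical grace period in seconds
) -> List[Optional[int]]:
    """Hypothetical helper: terminate, escalate to kill() if needed, then clean up."""
    procs = list(processes)
    # Ask every child to stop (SIGTERM on POSIX; a child may catch or ignore it).
    for p in procs:
        p.terminate()
    # Give all children one shared grace period to exit.
    deadline = time.monotonic() + grace
    for p in procs:
        p.join(max(0.0, deadline - time.monotonic()))
    # Escalate for stragglers (SIGKILL on POSIX; cannot be caught or ignored).
    for p in procs:
        if p.is_alive():
            p.kill()
            p.join()
    # Only now touch shared resources such as network interfaces.
    cleanup()
    return [p.exitcode for p in procs]
```

The design choice is that cleanup never runs while a child is still alive. That is one plausible explanation for the OSError above: an interface being removed while a child process is still using it.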

pentestiing commented 9 months ago


I ran the tests again, and for now we cannot go below 60 seconds for the client and server; otherwise the secure channel for the upper CBMA MACsec key exchange is closed and the bridge is never established. You can reduce it and it might work sometimes, but not consistently. This will be fixed once the changes I mentioned in a comment above are added to CBMA :sandwich:
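One way to reconcile both concerns might be to keep a 60 s overall budget for the exchange while using a short per-call socket timeout, so the shutdown flag is still checked frequently. This sketch assumes the 60 s is needed as a total wait rather than as a single blocking call; this thread does not confirm that. `recv_within`, `HANDSHAKE_BUDGET_S` and `POLL_INTERVAL_S` are hypothetical names, and the 1 s poll interval is an arbitrary choice.

```python
import socket
import threading
import time
from typing import Optional

HANDSHAKE_BUDGET_S = 60.0  # total wait, per the comment above
POLL_INTERVAL_S = 1.0      # hypothetical per-call timeout


def recv_within(
    sock: socket.socket,
    nbytes: int,
    shutdown_event: threading.Event,
    budget: float = HANDSHAKE_BUDGET_S,
    poll: float = POLL_INTERVAL_S,
) -> Optional[bytes]:
    """Wait up to `budget` seconds for data, re-checking shutdown every `poll` s.

    Returns the received bytes, or None if shutdown was requested first.
    Raises TimeoutError once the whole budget is spent.
    """
    deadline = time.monotonic() + budget
    while not shutdown_event.is_set():
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"no data within {budget} s")
        # Never block longer than one poll interval or the time left.
        sock.settimeout(min(poll, remaining))
        try:
            return sock.recv(nbytes)
        except socket.timeout:
            continue  # budget not exhausted yet; re-check shutdown
    return None
```

With this split, the peer still gets the full 60 s to respond, but a shutdown request is noticed within about one poll interval.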