xqms / rosmon

ROS node launcher & monitoring daemon

add auto-increment-spawn-delay feature #175

Closed 1r0b1n0 closed 1 year ago

1r0b1n0 commented 1 year ago

Hi, we are using rosmon on some big launch files (> 100 nodes). Launching all the nodes simultaneously causes problems at startup. I added an option to auto-increment the spawn delay of all nodes (except those that already have a spawn-delay set).

For example, with --auto-increment-spawn-delay=1:
Node A will launch at 0s
Node B will launch at 1s
Node C will launch at 2s
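The schedule described above can be sketched as a single pass over the loaded node list. This is only an illustration, not the code from this pull request: the `Node` class and the function name are hypothetical, and it assumes that nodes with an explicit spawn-delay are skipped without advancing the counter.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical stand-in for a parsed launch-file node (not rosmon's class).
    name: str
    spawn_delay: Optional[float] = None  # None: no spawn-delay set explicitly


def apply_auto_increment_spawn_delay(nodes: List[Node], increment: float) -> None:
    """Assign staggered spawn delays to nodes that have none set."""
    next_delay = 0.0
    for node in nodes:
        if node.spawn_delay is not None:
            # An explicit spawn-delay is kept as-is.
            # Assumption: it does not advance the automatic counter.
            continue
        node.spawn_delay = next_delay
        next_delay += increment


nodes = [Node("A"), Node("B"), Node("C")]
apply_auto_increment_spawn_delay(nodes, 1.0)
for node in nodes:
    print(f"Node {node.name} will launch at {node.spawn_delay:g}s")
```

With an increment of 1, the three nodes receive delays of 0s, 1s and 2s, matching the example.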

xqms commented 1 year ago

Wow, I think you have the new highscore in terms of number of nodes I have seen so far ;)

In some projects we have split things into separate rosmon instances to keep things manageable...

Anyway, I like your proposal and will take a detailed look at your changes soon. From a first glance: I would prefer a separate pass over the nodes after the launch file has loaded instead of integrating the logic into the parseNode() function. That way, we don't need the currentAutoIncrementSpawnDelay member and the logic should be much easier to read.

1r0b1n0 commented 1 year ago

I rewrote the code a bit to separate the logic into another method; it feels a bit cleaner now!

Ah, I'm not very proud of this high score for the number of nodes. Rosmaster (unfortunately written in Python) is having a hard time keeping up with all our nodes.

MCFurry commented 1 year ago

Brilliant idea! We're currently testing this branch as well, since some lower-performance machines also seem to struggle to start our 80+ node system.

xqms commented 1 year ago

Pushed some small style changes. I'll merge this as soon as CI completes :)

xqms commented 1 year ago

@1r0b1n0 by the way, we use roscore with arguments roscore -w 20 -t 8 in our systems with a lot of nodes. That reduces roscore-related lags significantly (use 8 threads and a timeout of 20s if a node is not reachable).

MCFurry commented 1 year ago

@xqms Do you mean a 20s timeout or 20 workers? According to roscore -h, the -w flag sets the number of workers and -t the timeout? Thanks for the info though; I've never played with those flags before, so it might be worth investigating as well!

xqms commented 1 year ago

Erm yes, the other way around :D

I'll go ahead and merge this now.
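With the two values swapped as acknowledged above, the intended invocation would presumably be 8 workers and a 20s timeout. A minimal sketch of building that command line follows; the helper name is hypothetical, and the flag meanings are taken from the roscore -h description quoted in this thread.

```python
import shlex
from typing import List


def roscore_command(num_workers: int, timeout_s: int) -> List[str]:
    # Hypothetical helper. Flag meanings as quoted from `roscore -h` above:
    # -w: number of workers, -t: timeout in seconds.
    return ["roscore", "-w", str(num_workers), "-t", str(timeout_s)]


cmd = roscore_command(num_workers=8, timeout_s=20)
print(shlex.join(cmd))  # roscore -w 8 -t 20
```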

1r0b1n0 commented 1 year ago

@xqms thank you for the roscore options tip, I'll try it out