jaiarobotics / jaiabot

Jaiabot source code

task/increase-speed-bot-and-hub-connect #866

Closed by tsaubergine 2 months ago

tsaubergine commented 3 months ago

Overview

This PR substantially reduces the time from bot power-on to the first BotStatus message.

This PR is based on https://github.com/jaiarobotics/jaiabot/pull/802 (task/data-pull), since I needed the VirtualBox support for docker-arm64-build-and-deploy.sh to test this PR (and I expect that one to be merged imminently based on the last sprint meeting).

I labelled this high risk because no one is 100% sure why the sleeps were in the systemd startup to begin with, so taking them out carries some risk. I know at least one reason was fixed in Goby 3.0.13 (https://github.com/GobySoft/goby3/releases/tag/3.0.13: "Fix bug in goby_moos_gateway where subscriptions won't happen after MOOS is connected"). However, I don't know for sure that this was the only reason.

This PR requires Goby 3.1.5, which fixes some DynamicBuffer issues. packages.jaia.tech staging has been updated to Goby 3.1.5.

Speed up changes

I sped up start of bots by:

Most of the remaining startup delay (besides actually booting the system) is spent waiting for an NTP sync, and the time this takes seems to vary quite a bit even on the same system.
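To put a number on the NTP portion of the startup delay, something like the following could be run on a bot right after boot. This is only a sketch and not part of this PR: it assumes the image has systemd's `timedatectl` available, and `ntp_synchronized` and `wait_for_sync` are hypothetical helper names introduced here for illustration.

```python
import subprocess
import time
from typing import Callable, Optional


def ntp_synchronized() -> bool:
    """Return True if systemd reports the system clock as NTP-synchronized.

    Assumption: `timedatectl` is installed (systemd-based image).
    """
    result = subprocess.run(
        ["timedatectl", "show", "--property=NTPSynchronized", "--value"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip() == "yes"


def wait_for_sync(
    is_synced: Callable[[], bool] = ntp_synchronized,
    timeout_s: float = 300.0,
    poll_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until the clock is synchronized.

    Returns the seconds waited, or None if timeout_s elapsed first.
    """
    start = time.monotonic()
    while True:
        if is_synced():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            return None
        sleep(poll_s)
```

Calling `wait_for_sync()` from an interactive session and logging the result across several reboots would show how much of the 60-120 second window is NTP. The `is_synced` and `sleep` parameters exist only so the loop can be exercised without a real clock source.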

Other changes in this PR

Testing

I tested this in a 10-bot 1-hub VirtualBox fleet. To expose any remaining race conditions that might not otherwise show up on my desktop computer, I modified the VirtualBox CPU settings to underclock certain bots and to reduce the core count of other bots from the default of 4.

After several reboots I saw no new problems. The time from pressing "Reboot Bots" in the Fleet Upgrade GUI to first status is now 60-120 seconds (the variance is all in the NTP sync timing), compared with about twice that before the changes in this PR.

michael-jaia commented 2 months ago
task/increase-speed-bot-and-hub-connect
Testing with Fleet 1 (3 Bots)
1. 2:56 // Slowest time
2. 1:52 - 2:16 - 3:03
3. 1:55 - 2:21 - 3:10
4. 1:30 - 1:45 // JCC Reboot (range for all three to come online)
5. 1:25 - 1:30 - 1:35 // JCC Reboot

*****************************************************************

Release
Testing with Fleet 1 (3 Bots)
1. 2:20 - 2:20 - 3:20
2. 2:14 - 2:30 - 3:25
3. 2:34 - 2:59 - 3:20
4. 2:50 - 2:50 - 3:10 // JCC Reboot
5. 2:21 - 2:21 - 2:37 // JCC Reboot

*****************************************************************

Bench testing results (sorry, this was not the most scientifically structured test). The non-JCC Reboot tests show about 20-30 seconds of improvement because I did not disable the acquire GPS requirements. The JCC Reboot tests show the reduction in bot startup time more clearly (~60-90 s faster).
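As a rough check of the JCC Reboot figure, the times in the two lists above can be converted to seconds and compared. This is only a sketch: pairing run 4 with run 4 and run 5 with run 5 is an assumption made for illustration (the runs were not actually paired), and "time for the slowest bot to come online" is just one way to read the data.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def slowest(run: list[str]) -> int:
    """Seconds until the last bot in a run came online."""
    return max(to_seconds(t) for t in run)


# JCC Reboot runs (4 and 5), copied from the lists above.
branch_jcc = [["1:30", "1:45"], ["1:25", "1:30", "1:35"]]
release_jcc = [["2:50", "2:50", "3:10"], ["2:21", "2:21", "2:37"]]

# Assumption: compare run N on the branch with run N on Release.
savings = [slowest(r) - slowest(b) for b, r in zip(branch_jcc, release_jcc)]
print(savings)  # [85, 62]
```

Under that pairing, the slowest bot comes online 85 s and 62 s sooner, which is in line with the ~60-90 s estimate.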