Status: Closed (claraberendsen closed this 1 year ago)
@nuclearsandwich I have a draft of the process here, but I need some insight on how to test it. I have a separate PR, which I think should be merged before this one, that adds the plugins this needs to work. For now I created a test job in the create_jobs.py file to check that the configuration is correct. However, I'm not certain how to build and run it, since I don't know when and where this file is run. I would appreciate your insight.
> I have a separate PR that I think should go first to this one, that adds the necessary plugins for this to work.
Can you link or reference that here? As discussed in the Infrastructure meeting, we're blocked on Jenkins plugin upgrades generally, but that's an important item to address in the near future.
> I need some insight on how to test this...
I'm in favor of just hand-hacking the test_ci_linux job to display the error string and exit, rather than adding a separate test job just for this. The jobs get reconfigured on every deploy, and I don't think there's currently enough justification to create a job for those kinds of tests.
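A minimal sketch of that hand-hack, assuming the test_ci_linux job can run a Python step in place of its normal test command (an assumption; the real job template may invoke a different script or shell step). The step writes the target error string to the console log and reports a failing exit status, so the build fails the same way the real error would. `simulate_agent_error` is a hypothetical name.

```python
import sys

# Error string the recovery logic should detect (taken from the PR description).
TARGET_ERROR = "error waiting for container: unexpected EOF"


def simulate_agent_error(stream=sys.stdout) -> int:
    """Hypothetical stand-in for the test_ci_linux build step.

    Writes the target error to the build log and returns a nonzero
    exit status instead of running the real tests.
    """
    print(TARGET_ERROR, file=stream, flush=True)
    return 1
```

In the hacked job step, the return value would be passed to `sys.exit(simulate_agent_error())` so Jenkins marks the build as failed. Because the jobs are reconfigured on every deploy, the hack is discarded automatically afterwards.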
> At the moment I created a test job in the create_jobs.py file to test that the configuration is correct. However I'm not certain on how to build that and run it, since I don't know when and where is this file being run. Would appreciate your insight.
This script is generally run locally by ROS 2 developers after merging a PR that changes the job configuration. Running it requires Jenkins admin configuration and a configured Python environment. The README has details and links to further information.
Closing this, since we have tracked down the actual error and this strategy is not the appropriate solution.
Description
This PR adds functionality to recover the agent by restarting it when a specific error appears in the log. In particular, the case handled is `error waiting for container: unexpected EOF`.
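The detection half of this mechanism can be sketched in Python. This is a minimal sketch assuming the build log is available as a sequence of lines; `needs_agent_restart` and `RECOVERABLE_ERRORS` are hypothetical names. The PR itself relies on Jenkins plugins (not identified in this thread) rather than on code like this, and the restart step is left out.

```python
import re
from typing import Iterable

# Log messages that should trigger an agent restart (from the PR description).
# Escaped so the message is matched literally, not as a regex.
RECOVERABLE_ERRORS = [
    re.escape("error waiting for container: unexpected EOF"),
]
_PATTERN = re.compile("|".join(RECOVERABLE_ERRORS))


def needs_agent_restart(log_lines: Iterable[str]) -> bool:
    """Return True if any log line contains a recoverable error.

    Hypothetical helper illustrating the log-matching idea only.
    """
    return any(_PATTERN.search(line) for line in log_lines)
```

Matching an escaped literal keeps the trigger narrow, so unrelated build failures do not cause agent restarts. Supporting more cases would mean appending patterns to the list.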