ros-infrastructure / buildfarm_deployment

Apache License 2.0

Pinning jenkins puppet module to 1.6.1 causes slave disconnect #143

Closed by ayrton04 7 years ago

ayrton04 commented 7 years ago

Our slaves went offline and we couldn't get them to reappear. After investigating, I found the recent merge that pins rtyler/jenkins to 1.6.1. As a test, I manually pinned the version to 1.7.0, reconfigured, and the slaves came back.

ayrton04 commented 7 years ago

Scratch that. I'm still having trouble getting our slaves to come back, but whether this change was the culprit is unclear.

ayrton04 commented 7 years ago

Sorry, another update: I can confirm that when I pin rtyler/jenkins to version 1.6.1, my slaves go offline. When I manually change the configuration by pinning it to 1.7.0, they return. I was able to repeat this several times. I take it this hasn't been an issue for anyone else?

ayrton04 commented 7 years ago

OK, an update: our slaves had also suddenly started appending unique IDs to the ends of their names (including slave_on_master, since I re-ran the configuration on it). Because all the jobs look for slave_on_master and not slave_on_master_8f302141, none of our jobs were firing. I was able to fix it by manually adding the -disableClientsUniqueId flag to the swarm command. Something is out of sync somewhere, so I'll do some digging. In the meantime, I'm guessing it's best for everyone to stick with 1.6.1. I'll close this and re-open it if I find anything else.
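
For anyone hitting the same name mismatch, here is a minimal sketch of the workaround described above: launching the swarm client with -disableClientsUniqueId so the node registers under its plain name. Only that flag and the slave_on_master name come from this thread. The jar file name, the master URL, and the `swarm_command` helper are hypothetical placeholders, and this is not necessarily how buildfarm_deployment assembles the command.

```python
import shlex


def swarm_command(name, master_url, jar="swarm-client.jar", disable_unique_id=True):
    """Build the argument list for starting a Jenkins Swarm client.

    `jar` and `master_url` are placeholder values for illustration only.
    """
    argv = ["java", "-jar", jar, "-master", master_url, "-name", name]
    if disable_unique_id:
        # Without this flag the client may register under a name with a
        # unique suffix, which jobs tied to the plain name will not match.
        argv.append("-disableClientsUniqueId")
    return argv


# Assumed values: adjust the URL and jar path to your own deployment.
cmd = swarm_command("slave_on_master", "http://localhost:8080")
print(shlex.join(cmd))
```

Passing `disable_unique_id=False` leaves the flag off, which gives the suffixed-name behavior that stopped the jobs from firing.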

tfoote commented 7 years ago

Thanks for the updates @ayrton04