ros-infrastructure / buildfarm_deployment

Apache License 2.0
master deployment: slaves can't connect due to "JNLP TCP port" set to "disabled" #177

Closed gavanderhoorn closed 5 years ago

gavanderhoorn commented 6 years ago

As per subject really.

Not sure whether this is due to configuration by puppet not completing because of other issues, but none of my slaves/agents can connect initially.

Errors seen: "slaveAgentPort.disabled Go to security configuration screen and change it."

Indeed after a master reconfigure the TCP port for JNLP agents is set to Disable in the Configure Global Security panel of the master configuration.

gavanderhoorn commented 6 years ago

Setting this to Random, saving and restarting Jenkins makes all slaves come up.

inigomartinez commented 6 years ago

> Setting this to Random, saving and restarting Jenkins makes all slaves come up.

When I tried this, the slaves disappeared 😞.

gavanderhoorn commented 6 years ago

I had to restart the slaves after restarting Jenkins.

gavanderhoorn commented 6 years ago

@inigomartinez: did restarting the slaves work for you?

nuclearsandwich commented 6 years ago

There's a new resource that will be introduced by #183 for running Groovy scripts via puppet.

http://javadoc.jenkins.io/jenkins/model/Jenkins.html#setSlaveAgentPort-int- and http://javadoc.jenkins.io/jenkins/model/Jenkins.html#setAgentProtocols-java.util.Set- are the API methods of relevance.

In the interest of secure defaults, even as they deviate from build.ros.org I think it would be best to try and only enable the v4 agent protocol (which uses TLS for secure communication).
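As a rough sketch of what such a script could look like, the Python helper below renders a small Groovy snippet that calls the two API methods mentioned above. The helper name `render_agent_security_script`, its defaults, and the trailing `save()` call are assumptions for illustration, not what #183 actually ships. A port of `0` requests a random port and `-1` disables the listener; the protocol names `JNLP4-connect` and `JNLP2-connect` are the identifiers Jenkins uses for the v4 (TLS) and v2 protocols.

```python
def render_agent_security_script(port=0, protocols=("JNLP4-connect",)):
    """Render a Groovy snippet that sets the JNLP agent port and protocols.

    port: -1 disables the TCP agent listener, 0 picks a random port,
          any other value is used as a fixed port.
    protocols: names of the agent protocols to leave enabled.
    """
    if not -1 <= port <= 65535:
        raise ValueError("port must be -1 (disabled), 0 (random) or 1..65535")

    # De-duplicate and sort so the generated script is stable between runs.
    quoted = ", ".join("'{}'".format(name) for name in sorted(set(protocols)))

    lines = [
        "import jenkins.model.Jenkins",
        "",
        "def jenkins = Jenkins.getInstance()",
        "jenkins.setSlaveAgentPort({})".format(port),
        "jenkins.setAgentProtocols([{}] as Set)".format(quoted),
        # Assumption: persist the change so it survives a restart.
        "jenkins.save()",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Secure default discussed above: random port, v4 (TLS) protocol only.
    print(render_agent_security_script())
```

Calling it with `protocols=("JNLP4-connect", "JNLP2-connect")` would produce the less strict variant for agents that cannot speak the v4 protocol yet.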

inigomartinez commented 6 years ago

> @inigomartinez: did restarting the slaves work for you?

No, just restarting didn't work, but the issue is fixed now. I didn't realize the JNLP2 option was disabled, so after enabling it, the slaves connected seamlessly 😃.

gavanderhoorn commented 6 years ago

Ah, ok.

Simply restarting without changing the setting was not going to solve it, no :)

gavanderhoorn commented 6 years ago

@nuclearsandwich: just brought up a master deployment and without changing anything only the Java Web Start Agent Protocol/4 (TLS encryption) option is enabled.

gavanderhoorn commented 6 years ago

Hm. Now I understand @inigomartinez's comment (https://github.com/ros-infrastructure/buildfarm_deployment/issues/177#issuecomment-359743558): Jenkins 2.89.3 appears to disable all slave protocols except version 4 with TLS by default.

The slaves/agents seem to only support JNLP2, and can't connect now.
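To make the mismatch concrete, here is a minimal sketch that intersects the protocols enabled on the master with those an agent supports. The function name `common_protocols` and the comma-separated input format are assumptions for illustration, and the agent-side list simply reflects what is observed in this thread (the deployed agents seem to speak only JNLP2).

```python
def common_protocols(master_enabled, agent_supported):
    """Return the sorted protocols that both master and agent can use.

    master_enabled: comma-separated protocol names enabled on the master.
    agent_supported: iterable of protocol names the agent supports.
    """
    enabled = {name.strip() for name in master_enabled.split(",") if name.strip()}
    return sorted(enabled & set(agent_supported))


if __name__ == "__main__":
    # Master default in 2.89.3 as seen here: only the v4 (TLS) protocol.
    usable = common_protocols("JNLP4-connect", ["JNLP2-connect"])
    print(usable or "no protocol in common, the agent cannot connect")
```

An empty result corresponds to the situation above; enabling `JNLP2-connect` on the master, or moving to an agent that supports `JNLP4-connect`, gives a non-empty result.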

gavanderhoorn commented 6 years ago

Only agents v3.3 and up seem to support JNLP4, but that jar doesn't seem to want to cooperate when I drop it in place of the v2.2 that is currently deployed.

(Is there some swarm plugin doc I'm missing that details which protocols are supported? Because it's really hard to find.)

gavanderhoorn commented 6 years ago

Unfortunately, enabling JNLP2 to get the agents to connect results in Jenkins (2.89.3) displaying the following warning:

This Jenkins instance uses deprecated protocols: JNLP2-connect. It may impact stability of the instance. If newer protocol versions are supported by all system components (agents, CLI and other clients), it is highly recommended to disable the deprecated protocols.

nuclearsandwich commented 6 years ago

Until we have a chance to test an updated swarm plugin I think we'll have to enable JNLP2 as well and live with the warning.

jonazpiazu commented 6 years ago

> Only agents v3.3 and up seem to support JNLP4, but that jar doesn't seem to want to cooperate when I drop it in-place of the v2.2 that is currently deployed.

I am not sure if it is still relevant, but I just tried to "drop in place" the latest available version for the agent, which is v3.4 and it seems to be working nicely.

Using Jenkins version 2.89.2 and JNLP4.

nuclearsandwich commented 5 years ago

One of the changes in #207 was a forked version of the jenkins module we're using to allow proper installation of more recent swarm client versions. The version is now specified in the buildfarm_deployment_config; here are the relevant entries in the example config.

I believe that with the move to more recent swarm clients and Jenkins' current defaults, no changes are needed for a secure instance with functioning swarm agents, so I'll close this one. But if someone is still having issues, I am happy to re-open and/or help them investigate.