Open alphaDev23 opened 4 years ago
Hi @alphaDev23
Yes, it has been tested. Sounds like a network instability issue in your environment? Has the Swarm been set up with a specific configuration?
We've found that using the --advertise-addr flag during the swarm init and swarm join steps when creating the cluster led to more stability.
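For anyone following along, the advertise-address suggestion above might look roughly like the sketch below. It is a dry run that only prints the commands. The IP addresses and the join token are placeholders, not values from this thread, and 2377 is assumed to be the default Swarm management port.

```shell
#!/bin/sh
# Dry-run sketch: builds and prints the commands instead of running them.
# All addresses and the token are placeholders (assumptions), not values
# taken from this issue.
MANAGER_IP="192.0.2.10"
WORKER_IP="192.0.2.11"
JOIN_TOKEN="<worker-join-token>"

# Pin the address each node advertises to the rest of the cluster.
init_cmd="docker swarm init --advertise-addr ${MANAGER_IP}"
join_cmd="docker swarm join --advertise-addr ${WORKER_IP} --token ${JOIN_TOKEN} ${MANAGER_IP}:2377"

echo "On the first manager: ${init_cmd}"
echo "On each joining node: ${join_cmd}"
```

Pinning the advertised address is mainly useful on hosts with several network interfaces, where Docker could otherwise pick an address the other nodes cannot reach.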
Could also be related to https://github.com/portainer/agent/pull/102 (in some high latency environments) that we're currently reviewing. That PR might help.
Swarm was set up using Openstack Magnum. 1.5.0 appears to be more stable. Is there a difference between 1.5.0 and 1.5.1 that accounts for network stability?
Separately, although it may be related, you mentioned that the agent was planning to move to ingress ports, but I see that the stack file defines host ports. Was there a reason the latter was chosen?
The only difference between 1.5.0 and 1.5.1 is the following bugfix: https://github.com/portainer/agent/issues/95
It only changes how the agent's IP address is detected at startup.
The agent now supports ingress ports, but we have not yet officially determined which mode is recommended, so we kept the old agent definition. In any case, switching would only solve potential issues between Portainer and the agents. In your case, the issue seems to be in the overlay network, as inter-agent communication fails.
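To illustrate the two publishing modes discussed here, the sketch below prints the --publish arguments that docker service create accepts for host mode and for ingress mode. It is a dry run and not the official agent definition. The port number 9001 is an assumption about the agent's listening port.

```shell
#!/bin/sh
# Dry-run sketch comparing the two port publishing modes.
# AGENT_PORT is an assumed value, not taken from the stack file in this issue.
AGENT_PORT=9001

# Host mode: the port is bound directly on each node running a task.
host_publish="--publish mode=host,target=${AGENT_PORT},published=${AGENT_PORT}"

# Ingress mode: the port is exposed through the Swarm routing mesh.
ingress_publish="--publish mode=ingress,target=${AGENT_PORT},published=${AGENT_PORT}"

echo "host mode:    ${host_publish}"
echo "ingress mode: ${ingress_publish}"
```

Either mode only affects how Portainer reaches the agents. Traffic between the agents themselves still travels over the overlay network.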
ping @akomelj I wonder if this is the issue you encountered before working on #102 ?
@deviantony No, my symptoms were nothing like the ones described in #95. Agents had no problem discovering their IP addresses. Probes between them were failing every few minutes due to the high-latency network and succeeding in between.
Yeah, I meant symptoms similar to the ones reported in this issue (see logs above).
@deviantony Huh, I guess you were asking for this issue and not #95. I skimmed through the logs above and yes - this is the exact same kind of behaviour I was observing. Failed acks and fallback pings, refuted suspects, etc.
Below is the stack file and full logs. Using portainer:1.23.0 as the server. Has the agent been tested?
cat agent-stack.yml