Open angeloxx opened 6 years ago
@JMesser81 @daschott @madhanrm - FYI
@angeloxx thank you for reporting this. I wonder if this is related to https://github.com/Azure/acs-engine/issues/3501
Thank you @daschott, but I think the issue is not related to that acs-engine issue; we're working in an on-premises environment, and the problem concerns outgoing connections to an external service outside the Kubernetes cluster network. Outgoing packets report the correct source IP address, but the assigned source port is not closed properly, or is reused too early without waiting for the final FIN/FIN-ACK.
Thanks @angeloxx. The issue I linked is about a Kubernetes-specific feature we're working on for the next Windows release. It sounds like what you are describing is a performance/concurrency issue that we need to investigate in more detail, but we have some other issues to work through first in order to GA. Your issue is on our bug list for tracking, though, and we are working through it systematically.
You can also try the win-bridge plugin in the meantime (ETA for validation + new docs coming by the end of the month) to see if it helps at all. We made some binary changes recently.
Any update on this issue? We see exactly the same behaviour with the Calico CNI.
I want to report a problem with the Windows source NAT (SNAT) used to manage outgoing container traffic.
Environment
Symptom
Randomly, some connections are dropped by the segmentation firewall (Stonegate, configured with the "Normal" connection tracking mode -- see http://help.stonesoft.com/onlinehelp/StoneGate/SMC/5.7.0/SGAG/SGOH_AdvancedEngineSettings/Adjusting_Firewall_Traffic_Handling_Parameters.htm) because the device reports reuse of a connection that has not been closed. Usually the error involves a recurring source TCP port; this behaviour reminds me of two already-known issues:
https://social.msdn.microsoft.com/Forums/en-US/876f67de-1c15-4cce-beae-c9b47609f75d/app-in-container-throw-connection-errors-sometimes-when-using-nat-networkwindows-server-2016?forum=windowscontainers. In my case the affected connections go from the same container to the same destination host/port (the traffic is generated by the container's readiness check) and recur every 5 seconds. If I generate the same traffic from the host OS (bypassing the NAT), the problem does not occur.
A similar issue occurred with a customer using the outgoing Azure NAT service with multiple Windows VMs; in that case our external F5 firewall reported use of a source port that was already in use by a connection that had not been closed (https://blogs.msdn.microsoft.com/mast/2015/07/13/azure-snat/)
In both cases the problem seems to be the short range of source ports available for NATing the outgoing traffic. The current workaround consists of switching the connection tracking mode on the Stonegate to "Loose".
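For anyone trying to reproduce this, here is a minimal sketch of the probe pattern described above: repeatedly open and close a TCP connection to one destination (like the readiness check does every 5 seconds) and log the ephemeral source port of each attempt. The listener here is a hypothetical local stand-in so the script is self-contained; in the real setup you would point `TARGET` at the external service behind the firewall and watch for the same source port recurring before the previous connection has fully closed.

```python
import socket
import threading
import time

def start_listener():
    # Hypothetical stand-in for the external service: a local TCP server
    # that accepts connections and closes them immediately.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))          # OS picks a free port
    srv.listen(16)

    def accept_loop():
        while True:
            try:
                conn, _ = srv.accept()
                conn.close()
            except OSError:
                return                  # listener was shut down
    threading.Thread(target=accept_loop, daemon=True).start()
    return srv, srv.getsockname()

def collect_source_ports(target, attempts, delay=0.0):
    # Connect, record the ephemeral source port, disconnect, repeat.
    ports = []
    for _ in range(attempts):
        c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        c.connect(target)
        ports.append(c.getsockname()[1])  # source port of this connection
        c.close()
        time.sleep(delay)                 # use 5.0 to mimic the readiness check
    return ports

srv, TARGET = start_listener()
ports = collect_source_ports(TARGET, attempts=10)
srv.close()

dupes = len(ports) - len(set(ports))
print(f"source ports used: {ports}")
print(f"ports reused within the run: {dupes}")
```

Run from inside the container, a short repeat distance between identical source ports (as seen by the firewall, after SNAT) would point at a small NAT port pool or at ports being recycled before the old connection's teardown completes; run from the host OS, the same probe should show no such reuse.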