Open bilgehan-erman opened 2 years ago
In the provided test results, both the nsc and the nse nodes (unintentionally) provided the "icmp-responder" service, which originated from the default config. We corrected this after the logs were captured, with the same result: we still get the error. Also, in the actual topology configuration there are no redundant service offers. Although each node provides and consumes services, each service name is unique, based on the node id.
/cc @glazychev-art , @anastasia-malysheva Might this be related to the monitor OPA stuff?
Thanks for the detailed information!
Most likely yes, it is related to OPA for monitoring. But it is not actually an error if you are not using the init container (cmd-nsc-init). We probably need to rewrite the error message so that it does not mislead anyone.
@glazychev-art Any idea what the root cause might be? I'm not entirely sure why we would be seeing this, do you have a more specific idea?
@edwarnicke cmd-nsc monitors connections before sending the Request. This is necessary to take over the connection if there was a cmd-nsc-init container before. But as you know, you've implemented an open policy for monitoring, and it is based on the SPIFFE ID from the Request. But if we did not have an init container, then there were no Requests either, and an authorization error is returned.
But here the problem is different - as I see from the logs, there are many healing errors:
Aug 15 22:00:44.223 [WARN] [id:nsc-858c5dc57-2bf6l-0] [heal:eventLoop] [type:networkService] (7.1) Data plane is down
Aug 15 22:00:44.223 [DEBU] [id:nsc-858c5dc57-2bf6l-0] [heal:eventLoop] [type:networkService] (7.2) Reconnect with reselect
We need to figure out why this is happening.
That's an interesting scenario where we're trying to run nsc/nse together in the same container.
@bilgehan-erman
NSM_LIVENESS_CHECK_ENABLED=false
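For context, a minimal sketch of where that variable would go, assuming the standard cmd-nsc client container spec (the container name and image tag here are illustrative, not taken from the attached setup):

```yaml
# Hypothetical fragment of the client pod spec; only the env entry matters here.
containers:
  - name: nsc
    image: ghcr.io/networkservicemesh/cmd-nsc:v1.5.0  # illustrative tag
    env:
      - name: NSM_LIVENESS_CHECK_ENABLED
        value: "false"
```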
@denis-tingaikin
Unfortunately, setting NSM_LIVENESS_CHECK_ENABLED to false did not seem to help: nsc.log nse.log
Did ping work for your scenario?
At the NSC, we cannot get to a point where we could try anything because of the error.
Do you have a diagram/scheme/proposal of what you finally want to get?
This is the test scenario:
And this would be an example of building random topologies using these universal nodes:
@bilgehan-erman Sorry to keep you waiting
I looked at your setup and logs, and I think I understand what's going on. The main reason is that the NSC is trying to connect to itself (to its own endpoint). By the way, I'm not sure that this is even possible due to routing... But you need a different scenario, and I think selectors can help you with this. I prepared an example according to your last picture:
When I say nsc, I mean nsc+nse in the same pod (like your "node").
I did not make a new image; I just added the client and endpoint as different containers in one pod. Most likely the yamls can be simplified, I just want to show the idea:
NSM_LABELS: "dst_endpoint:node*"
kernel://icmp-responder/nsm-1-2?dst_endpoint=node2
Due to the selectors and labels, we will be able to select the desired endpoint. So, to try it you need:
kubectl create ns ns-topology
nsc_nse_setup.zip (I used the main branch, but I think it will work on 1.5.0 too)
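To make the selector idea concrete, here is a sketch of the two environment variables involved, assuming the standard cmd-nse/cmd-nsc variables and node2 as an example value (the exact values in the attached yamls may differ):

```yaml
# On the nse container of node2 -- advertise a matching label:
- name: NSM_LABELS
  value: "dst_endpoint:node2"
# On an nsc container that should reach node2 -- request by label:
- name: NSM_NETWORK_SERVICES
  value: "kernel://icmp-responder/nsm-1-2?dst_endpoint=node2"
```

The idea is that the registry only matches endpoints whose labels satisfy the query, so a client does not select its own co-located endpoint.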
I really hope this helps!
@glazychev-art thank you very much for looking into this. I'll try out your suggestions and see how it goes.
The setup is built on
The platform is k8s 1.23, docker 20.10, on ubuntu 20.04, NSM 1.5.0
Things work as expected with the existing scripts -- NSC and NSE on their own, separate pods. (See related Issue #7051)
However, when we try to build NSC and NSE into the same container, we keep getting the NSC error:
[ERRO] [cmd:[/bin/nsc]] error from monitorConnection stream %!(EXTRA string=rpc error: code = PermissionDenied desc = no sufficient privileges)
Could not find any more information on the source of the error.
(It may not be related, but it is very difficult for us to understand the whole spire/nsm integration, how things get worked into the containers, etc. We are really looking forward to some documentation on this.)
The Dockerfile that builds the node that has both the NSC and NSE is configured as:
The use case is to build topologies with nodes that have both NSC and NSE dynamic capabilities; many nodes, each with a single container; payload ETHERNET. Therefore, workarounds like sidecars or separate nsc/nse roles are not workable options.
Any help will be very much appreciated. Thank you in advance.
nsc-rpc-auth-problem.txt nsc.log nse.log kubectl-get-pods.txt registry-k8s.log forwarder-vpp.log forwarder-sriov.log nsmgr-nsmgr.log nsm-test-setup.zip