networkservicemesh / deployments-k8s

Apache License 2.0

cmd-nsc-vpp: issues with NOT default NSM_NAME #1826

Open richardstone opened 3 years ago

richardstone commented 3 years ago

Hi!

When I set the NSM_NAME parameter of cmd-nsc-vpp to anything other than the default, I get this error:

Jun 17 16:19:42.714 [ERRO] [cmd:/bin/cmd-nsc-vpp] (19.1) proxyListener unable to listen on /tmp/memifproxy/endpoint-nsc-795886dc88-577t6-f96ec20f-ede0-499f-b0cc-819e8f735869/memif.socket: listen unixpacket /tmp/memifproxy/endpoint-nsc-795886dc88-577t6-f96ec20f-ede0-499f-b0cc-819e8f735869/memif.socket: bind: invalid argument

The interface seems to be in place for a second, but there are no neighbors and the pod keeps restarting. Output of vppctl show interface address:

local0 (dn):
memif1/0 (up):
  L3 172.16.1.96/32

I followed this guide: https://github.com/networkservicemesh/deployments-k8s/tree/main/examples/use-cases/Memif2Memif

If I don't set the NSM_NAME parameter, it works correctly.

Is it possible that the given name is not handled correctly somewhere? If not, could you point me to where I should look for issues?

denis-tingaikin commented 3 years ago

By default, we use the name of the pod: https://github.com/networkservicemesh/deployments-k8s/blob/main/apps/nsc-memif/nsc.yaml#L29

denis-tingaikin commented 3 years ago

@richardstone Could you share your diff in the deployment?

richardstone commented 3 years ago

Yes, that is exactly how I'd like to use it, but if I uncomment the mentioned part, it does not work.

Here is my deployment:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nsc
  labels:
    app: nsc
spec:
  selector:
    matchLabels:
      app: nsc
  replicas: 1
  template:
    metadata:
      labels:
        app: nsc
    spec:
      serviceAccountName: endpoint-nsc
      containers:
        - name: {{ template "endpoint.name" . }}-nsc
          image: {{ template "endpoint.nsc.image" . }}
          imagePullPolicy: {{ .Values.images.nsc.pullPolicy }}
          env:
            - name: NSM_REQUEST_TIMEOUT
              value: 1m
            - name: SPIFFE_ENDPOINT_SOCKET
              value: unix:///run/spire/sockets/agent.sock
            # - name: NSM_NAME
            #   valueFrom:
            #     fieldRef:
            #       fieldPath: metadata.name
            - name: NSM_NETWORK_SERVICES
              value: {{ .Values.type.nsc }}://icmp-responder/nsm-1
            - name: NSM_DIAL_TIMEOUT
              value: "60s"
            - name: NSM_REQUEST_TIMEOUT
              value: "300s"
          volumeMounts:
            - name: spire-agent-socket
              mountPath: /run/spire/sockets
              readOnly: true
            - name: nsm-socket
              mountPath: /var/lib/networkservicemesh
              readOnly: true
      volumes:
        - name: spire-agent-socket
          hostPath:
            path: /run/spire/sockets
            type: Directory
        - name: nsm-socket
          hostPath:
            path: /var/lib/networkservicemesh
            type: DirectoryOrCreate
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: endpoint-nsc
denis-tingaikin commented 3 years ago

OK, thanks! Could you also share logs from the container?

richardstone commented 3 years ago

Here is the log: cmd-nsc-vpp.txt

denis-tingaikin commented 3 years ago

@d-uzlov Could you have a look at this issue ASAP?

d-uzlov commented 3 years ago

@richardstone Could you provide info about the cluster and the operating system you are using?

Also, which exact names did you test besides "endpoint-nsc"? Are the errors in the logs the same for all of the names you tested? If not, could you post logs for those different cases? Could you also provide logs of a successful run with default settings?

Am I understanding correctly that short names also don't work? Like, for example, just "nsc" as in the deployment config you posted.

I wasn't able to reproduce the issue with the name you provided. However, I got the same error when the name was too long. On my system, names like "endpoint-nsc-7f9c9cddc9-hjsk5" are not too long; I had to add about 15 characters before the error appeared. The limit may be different on your system, though.

d-uzlov commented 3 years ago

Here are the logs I get when I try to set the NSM_NAME to the value from your logs:

```log
Jun 18 08:42:29.556 [INFO] [cmd:/bin/cmd-nsc-vpp] Setting env variable DLV_LISTEN_CMD_NSC_VPP to a valid dlv '--listen' value will cause the dlv debugger to execute this binary and listen as directed.
Jun 18 08:42:29.556 [INFO] [cmd:/bin/cmd-nsc-vpp] there are 5 phases which will be executed followed by a success message:
Jun 18 08:42:29.556 [INFO] [cmd:/bin/cmd-nsc-vpp] the phases include:
Jun 18 08:42:29.556 [INFO] [cmd:/bin/cmd-nsc-vpp] 1: get config from environment
Jun 18 08:42:29.556 [INFO] [cmd:/bin/cmd-nsc-vpp] 2: run vpp and get a connection to it
Jun 18 08:42:29.556 [INFO] [cmd:/bin/cmd-nsc-vpp] 3: retrieve spiffe svid
Jun 18 08:42:29.556 [INFO] [cmd:/bin/cmd-nsc-vpp] 4: create network service client
Jun 18 08:42:29.556 [INFO] [cmd:/bin/cmd-nsc-vpp] 5: connect to all passed services
Jun 18 08:42:29.556 [INFO] [cmd:/bin/cmd-nsc-vpp] a final success message with start time duration
Jun 18 08:42:29.556 [INFO] [cmd:/bin/cmd-nsc-vpp] executing phase 1: get config from environment (time since start: 31.6µs)
This application is configured via the environment.
The following environment variables can be used:
KEY                       TYPE                           DEFAULT                                           REQUIRED    DESCRIPTION
NSM_NAME                  String                         cmd-nsc-vpp                                                   Name of Endpoint
NSM_DIAL_TIMEOUT          Duration                       5s                                                            timeout to dial NSMgr
NSM_REQUEST_TIMEOUT       Duration                       15s                                                           timeout to request NSE
NSM_CONNECT_TO            URL                            unix:///var/lib/networkservicemesh/nsm.io.sock                url to connect to
NSM_MAX_TOKEN_LIFETIME    Duration                       24h                                                           maximum lifetime of tokens
NSM_NETWORK_SERVICES      Comma-separated list of URL                                                                  A list of Network Service Requests
Jun 18 08:42:29.557 [INFO] [cmd:/bin/cmd-nsc-vpp] Config: &main.Config{Name:"endpoint-nsc-7f9c9cddc9-hjsk5", DialTimeout:5000000000, RequestTimeout:60000000000, ConnectTo:url.URL{Scheme:"unix", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/var/lib/networkservicemesh/nsm.io.sock", RawPath:"", ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}, MaxTokenLifetime:86400000000000, NetworkServices:[]url.URL{url.URL{Scheme:"memif", Opaque:"", User:(*url.Userinfo)(nil), Host:"icmp-responder", Path:"/nsm-1", RawPath:"", ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}}
Jun 18 08:42:29.557 [INFO] [cmd:/bin/cmd-nsc-vpp] [duration:555.5µs] completed phase 1: get config from environment
Jun 18 08:42:29.557 [INFO] [cmd:/bin/cmd-nsc-vpp] executing phase 2: run vpp and get a connection to it (time since start: 613µs)
Jun 18 08:42:29.557 [INFO] Configuration file: "/etc/vpp/helper/vpp.conf" not found, using defaults
Jun 18 08:42:29.558 [INFO] [cmd:/bin/cmd-nsc-vpp] [duration:1.6467ms] completed phase 2: run vpp and get a connection to it
Jun 18 08:42:29.558 [INFO] [cmd:/bin/cmd-nsc-vpp] executing phase 3: retrieving svid, check spire agent logs if this is the last line you see (time since start: 2.3336ms)
Jun 18 08:42:29.557 [INFO] [cmd:vpp] vpp[11]: clib_elf_parse_file: open `linux-vdso.so.1': No such file or directory
Jun 18 08:42:29.557 [INFO] [cmd:vpp] vpp[11]: clib_sysfs_prealloc_hugepages:260: pre-allocating 64 additional 2048K hugepages on numa node 0
Jun 18 08:42:29.557 [INFO] [cmd:vpp] vpp[11]: buffer: vlib_physmem_shared_map_create: pmalloc_map_pages: failed to mmap 64 pages at 0x1000000000 fd 5 numa 0 flags 0x11: Cannot allocate memory
Jun 18 08:42:29.557 [INFO] [cmd:vpp]
Jun 18 08:42:29.557 [INFO] [cmd:vpp] vpp[11]: buffer: falling back to non-hugepage backed buffer pool
Jun 18 08:42:29.557 [INFO] [cmd:vpp] vpp[11]: vat-plug/load: vat_plugin_register: oddbuf plugin not loaded...
Jun 18 08:42:30.592 [INFO] SVID: "spiffe://example.org/ns/ns-mr7gh/sa/default"
Jun 18 08:42:30.592 [INFO] [cmd:/bin/cmd-nsc-vpp] [duration:1.0340512s] completed phase 3: retrieving svid
Jun 18 08:42:30.592 [INFO] [cmd:/bin/cmd-nsc-vpp] executing phase 4: create network service client (time since start: 1.0364296s)
Jun 18 08:42:30.592 [INFO] [cmd:/bin/cmd-nsc-vpp] executing phase 5: connect to all passed services (time since start: 1.0365073s)
Jun 18 08:42:30.593 [INFO] [cmd:/bin/cmd-nsc-vpp] (1) ⎆ sdk/pkg/networkservice/common/updatepath/updatePathClient.Request()
Jun 18 08:42:30.594 [INFO] [cmd:/bin/cmd-nsc-vpp] (1.1) request={"connection":{"id":"endpoint-nsc-7f9c9cddc9-hjsk5-6c1ca3fd-01f3-4d6d-a3b1-c3e5b874122d","network_service":"icmp-responder"}}
Jun 18 08:42:30.594 [INFO] [cmd:/bin/cmd-nsc-vpp] (1.2) request-diff={"connection":{"path":{"path_segments":{"+0":{"name":"endpoint-nsc-7f9c9cddc9-hjsk5","id":"endpoint-nsc-7f9c9cddc9-hjsk5-6c1ca3fd-01f3-4d6d-a3b1-c3e5b874122d"}}}}}
Jun 18 08:42:30.594 [INFO] [cmd:/bin/cmd-nsc-vpp] (2) ⎆ sdk/pkg/networkservice/common/serialize/serializeClient.Request()
Jun 18 08:42:30.594 [INFO] [cmd:/bin/cmd-nsc-vpp] (3) ⎆ sdk/pkg/networkservice/common/refresh/refreshClient.Request()
Jun 18 08:42:30.594 [INFO] [cmd:/bin/cmd-nsc-vpp] (4) ⎆ sdk/pkg/networkservice/utils/metadata/metaDataClient.Request()
Jun 18 08:42:30.595 [INFO] [cmd:/bin/cmd-nsc-vpp] (5) ⎆ sdk/pkg/networkservice/core/adapters/serverToClient.Request()
Jun 18 08:42:30.595 [INFO] [cmd:/bin/cmd-nsc-vpp] (6) ⎆ sdk/pkg/networkservice/common/heal/healServer.Request()
Jun 18 08:42:30.595 [INFO] [cmd:/bin/cmd-nsc-vpp] (7) ⎆ sdk/pkg/networkservice/common/clienturl/clientURLServer.Request()
Jun 18 08:42:30.595 [INFO] [cmd:/bin/cmd-nsc-vpp] (8) ⎆ sdk/pkg/networkservice/common/connect/connectServer.Request()
Jun 18 08:42:30.598 [INFO] [cmd:/bin/cmd-nsc-vpp] (9) ⎆ sdk/pkg/networkservice/utils/metadata/metaDataClient.Request()
Jun 18 08:42:30.599 [INFO] [cmd:/bin/cmd-nsc-vpp] (10) ⎆ sdk/pkg/networkservice/core/next/nextClient.Request()
Jun 18 08:42:30.599 [INFO] [cmd:/bin/cmd-nsc-vpp] (11) ⎆ sdk-vpp/pkg/networkservice/up/peerup/peerupClient.Request()
Jun 18 08:42:30.599 [INFO] [cmd:/bin/cmd-nsc-vpp] (12) ⎆ sdk-vpp/pkg/networkservice/up/upClient.Request()
Jun 18 08:42:30.599 [INFO] [cmd:/bin/cmd-nsc-vpp] [duration:189.1µs] [vppapi:WantInterfaceEvents] (12.1) completed
Jun 18 08:42:30.599 [INFO] [cmd:/bin/cmd-nsc-vpp] (13) ⎆ sdk/pkg/networkservice/core/next/nextClient.Request()
Jun 18 08:42:30.599 [INFO] [cmd:/bin/cmd-nsc-vpp] (14) ⎆ sdk-vpp/pkg/networkservice/connectioncontext/mtu/mtuClient.Request()
Jun 18 08:42:30.599 [INFO] [cmd:/bin/cmd-nsc-vpp] (14.1) request-diff={"connection":{"context":{"MTU":9000}}}
Jun 18 08:42:30.599 [INFO] [cmd:/bin/cmd-nsc-vpp] (15) ⎆ sdk-vpp/pkg/networkservice/connectioncontext/ipcontext/routes/routesClient.Request()
Jun 18 08:42:30.600 [INFO] [cmd:/bin/cmd-nsc-vpp] (16) ⎆ sdk-vpp/pkg/networkservice/connectioncontext/ipcontext/ipaddress/ipaddressClient.Request()
Jun 18 08:42:30.600 [INFO] [cmd:/bin/cmd-nsc-vpp] (17) ⎆ sdk/pkg/networkservice/core/next/nextClient.Request()
Jun 18 08:42:30.600 [INFO] [cmd:/bin/cmd-nsc-vpp] (18) ⎆ sdk-vpp/pkg/networkservice/mechanisms/memif/memifClient.Request()
Jun 18 08:42:30.600 [INFO] [cmd:/bin/cmd-nsc-vpp] (18.1) request-diff={"mechanism_preferences":{"+0":{"cls":"LOCAL","type":"MEMIF"}}}
Jun 18 08:42:30.600 [INFO] [cmd:/bin/cmd-nsc-vpp] (19) ⎆ sdk-vpp/pkg/networkservice/mechanisms/memif/memifproxy/memifProxyClient.Request()
Jun 18 08:42:30.600 [INFO] 
[cmd:/bin/cmd-nsc-vpp] (20) ⎆ sdk/pkg/networkservice/common/mechanisms/sendfd/sendFDClient.Request() Jun 18 08:42:30.600 [INFO] [cmd:/bin/cmd-nsc-vpp] (21) ⎆ sdk/pkg/networkservice/common/mechanisms/recvfd/recvFDClient.Request() Jun 18 08:42:30.600 [INFO] [cmd:/bin/cmd-nsc-vpp] (22) ⎆ sdk/pkg/networkservice/common/heal/healClient.Request() Jun 18 08:42:30.601 [INFO] [cmd:/bin/cmd-nsc-vpp] (23) ⎆ sdk/pkg/networkservice/common/null/nullClient.Request() Jun 18 08:42:30.601 [INFO] [cmd:/bin/cmd-nsc-vpp] (24) ⎆ api/pkg/api/networkservice/networkServiceClient.Request() Jun 18 08:42:31.584 [INFO] [cmd:/bin/cmd-nsc-vpp] (24.1) response={"id":"endpoint-nsc-7f9c9cddc9-hjsk5-6c1ca3fd-01f3-4d6d-a3b1-c3e5b874122d","network_service":"icmp-responder","mechanism":{"cls":"LOCAL","type":"MEMIF","parameters":{"inodeURL":"inode://1048793/163769"}},"context":{"ip_context":{"src_ip_addrs":["172.16.1.101/32"],"dst_ip_addrs":["172.16.1.100/32"],"src_routes":[{"prefix":"172.16.1.100/32"}],"dst_routes":[{"prefix":"172.16.1.101/32"}]},"MTU":9000},"labels":{"nodeName":"kind-worker"},"path":{"path_segments":[{"name":"endpoint-nsc-7f9c9cddc9-hjsk5","id":"endpoint-nsc-7f9c9cddc9-hjsk5-6c1ca3fd-01f3-4d6d-a3b1-c3e5b874122d","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20tc3lzdGVtL3NhL2RlZmF1bHQiLCJleHAiOjE2MjQwMDc2MzYsInN1YiI6InNwaWZmZTovL2V4YW1wbGUub3JnL25zL25zLW1yN2doL3NhL2RlZmF1bHQifQ.a2WOQb1X4EXBOBNLBF-8uWKp7kRqFC3W3XoF4mKt53LQmTyRqwEJgHq7TnoNYCMkTt6vyETTea6_3JKYMxH-Sw","expires":{"seconds":1624007636}},{"name":"nsmgr-jt54m","id":"2ffda5b0-2497-4d18-99b1-825e74a08c48","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20tc3lzdGVtL3NhL2RlZmF1bHQiLCJleHAiOjE2MjQwMDgzNDMsInN1YiI6InNwaWZmZTovL2V4YW1wbGUub3JnL25zL25zbS1zeXN0ZW0vc2EvZGVmYXVsdCJ9.OIm7VMS88AEE2goqvVAJj9N0kpZEkJ9mc9SoxYPNhsXszC7_khM6QdUYs2AA46_sUbie9QQ7fokYOIuQwXNuiA","expires":{"seconds":1624008343}},{"name":"forwarder-vpp-z49mx","id":"8508bbd8-bdd9-4
a06-8f00-f5fc31803316","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20tc3lzdGVtL3NhL2RlZmF1bHQiLCJleHAiOjE2MjQwMDgzNDMsInN1YiI6InNwaWZmZTovL2V4YW1wbGUub3JnL25zL25zbS1zeXN0ZW0vc2EvZGVmYXVsdCJ9.n1wOTd6G0E8XzSLA5nvmdUAFB_M3IM2dsLUVUT8MbknGBTeiMjZS-IuWYrvm04Y5HeAX_w9njfY0pB-UZFdcwQ","expires":{"seconds":1624008343},"metrics":{"client_drops":"0","client_rx_bytes":"0","client_rx_packets":"0","client_tx_bytes":"0","client_tx_packets":"0"}},{"name":"nsmgr-jt54m","id":"6a37f647-0305-4be4-b5b8-3da627134fac","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9ucy1tcjdnaC9zYS9kZWZhdWx0IiwiZXhwIjoxNjI0MDA4MzQzLCJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20tc3lzdGVtL3NhL2RlZmF1bHQifQ.Yv1neG-lDEeLWvJKJQWVH1NOoJA3d-P9WwXZR5sLqD1nU6QZWNDda4jDzN9Jgh7xZ0DG1ioXf8htkPT7aQdZgQ","expires":{"seconds":1624007636}},{"name":"nse-memif-9b6679887-8xbj2","id":"bbef3225-f148-4013-8097-036fe58134e7","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20tc3lzdGVtL3NhL2RlZmF1bHQiLCJleHAiOjE2MjQwMDc2MzYsInN1YiI6InNwaWZmZTovL2V4YW1wbGUub3JnL25zL25zLW1yN2doL3NhL2RlZmF1bHQifQ.QAgt7lw4RyAi7qQN_p6nA0NBN8jt6lDz2GblGn58SfdqQVxg4auGSOfkc-IXaaf3kaYHFFEamZF-K4uhj7VfoA","expires":{"seconds":1624007636}}]},"network_service_endpoint_name":"c632436f-5d67-44f1-b984-0d13f91383c8-nse-memif-9b6679887-8xbj2","payload":"ETHERNET"} Jun 18 08:42:31.584 [INFO] [cmd:/bin/cmd-nsc-vpp] (21.1) response-diff={"mechanism":{"parameters":{"inodeURL":"file:///proc/1/fd/10"}}} Jun 18 08:42:31.585 [INFO] [cmd:/bin/cmd-nsc-vpp] (19.1) response-diff={"mechanism":{"parameters":{"inodeURL":"file:///tmp/memifproxy/endpoint-nsc-7f9c9cddc9-hjsk5-6c1ca3fd-01f3-4d6d-a3b1-c3e5b874122d/memif.socket"}}} time="2021-06-18T08:42:31Z" level=info msg="No subscription found for the notification message." msg_id=81 msg_size=19 time="2021-06-18T08:42:31Z" level=info msg="No subscription found for the notification message." 
msg_id=81 msg_size=19 Jun 18 08:42:29.557 [INFO] [cmd:vpp] vpp[11]: memif_plugin: clib_file_add fd 11 private_data 0 idx 4 Jun 18 08:42:29.557 [INFO] [cmd:vpp] vpp[11]: memif_plugin: clib_file_add fd 14 private_data 0 idx 5 Jun 18 08:42:31.595 [INFO] [cmd:/bin/cmd-nsc-vpp] (8.1) request-diff={"connection":{"context":{"ip_context":{"dst_ip_addrs":{"+0":"172.16.1.100/32"},"dst_routes":{"+0":{"prefix":"172.16.1.101/32"}},"src_ip_addrs":{"+0":"172.16.1.101/32"},"src_routes":{"+0":{"prefix":"172.16.1.100/32"}}}},"labels":{"+nodeName":"kind-worker"},"mechanism":{"cls":"LOCAL","parameters":{"+inodeURL":"file:///tmp/memifproxy/endpoint-nsc-7f9c9cddc9-hjsk5-6c1ca3fd-01f3-4d6d-a3b1-c3e5b874122d/memif.socket"},"type":"MEMIF"},"network_service_endpoint_name":"c632436f-5d67-44f1-b984-0d13f91383c8-nse-memif-9b6679887-8xbj2","path":{"path_segments":{"+1":{"name":"nsmgr-jt54m","id":"2ffda5b0-2497-4d18-99b1-825e74a08c48","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20tc3lzdGVtL3NhL2RlZmF1bHQiLCJleHAiOjE2MjQwMDgzNDMsInN1YiI6InNwaWZmZTovL2V4YW1wbGUub3JnL25zL25zbS1zeXN0ZW0vc2EvZGVmYXVsdCJ9.OIm7VMS88AEE2goqvVAJj9N0kpZEkJ9mc9SoxYPNhsXszC7_khM6QdUYs2AA46_sUbie9QQ7fokYOIuQwXNuiA","expires":{"seconds":1624008343}},"+2":{"name":"forwarder-vpp-z49mx","id":"8508bbd8-bdd9-4a06-8f00-f5fc31803316","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20tc3lzdGVtL3NhL2RlZmF1bHQiLCJleHAiOjE2MjQwMDgzNDMsInN1YiI6InNwaWZmZTovL2V4YW1wbGUub3JnL25zL25zbS1zeXN0ZW0vc2EvZGVmYXVsdCJ9.n1wOTd6G0E8XzSLA5nvmdUAFB_M3IM2dsLUVUT8MbknGBTeiMjZS-IuWYrvm04Y5HeAX_w9njfY0pB-UZFdcwQ","expires":{"seconds":1624008343},"metrics":{"client_drops":"0","client_rx_bytes":"0","client_rx_packets":"0","client_tx_bytes":"0","client_tx_packets":"0"}},"+3":{"name":"nsmgr-jt54m","id":"6a37f647-0305-4be4-b5b8-3da627134fac","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9ucy1tcjdnaC9zYS9kZWZhdWx0IiwiZXhwIjoxNjI0MDA4M
zQzLCJzdWIiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20tc3lzdGVtL3NhL2RlZmF1bHQifQ.Yv1neG-lDEeLWvJKJQWVH1NOoJA3d-P9WwXZR5sLqD1nU6QZWNDda4jDzN9Jgh7xZ0DG1ioXf8htkPT7aQdZgQ","expires":{"seconds":1624007636}},"+4":{"name":"nse-memif-9b6679887-8xbj2","id":"bbef3225-f148-4013-8097-036fe58134e7","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20tc3lzdGVtL3NhL2RlZmF1bHQiLCJleHAiOjE2MjQwMDc2MzYsInN1YiI6InNwaWZmZTovL2V4YW1wbGUub3JnL25zL25zLW1yN2doL3NhL2RlZmF1bHQifQ.QAgt7lw4RyAi7qQN_p6nA0NBN8jt6lDz2GblGn58SfdqQVxg4auGSOfkc-IXaaf3kaYHFFEamZF-K4uhj7VfoA","expires":{"seconds":1624007636}},"0":{"expires":{"seconds":1624007636},"token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJzcGlmZmU6Ly9leGFtcGxlLm9yZy9ucy9uc20tc3lzdGVtL3NhL2RlZmF1bHQiLCJleHAiOjE2MjQwMDc2MzYsInN1YiI6InNwaWZmZTovL2V4YW1wbGUub3JnL25zL25zLW1yN2doL3NhL2RlZmF1bHQifQ.a2WOQb1X4EXBOBNLBF-8uWKp7kRqFC3W3XoF4mKt53LQmTyRqwEJgHq7TnoNYCMkTt6vyETTea6_3JKYMxH-Sw"}}},"payload":"ETHERNET"},"mechanism_preferences":{"-0":{"cls":"LOCAL","type":"MEMIF"}}} ```
richardstone commented 3 years ago

Hi!

Thanks for the fast response. You're right, the name was much longer when it failed. The strange thing is that the kernel2kernel example works well with the longer name. Is there any chance you could make cmd-nsc-vpp accept a longer name, just like cmd-nsc does? Or do you know where this limitation comes from in the case of the vpp image?

d-uzlov commented 3 years ago

Oh, it's good to know that we correctly identified the cause of the issue!

The limitation comes from the fact that a memif connection uses Unix sockets whose file path contains the connection ID, and cmd-nsc-vpp uses the name from its config as part of the connection ID. Linux has a hard limit on the maximum length of a Unix socket path, so when the name of the container is very long, the socket path may exceed that limit.
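As a rough illustration of the limit described above, the sketch below rebuilds the socket path layout that appears in the logs (`/tmp/memifproxy/<name>-<uuid>/memif.socket`) and compares its length with the 108-byte `sun_path` field documented in the Linux unix(7) man page. The helper names are hypothetical and this is not the cmd-nsc-vpp code; treating 107 bytes as the usable length assumes one byte is reserved for the trailing NUL.

```python
# Size of sockaddr_un.sun_path on Linux, as documented in unix(7).
SUN_PATH_SIZE = 108
# Assumption: one byte is kept for the terminating NUL of a pathname socket.
MAX_SOCKET_PATH_LEN = SUN_PATH_SIZE - 1


def memif_socket_path(name: str, conn_uuid: str, base: str = "/tmp/memifproxy") -> str:
    """Hypothetical helper mirroring the path layout seen in the logs."""
    return f"{base}/{name}-{conn_uuid}/memif.socket"


def fits_in_sun_path(path: str) -> bool:
    """Return True if the encoded path fits into sun_path with its NUL byte."""
    return len(path.encode("utf-8")) <= MAX_SOCKET_PATH_LEN


if __name__ == "__main__":
    conn_uuid = "6c1ca3fd-01f3-4d6d-a3b1-c3e5b874122d"
    for name in ("endpoint-nsc-7f9c9cddc9-hjsk5",
                 "endpoint-nsc-7f9c9cddc9-hjsk5" + "x" * 15):
        path = memif_socket_path(name, conn_uuid)
        print(len(path), fits_in_sun_path(path))
```

With the 29-character name from the logs the path is 95 bytes long and fits; appending 15 more characters gives 110 bytes, which no longer fits under this assumption.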

I guess we should either change the way we generate the socket name, or find some workaround.

richardstone commented 3 years ago

Yes, thanks a lot for the investigation!

Is it possible that you'll change the method of socket name generation so there won't be differences in limits for the memif and kernel NSC names?

d-uzlov commented 3 years ago

Yeah, I believe that we will fix it to remove limitations.

richardstone commented 3 years ago

Very good, thanks in advance!

edwarnicke commented 3 years ago

@richardstone @d-uzlov Good catch.

@d-uzlov Is there any reason to set a Connection.ID at all in the cmd-nsc rather than letting the normal connection id generation kick in?

edwarnicke commented 3 years ago

@richardstone Looking more closely at this, while I wholeheartedly support your recommended workaround of not including the name in the connection id in cmd-nsc as a fix for NSM 1.0, we need to get a more comprehensive fix for post NSM 1.0 :)

I am seeing some reports that using relative paths can provide a partial workaround. If true, that should give us a reliable way to work around the issue. Thoughts?

richardstone commented 3 years ago

@edwarnicke The best outcome for me would be to use the same long name for the memif NSC that I use for the kernel one.

As a workaround, I started looking for a way to use a substring of the full pod name as the NSM_NAME variable, but I have had no luck finding a solution so far. If I leave the parameter out of my deployment, NSM_NAME gets its default value for every replica of the NSC, so many NSCs will share the same name. I guess that would cause issues once I start raising the replica count, so it would be good to keep the name unique.
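One way to keep a shortened name unique is to truncate the pod name and append a short hash of the full name. The sketch below only shows that derivation; the function name and the 32-character budget are assumptions made for illustration, and where such a step would run (for example in a wrapper before the client starts) is left open.

```python
import hashlib


def short_unique_name(pod_name: str, max_len: int = 32) -> str:
    """Shorten pod_name to at most max_len characters, keeping it distinguishable.

    Hypothetical helper: names that already fit are returned unchanged;
    longer names keep a readable prefix plus 8 hex digits of a SHA-256
    digest computed over the full original name.
    """
    if len(pod_name) <= max_len:
        return pod_name
    digest = hashlib.sha256(pod_name.encode("utf-8")).hexdigest()[:8]
    # Reserve room for the separator and the digest, drop a dangling dash.
    prefix = pod_name[: max_len - len(digest) - 1].rstrip("-")
    return f"{prefix}-{digest}"


if __name__ == "__main__":
    print(short_unique_name("endpoint-nsc-with-a-very-long-deployment-name-795886dc88-577t6"))
```

The result is deterministic for a given pod name, so a restarted container would derive the same value, while replicas with different pod names get different suffixes with very high probability.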

d-uzlov commented 3 years ago

> @d-uzlov Is there any reason to set a Connection.ID at all in the cmd-nsc rather than letting the normal connection id generation kick in?

I don't think there is any real benefit in manually setting the connection ID here. We already have path name for each path segment, which is usually used to identify where the connection came through.

But we probably don't want to limit the users in which connection ids they can use, so I think memif should support long ids.

> I am seeing some reports that using relative paths can provide a partial workaround. If true, that should give us a reliable way to work around the issue. Thoughts?

I was thinking about keeping a map of [connectionId -> UUID], and using these uuids for socket names instead of connection ids.
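A minimal sketch of that idea, assuming a simple in-memory registry: each connection ID is mapped to a generated UUID, and only the UUID appears in the socket path, so the path length no longer depends on the ID. The class and method names are hypothetical, and the base directory is taken from the paths in the logs.

```python
import threading
import uuid


class SocketNameRegistry:
    """Hypothetical map of connection ID -> UUID used for socket directories."""

    def __init__(self, base_dir: str = "/tmp/memifproxy") -> None:
        self._base_dir = base_dir
        self._ids: dict[str, str] = {}
        self._lock = threading.Lock()

    def socket_path(self, connection_id: str) -> str:
        """Return a stable, fixed-length socket path for this connection ID."""
        with self._lock:
            if connection_id not in self._ids:
                self._ids[connection_id] = str(uuid.uuid4())
            socket_id = self._ids[connection_id]
        return f"{self._base_dir}/{socket_id}/memif.socket"

    def release(self, connection_id: str) -> None:
        """Forget the mapping once the connection is closed."""
        with self._lock:
            self._ids.pop(connection_id, None)


if __name__ == "__main__":
    registry = SocketNameRegistry()
    path = registry.socket_path("a-very-long-connection-id-" * 10)
    print(len(path), path)
```

With the default base directory every path is 65 characters long, whatever the length of the connection ID.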

Relative paths should work too, though I'm not sure how convenient they would be: we would be changing the current working directory of the program, and some clients may not expect this. Also, I haven't researched this properly, but we could run into issues with multithreading.
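To make the relative-path idea concrete, here is a small sketch assuming Linux: it changes into the socket's directory, binds using only the short file name, and then restores the previous directory. The helper names are hypothetical. It also shows where the concern comes from, since the working directory is shared by the whole process and other threads would observe the change while the bind is in progress.

```python
import contextlib
import os
import socket


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the process-wide current working directory."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


def listen_with_relative_path(directory, filename="memif.socket"):
    """Bind a Unix socket inside `directory` using only its relative name."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    try:
        # Only `filename` is passed to bind(), so the length of `directory`
        # does not count against the socket path limit.
        with working_directory(directory):
            sock.bind(filename)
        sock.listen(1)
    except OSError:
        sock.close()
        raise
    return sock
```

A caller would pass the directory that would otherwise be part of the absolute socket path; any code running concurrently that relies on relative paths would need to be guarded against the temporary directory change.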

edwarnicke commented 3 years ago

@richardstone Net-net: are you OK for NSM 1.0 if we simply don't use the pod name in the connection id, and we can revisit for a better solution post NSM 1.0?

richardstone commented 3 years ago

@edwarnicke It's okay for me. Thanks!

edwarnicke commented 3 years ago

@richardstone Temp fix is in. Let's leave this issue open to get a longer-term fix.