ligato / vpp-agent

⚡️ Control plane management agent for FD.io's VPP
https://docs.ligato.io/
Apache License 2.0

Issue creating host-interfaces in vpp-agent #861

Closed: sbyx closed this issue 6 years ago

sbyx commented 6 years ago

I'm trying to play around with vpp-agent in a simple Docker topology (see bottom). VPP and the agent come up fine, and each container has four interfaces, eth0 .. eth3. Using vppctl, I can add the interfaces to VPP with "create host-interface name eth0" etc., but trying the same with vpp-agent does not seem to work. Any help or pointers here would be greatly appreciated, even if just to debug this further.

vpp-agent-ctl /opt/vpp-agent/dev/etcd.conf -put /vnf-agent/node1/vpp/config/v1/interface/afpacket1 foo.json

I took the JSON from the example:

{
    "afpacket": {
        "host_if_name": "lo"
    },
    "enabled": true,
    "mtu": 1500,
    "ip_addresses": [
        "fdcd:f7fb:995c::/48"
    ],
    "name": "afpacket1",
    "phys_address": "b4:e6:1c:a1:0d:31",
    "type": 4
}
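
For readers following along, the way the etcd key and the JSON body fit together can be sketched in Python. This is only an illustration: the helper names `interface_key` and `afpacket_interface` are hypothetical, and the key layout and field names are copied from the command and JSON in this comment rather than from the vpp-agent API documentation.

```python
import json


def interface_key(label: str, if_name: str) -> str:
    """Build the etcd key for an interface config, mirroring the key used above."""
    return f"/vnf-agent/{label}/vpp/config/v1/interface/{if_name}"


def afpacket_interface(name, host_if_name, ip_addresses, phys_address, mtu=1500):
    """Return an af-packet interface config shaped like the example JSON above."""
    return {
        "afpacket": {"host_if_name": host_if_name},
        "enabled": True,
        "mtu": mtu,
        "ip_addresses": list(ip_addresses),
        "name": name,
        "phys_address": phys_address,
        "type": 4,  # value the example JSON uses for an af-packet interface
    }


key = interface_key("node1", "afpacket1")
body = json.dumps(
    afpacket_interface("afpacket1", "lo", ["fdcd:f7fb:995c::/48"], "b4:e6:1c:a1:0d:31"),
    indent=4,
)
print(key)
print(body)
```

The label segment of the key ("node1" here) is the part that has to line up with the agent instance that should pick up the config.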

In fact, it does work for the "lo" interface in host_if_name, but not for any of the eth0 .. eth3 veths of the container.

I don't see any errors in the log (the following shows afpacket1 with lo working and afpacket2 with eth0 not working, but also not giving any errors).

time="2018-09-05 13:25:46.40518" level=info msg="Configuring new interface afpacket1" loc="ifplugin/interface_config.go(215)" logger=vpp-plugin-if-conf

2018-09-05 13:25:46,605 DEBG 'agent' stdout output:
time="2018-09-05 13:25:46.60557" level=info msg="Interface configuration done" ifIdx=1 ifName=afpacket1 loc="ifplugin/interface_config.go(344)" logger=vpp-plugin-if-conf
time="2018-09-05 13:25:46.60571" level=info msg="Assigning new interface afpacket1 to bridge domain" loc="l2plugin/bd_config.go(322)" logger=vpp-plugin-l2-bd-conf

2018-09-05 13:25:46,606 DEBG 'agent' stdout output:
time="2018-09-05 13:25:46.60577" level=info msg="FIB configurator: resolving registered interface afpacket1" loc="l2plugin/fib_config.go(243)" logger=vpp-plugin-l2-fib-conf
time="2018-09-05 13:25:46.60582" level=info msg="FIB: resolution of created interface afpacket1 is done" loc="l2plugin/fib_config.go(249)" logger=vpp-plugin-l2-fib-conf
time="2018-09-05 13:25:46.60592" level=info msg="Linux IF configurator: resolve created vpp interface name:\"afpacket1\" type:AF_PACKET_INTERFACE enabled:true phys_address:\"b4:e6:1c:a1:0d:31\" mtu:1500 ip_addresses:\"fdcd:f7fb:995c::/48\" afpacket:<host_if_name:\"lo\" > " loc="ifplugin/interface_config.go(978)" logger=linux-plugin-if-conf

2018-09-05 13:27:12,642 DEBG 'agent' stdout output:
time="2018-09-05 13:27:12.64265" level=info msg="Modifying Interface afpacket2" loc="ifplugin/interface_config.go(509)" logger=vpp-plugin-if-conf

2018-09-05 13:27:12,643 DEBG 'agent' stdout output:
time="2018-09-05 13:27:12.64274" level=info msg="Configuring new interface afpacket2" loc="ifplugin/interface_config.go(215)" logger=vpp-plugin-if-conf

In addition, I'm not certain whether the "name" field actually does anything; it seems the host interface (at least for lo) is just named host-lo. The same goes for the key of the interface in etcd: I'm not sure what its significance is.

Finally, for reference, the docker-compose.yaml I'm using to create the topology:

version: '3'
services:
  etcd:
    image: quay.io/coreos/etcd
    command: ["/usr/local/bin/etcd", "-advertise-client-urls", "http://0.0.0.0:2379", "-listen-client-urls", "http://0.0.0.0:2379"]
    ports: ["2379:2379"]

  kafka:
    image: spotify/kafka
    ports: ["9092:9092"]
    environment:
      ADVERTISED_HOST: 172.17.0.1
      ADVERTISED_PORT: 9092

  node1:
    image: ligato/vpp-agent:v1.6
    depends_on: ["kafka", "etcd"]
    privileged: true
    environment:
      MICROSERVICE_LABEL: node1
    networks:
      - default
      - node1
      - node1_node2
      - node1_node3

  node2:
    image: ligato/vpp-agent:v1.6
    depends_on: ["kafka", "etcd"]
    privileged: true
    environment:
      MICROSERVICE_LABEL: node2
    networks:
      - default
      - node2
      - node1_node2
      - node2_node3

  node3:
    image: ligato/vpp-agent:v1.6
    depends_on: ["kafka", "etcd"]
    privileged: true
    environment:
      MICROSERVICE_LABEL: node3
    networks:
      - default
      - node3
      - node1_node3
      - node2_node3

networks:
  default:

  node1:
  node1_node2:
  node1_node3:

  node2:
  node2_node3:

  node3:

Thanks in advance

VladoLavor commented 6 years ago

@sbyx please make sure that eth0 and the other interfaces you are trying to use with af-packet are on the VPP host (VM/Docker). An af-packet interface cannot be connected directly to an interface in a different namespace (a container in this case), simply because VPP does not allow it. The VPP agent also didn't throw an error because it didn't know about the eth0 interface (meaning the interface was not registered within vpp-agent). The configuration for eth0 was cached until the eth0 interface appears. You can see that with the 'debug' log option turned on.

The vpp-agent automatically registers all the Linux interfaces on the host. That's why it worked with 'lo'.
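
The caching behaviour described here (config is held back silently until the host interface is registered) can be pictured with a small toy model. This is a sketch under stated assumptions, not the agent's actual implementation: the class `PendingAfpacketCache` and its method names are hypothetical and exist only to illustrate why no error shows up in the log.

```python
class PendingAfpacketCache:
    """Toy model: af-packet configs wait until their host interface is registered."""

    def __init__(self, known_host_ifs):
        self.known_host_ifs = set(known_host_ifs)
        self.pending = {}     # host_if_name -> interface name awaiting creation
        self.configured = []  # interface names that were "created"

    def put(self, name, host_if_name):
        """Configure at once if the host interface is known, else cache without error."""
        if host_if_name in self.known_host_ifs:
            self.configured.append(name)
            return True
        self.pending[host_if_name] = name
        return False

    def host_interface_appeared(self, host_if_name):
        """Register a host interface and apply any config that was waiting for it."""
        self.known_host_ifs.add(host_if_name)
        name = self.pending.pop(host_if_name, None)
        if name is not None:
            self.configured.append(name)


cache = PendingAfpacketCache(known_host_ifs=["lo"])
created_lo = cache.put("afpacket1", "lo")       # host interface known: created
created_eth0 = cache.put("afpacket2", "eth0")   # not registered: cached, no error
cache.host_interface_appeared("eth0")           # registration flushes the cache
```

In this model a config for an unregistered host interface never fails; it simply stays in `pending`, which matches the symptom of a quiet log.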

The "name" field is mainly for vpp-agent internal use, or for referring to the interface from a different configuration type (e.g. if you want to put an interface into a bridge domain). VPP interfaces are named by VPP itself (the VPP internal name), which the vpp-agent cannot affect.

sbyx commented 6 years ago

I'm a bit confused, since I am assuming VPP and ligato are running inside the container (namespace) of node1..3 respectively and would register and take interfaces from their own namespace instead of from the (Docker) host.

Especially since I get what I want if I go on vppctl inside any of the nodes (docker-compose exec node1 vppctl) and do "create host-interface name eth0" which makes VPP register an AF_PACKET interface using the eth0 assigned to the container itself (as I would expect).

I was actually building the same setup with a container including VPP and honeycomb and there it works fine too. Do you have any suggestions on whether or how I could achieve the same using ligato?

ondrej-fabry commented 6 years ago

@sbyx could you provide a dump from etcd for this agent instance? There is a special ../vpp/status/.. section with the actual status, where we could see whether the eth0 interface was detected.

sbyx commented 6 years ago

@ondrej-fabry

/vnf-agent/node1/vpp/config/v1/interface/host-eth0
{
    "afpacket": {
        "host_if_name": "eth0"
    },
    "enabled": true,
    "mtu": 1500,
    "ip_addresses": [
        "fd12::10/64"
    ],
    "name": "host-eth0",
    "phys_address": "00:00:00:00:12:10",
    "type": 4
}

/vnf-agent/node1/vpp/status/v1/interface/local0
{"name":"local0","internal_name":"local0","admin_status":2,"oper_status":2,"statistics":{}}

Again launching vppctl and doing "create host-interface name eth0" afterwards works fine.
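
A dump like the one above can be compared mechanically: which configured interfaces have no matching status entry? The following Python sketch does that on an in-memory copy of the dump. The function name `interfaces_without_status` is hypothetical, and it assumes that status entries carry the same "name" value as the corresponding config entry, which this thread does not confirm.

```python
import json


def interfaces_without_status(dump, label):
    """Return names that appear under config but not under status for one agent."""
    base = f"/vnf-agent/{label}/vpp/"
    config_prefix = base + "config/v1/interface/"
    status_prefix = base + "status/v1/interface/"
    configured = {v["name"] for k, v in dump.items() if k.startswith(config_prefix)}
    reported = {v["name"] for k, v in dump.items() if k.startswith(status_prefix)}
    return sorted(configured - reported)


# Key/value pairs taken from the dump above (config value shortened to the fields used).
dump = {
    "/vnf-agent/node1/vpp/config/v1/interface/host-eth0": json.loads(
        '{"afpacket": {"host_if_name": "eth0"}, "name": "host-eth0", "type": 4}'
    ),
    "/vnf-agent/node1/vpp/status/v1/interface/local0": json.loads(
        '{"name":"local0","internal_name":"local0","admin_status":2,'
        '"oper_status":2,"statistics":{}}'
    ),
}

missing = interfaces_without_status(dump, "node1")
print(missing)
```

For this dump, host-eth0 is configured but only local0 is reported, which is consistent with the interface never having been created in VPP.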

VladoLavor commented 6 years ago

@sbyx I tried the same config and it created the af-packet interface just fine.

Please make sure that the MICROSERVICE_LABEL environment variable is already set to node1 (the default is vpp1), since it has to match the value in the interface key.

If it still won't work, please set the environment variable INITIAL_LOGLVL=debug to enable debug logs - we will have a look and try to figure it out.
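
The label check mentioned above can be expressed as a short Python sketch. The helper names `label_from_key` and `key_matches_label` are hypothetical; the key layout (`/vnf-agent/<label>/vpp/...`) and the default label vpp1 are taken from this thread.

```python
import os


def label_from_key(key):
    """Extract the microservice label from a key shaped like /vnf-agent/<label>/..."""
    parts = key.split("/")
    # Expected layout after splitting: ["", "vnf-agent", "<label>", "vpp", ...]
    if len(parts) < 4 or parts[0] != "" or parts[1] != "vnf-agent":
        raise ValueError(f"unexpected key layout: {key!r}")
    return parts[2]


def key_matches_label(key, label):
    """True if the key is addressed to the agent running with this label."""
    return label_from_key(key) == label


key = "/vnf-agent/node1/vpp/config/v1/interface/host-eth0"
# Fall back to vpp1, the default label mentioned above, when the variable is unset.
current_label = os.environ.get("MICROSERVICE_LABEL", "vpp1")
print(key_matches_label(key, current_label))
```

Run inside the agent container, this prints whether a given key would be addressed to that agent instance.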

sbyx commented 6 years ago

Thanks for bearing with me. Yes, the label is set beforehand, and the status-plugin also writes the interface status correctly to /vnf-agent/node1. With INITIAL_LOGLVL=debug the output is as follows:

node1_1  | 2018-09-18 07:46:22,851 DEBG 'agent' stdout output:
node1_1  | time="2018-09-18 07:46:22.85175" level=debug msg="Start processing change for key: vpp/config/v1/interface/eth0" loc="vpp/data_change.go(41)" logger=vpp
node1_1  |
node1_1  | 2018-09-18 07:46:22,852 DEBG 'agent' stdout output:
node1_1  | time="2018-09-18 07:46:22.85193" level=debug msg="dataChangeIface false Put name:\"host-eth0\" type:AF_PACKET_INTERFACE enabled:true phys_address:\"00:00:00:00:12:10\" mtu:1500 ip_addresses:\"fd12::10/64\" afpacket:<host_if_name:\"eth0\" >  " loc="vpp/data_change.go(396)" logger=vpp
node1_1  | time="2018-09-18 07:46:22.85201" level=info msg="Configuring new interface host-eth0" loc="ifplugin/interface_config.go(210)" logger=vpp-if-conf
node1_1  | time="2018-09-18 07:46:22.85206" level=debug msg="Afpacket interface with name host-eth0 added to cache (hostIf: eth0, pending: true)" loc="ifplugin/afpacket_config.go(218)" logger=vpp-if-conf
node1_1  |
node1_1  | 2018-09-18 07:46:22,853 DEBG 'agent' stdout output:
node1_1  | time="2018-09-18 07:46:22.85303" level=debug msg="interface name:\"host-eth0\" type:AF_PACKET_INTERFACE enabled:true phys_address:\"00:00:00:00:12:10\" mtu:1500 ip_addresses:\"fd12::10/64\" afpacket:<host_if_name:\"eth0\" >  cannot be created yet and will be configured later" loc="ifplugin/interface_config.go(245)" logger=vpp-if-conf
node1_1  |
node1_1  | 2018-09-18 07:46:25,758 DEBG 'agent' stdout output:
node1_1  | time="2018-09-18 07:46:25.75803" level=debug msg="message sent: ProducerMessage - Topic: if_state, Key: vpp/status/v1/interface/local0, Value: {\"name\":\"local0\",\"internal_name\":\"local0\",\"admin_status\":2,\"oper_status\":2,\"statistics\":{}}, Meta: unexpected type <nil>, Offset: 74, Partition: 0\n" loc="client/syncproducer.go(180)" logger=kafka
node1_1  |
node1_1  | 2018-09-18 07:46:35,750 DEBG 'agent' stdout output:
node1_1  | time="2018-09-18 07:46:35.75025" level=debug msg="message sent: ProducerMessage - Topic: if_state, Key: vpp/status/v1/interface/local0, Value: {\"name\":\"local0\",\"internal_name\":\"local0\",\"admin_status\":2,\"oper_status\":2,\"statistics\":{}}, Meta: unexpected type <nil>, Offset: 75, Partition: 0\n" loc="client/syncproducer.go(180)" logger=kafka
node1_1  |
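
To pick the relevant entries out of output like this, the key=value fields of each agent log line can be parsed and filtered. The Python sketch below is one way to do it, assuming the line format shown above (quoted values with backslash-escaped quotes, bare values otherwise); `parse_fields` and `is_pending` are hypothetical helper names.

```python
import re

# key=value pairs; a value is either a double-quoted string (with \" escapes) or a bare token
FIELD_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_fields(line):
    """Parse one agent log line into a dict of its key=value fields."""
    fields = {}
    for key, raw in FIELD_RE.findall(line):
        if raw.startswith('"'):
            raw = raw[1:-1].replace('\\"', '"')  # strip quotes, unescape inner quotes
        fields[key] = raw
    return fields


def is_pending(fields):
    """True for the messages above that indicate a deferred af-packet config."""
    msg = fields.get("msg", "")
    return "pending: true" in msg or "cannot be created yet" in msg


# One line copied from the output above.
line = (
    r'node1_1  | time="2018-09-18 07:46:22.85206" level=debug '
    r'msg="Afpacket interface with name host-eth0 added to cache '
    r'(hostIf: eth0, pending: true)" loc="ifplugin/afpacket_config.go(218)" '
    r'logger=vpp-if-conf'
)
fields = parse_fields(line)
print(fields["logger"], is_pending(fields))
```

Lines without any key=value pair, such as the supervisor prefix lines, simply yield an empty dict.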

and for reference:

$ etcdctl get "" --prefix
/vnf-agent/node1/vpp/config/v1/interface/eth0
{
    "afpacket": {
        "host_if_name": "eth0"
    },
    "enabled": true,
    "mtu": 1500,
    "ip_addresses": [
        "fd12::10/64"
    ],
    "name": "host-eth0",
    "phys_address": "00:00:00:00:12:10",
    "type": 4
}

/vnf-agent/node1/vpp/status/v1/interface/local0
{"name":"local0","internal_name":"local0","admin_status":2,"oper_status":2,"statistics":{}}

On another note, I also tried creating a TAP interface using etcd:

{
    "tap": {
        "host_if_name": "vpp"
    },
    "enabled": true,
    "mtu": 1500,
    "ip_addresses": [
        "fd10::10/64"
    ],
    "name": "tapcli-0",
    "phys_address": "00:00:00:00:10:10",
    "type": 3
}

This actually works fine, so it might be something specific to af-packet.

VladoLavor commented 6 years ago

@sbyx we have found the issue, and it was actually on our side. I tried the setup locally, not in a Docker container, which is why it worked for me.

The eth0 interface in the Docker container is not an eth-type interface but a veth, and I didn't realize that veth-type interfaces are not registered automatically like other types. The reason was that it is not always possible to easily find the other end.
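
To check which interfaces in a container are veths, one option is to inspect the JSON output of iproute2. The Python sketch below assumes the input comes from `ip -details -json link show` (which needs a reasonably recent iproute2) and that veth devices carry `linkinfo.info_kind == "veth"` there; the sample data is hypothetical and trimmed to the fields used, and `veth_names` is an illustrative helper, not part of vpp-agent.

```python
import json


def veth_names(ip_link_json):
    """Return the names of veth interfaces found in `ip -details -json link show` output."""
    links = json.loads(ip_link_json)
    return [
        link["ifname"]
        for link in links
        if link.get("linkinfo", {}).get("info_kind") == "veth"
    ]


# Hypothetical, trimmed sample of what a container might report.
sample = """
[
  {"ifname": "lo", "link_type": "loopback"},
  {"ifname": "eth0", "link_type": "ether", "linkinfo": {"info_kind": "veth"}},
  {"ifname": "eth1", "link_type": "ether", "linkinfo": {"info_kind": "veth"}}
]
"""
veths = veth_names(sample)
print(veths)
```

Interfaces without a `linkinfo` entry, such as lo in the sample, are skipped.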

I talked briefly with @ondrej-fabry, and we believe that allowing it should not cause any harm, so we removed this restriction in our dev branch (pantheon-dev). Could I kindly ask you to try your setup with the latest pantheon-dev branch (we expect it to be stable currently) and let us know whether it works?

You can optionally use our docker image: ligato/vpp-agent:pantheon-dev

sbyx commented 6 years ago

Confirmed fixed (tested using docker image above). Thanks a ton guys!