Closed danwinship closed 8 years ago
Just to understand this correctly, if we could have an OVS learning timeout which is greater than the ARP table flush timeout of the container, we would never see this problem, right?
Yes, though of course if the timeout is too long, it ends up being not that much different than just having the static table
Could we scrape the ARP timeout from somewhere in /proc or /sys and use that as the hard_timeout?
Each network namespace has its own arp cache... can they have their own timeouts too?
And what if someone does "arp -s"?
The current system is inherently fragile, and I feel like going back to the old way is better.
The timeout being too long is still different than a static table because some (most?) rules will never be generated.
another issue: if we restart openshift on a node, and it has to recreate br0, then we'll start out with no MAC addresses cached, but remote containers will still have them in their ARP caches
@danwinship So the patch LGTM but could you attach a dump of the flow rules with at least one pod running so I can double-check the final state?
table=0, priority=200, arp,in_port=1,arp_spa=10.1.0.0/16,arp_tpa=10.1.1.0/24 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:1
table=0, priority=200, ip,in_port=1,nw_src=10.1.0.0/16,nw_dst=10.1.1.0/24 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:1
table=0, priority=200, arp,in_port=2,arp_spa=10.1.1.1,arp_tpa=10.1.0.0/16 actions=goto_table:5
table=0, priority=200, ip,in_port=2,nw_src=10.1.1.1,nw_dst=10.1.0.0/16 actions=goto_table:5
table=0, priority=200, arp,in_port=3,arp_spa=10.1.1.0/24 actions=goto_table:5
table=0, priority=200, ip,in_port=3,nw_src=10.1.1.0/24 actions=goto_table:5
table=0, priority=150, in_port=1 actions=drop
table=0, priority=150, in_port=2 actions=drop
table=0, priority=150, in_port=3 actions=drop
table=0, priority=100, arp actions=goto_table:2
table=0, priority=100, ip actions=goto_table:2
table=0, priority=0 actions=drop
table=1, cookie=0xac110011, priority=100, tun_src=172.17.0.17 actions=goto_table:5
table=1, cookie=0xac110013, priority=100, tun_src=172.17.0.19 actions=goto_table:5
table=1, priority=0 actions=drop
table=2, priority=100, arp,in_port=4,arp_spa=10.1.1.2,arp_sha=02:42:0a:01:01:02 actions=load:0->NXM_NX_REG0[],goto_table:5
table=2, priority=100, arp,in_port=5,arp_spa=10.1.1.3,arp_sha=02:42:0a:01:01:03 actions=load:0->NXM_NX_REG0[],goto_table:5
table=2, priority=100, ip,in_port=4,nw_src=10.1.1.2 actions=load:0->NXM_NX_REG0[],goto_table:5
table=2, priority=100, ip,in_port=5,nw_src=10.1.1.3 actions=load:0->NXM_NX_REG0[],goto_table:5
table=2, priority=0 actions=drop
table=3, priority=100, ip,nw_dst=172.30.0.0/16 actions=goto_table:4
table=3, priority=0 actions=goto_table:5
table=4, priority=200, reg0=0 actions=output:2
table=4, priority=0 actions=drop
table=5, priority=300, arp,arp_tpa=10.1.1.1 actions=output:2
table=5, priority=300, ip,nw_dst=10.1.1.1 actions=output:2
table=5, priority=200, arp,arp_tpa=10.1.1.0/24 actions=goto_table:6
table=5, priority=200, ip,nw_dst=10.1.1.0/24 actions=goto_table:7
table=5, priority=100, arp,arp_tpa=10.1.0.0/16 actions=goto_table:8
table=5, priority=100, ip,nw_dst=10.1.0.0/16 actions=goto_table:8
table=5, priority=0, ip actions=output:2
table=5, priority=0, arp actions=drop
table=6, priority=100, arp,arp_tpa=10.1.1.2 actions=output:4
table=6, priority=100, arp,arp_tpa=10.1.1.3 actions=output:5
table=6, priority=0 actions=output:3
table=7, priority=100, ip,nw_dst=10.1.1.2 actions=output:4
table=7, priority=100, ip,nw_dst=10.1.1.3 actions=output:5
table=7, priority=0 actions=output:3
table=8, cookie=0xac110011, priority=100, arp,arp_tpa=10.1.0.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:172.17.0.17->tun_dst,output:1
table=8, cookie=0xac110011, priority=100, ip,nw_dst=10.1.0.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:172.17.0.17->tun_dst,output:1
table=8, cookie=0xac110013, priority=100, arp,arp_tpa=10.1.2.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:172.17.0.19->tun_dst,output:1
table=8, cookie=0xac110013, priority=100, ip,nw_dst=10.1.2.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:172.17.0.19->tun_dst,output:1
table=8, priority=0 actions=drop
table=2, priority=100, arp,in_port=4,arp_spa=10.1.1.2,arp_sha=02:42:0a:01:01:02 actions=load:0->NXM_NX_REG0[],goto_table:5 table=2, priority=100, arp,in_port=5,arp_spa=10.1.1.3,arp_sha=02:42:0a:01:01:03 actions=load:0->NXM_NX_REG0[],goto_table:5 table=2, priority=100, ip,in_port=4,nw_src=10.1.1.2 actions=load:0->NXM_NX_REG0[],goto_table:5 table=2, priority=100, ip,in_port=5,nw_src=10.1.1.3 actions=load:0->NXM_NX_REG0[],goto_table:5
This is single-tenant/non-isolated right? Just making sure :)
Shouldn't the 'ip' case be goto_table:3 instead of 5?
Yup. Pushed a fix for that.
This is single-tenant/non-isolated right? Just making sure :)
Yes. Sample multitenant output below. (While doing this, I noticed that AddServiceOFRules() was adding unnecessary extra rules for netID 0 services, and fixed that as well.)
table=0, priority=200, arp,in_port=1,arp_spa=10.1.0.0/16,arp_tpa=10.1.1.0/24 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:1
table=0, priority=200, ip,in_port=1,nw_src=10.1.0.0/16,nw_dst=10.1.1.0/24 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:1
table=0, priority=200, arp,in_port=2,arp_spa=10.1.1.1,arp_tpa=10.1.0.0/16 actions=goto_table:5
table=0, priority=200, ip,in_port=2,nw_src=10.1.1.1,nw_dst=10.1.0.0/16 actions=goto_table:5
table=0, priority=200, arp,in_port=3,arp_spa=10.1.1.0/24 actions=goto_table:5
table=0, priority=200, ip,in_port=3,nw_src=10.1.1.0/24 actions=goto_table:5
table=0, priority=150, in_port=1 actions=drop
table=0, priority=150, in_port=2 actions=drop
table=0, priority=150, in_port=3 actions=drop
table=0, priority=100, arp actions=goto_table:2
table=0, priority=100, ip actions=goto_table:2
table=0, priority=0 actions=drop
cookie=0xac110001, table=1, priority=100, tun_src=172.17.0.1 actions=goto_table:5
cookie=0xac110003, table=1, priority=100, tun_src=172.17.0.3 actions=goto_table:5
table=1, priority=0 actions=drop
table=2, priority=100, arp,in_port=4,arp_spa=10.1.1.2,arp_sha=02:42:0a:01:01:02 actions=load:0xc->NXM_NX_REG0[],goto_table:5
table=2, priority=100, arp,in_port=5,arp_spa=10.1.1.3,arp_sha=02:42:0a:01:01:03 actions=load:0xd->NXM_NX_REG0[],goto_table:5
table=2, priority=100, ip,in_port=4,nw_src=10.1.1.2 actions=load:0xc->NXM_NX_REG0[],goto_table:3
table=2, priority=100, ip,in_port=5,nw_src=10.1.1.3 actions=load:0xd->NXM_NX_REG0[],goto_table:3
table=2, priority=0 actions=drop
table=3, priority=100, ip,nw_dst=172.30.0.0/16 actions=goto_table:4
table=3, priority=0 actions=goto_table:5
table=4, priority=200, reg0=0 actions=output:2
table=4, priority=100, tcp,reg0=0xc,nw_dst=172.30.127.103,tp_dst=5454 actions=output:2
table=4, priority=100, tcp,reg0=0xc,nw_dst=172.30.127.103,tp_dst=5455 actions=output:2
table=4, priority=0 actions=drop
table=5, priority=300, arp,arp_tpa=10.1.1.1 actions=output:2
table=5, priority=300, ip,nw_dst=10.1.1.1 actions=output:2
table=5, priority=200, arp,arp_tpa=10.1.1.0/24 actions=goto_table:6
table=5, priority=200, ip,nw_dst=10.1.1.0/24 actions=goto_table:7
table=5, priority=100, arp,arp_tpa=10.1.0.0/16 actions=goto_table:8
table=5, priority=100, ip,nw_dst=10.1.0.0/16 actions=goto_table:8
table=5, priority=0, ip actions=output:2
table=5, priority=0, arp actions=drop
table=6, priority=100, arp,arp_tpa=10.1.1.2 actions=output:4
table=6, priority=100, arp,arp_tpa=10.1.1.3 actions=output:5
table=6, priority=0 actions=output:3
table=7, priority=100, ip,reg0=0,nw_dst=10.1.1.2 actions=output:4
table=7, priority=100, ip,reg0=0xc,nw_dst=10.1.1.2 actions=output:4
table=7, priority=100, ip,reg0=0,nw_dst=10.1.1.3 actions=output:5
table=7, priority=100, ip,reg0=0xd,nw_dst=10.1.1.3 actions=output:5
table=7, priority=0 actions=output:3
cookie=0xac110001, table=8, priority=100, arp,arp_tpa=10.1.0.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:172.17.0.1->tun_dst,output:1
cookie=0xac110001, table=8, priority=100, ip,nw_dst=10.1.0.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:172.17.0.1->tun_dst,output:1
cookie=0xac110003, table=8, priority=100, arp,arp_tpa=10.1.2.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:172.17.0.3->tun_dst,output:1
cookie=0xac110003, table=8, priority=100, ip,nw_dst=10.1.2.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:172.17.0.3->tun_dst,output:1
table=8, priority=0 actions=drop
Updates LGTM
It turns out we should never have switched to using "learn" rather than hardcoding all the mappings, because learning can only work correctly if you're willing to flood packets when you don't have a mapping. We do this for ARP packets, but not for IP packets, because that could violate isolation. But this means that if our mapping times out before a container ARP cache does, then we'll stop being able to deliver their packets. (This won't happen if the two machines are in regular communication, because OVS will keep bumping up the timeout in that case. It only happens if they talk, and then a long period of time goes by with no communication [but not too long], and then they try to talk again.)
Also adds some more sanity-checking to the rules to prevent spoofed packets.
Closes #231 @openshift/networking