openshift / openshift-sdn

OVS rule updates for ARP bugs, etc #254

Closed: danwinship closed this 8 years ago

danwinship commented 8 years ago

It turns out we should never have switched to using "learn" rather than hardcoding all the mappings, because learning can only work correctly if you're willing to flood packets when you don't have a mapping. We do flood ARP packets, but not IP packets, because flooding IP could violate isolation. But this means that if our learned mapping times out before a container's ARP cache entry does, then we'll stop being able to deliver that container's packets. (This won't happen if the two machines are in regular communication, because OVS will keep bumping up the timeout in that case. It only happens if they talk, then a long period of time goes by with no communication [long enough for the learned flow to expire, but not so long that the container's ARP entry expires too], and then they try to talk again.)
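
A rough sketch of the two approaches, with made-up table numbers, port numbers, and timeout rather than the exact rules from this repo: with "learn", traffic from a pod installs a reverse flow keyed on the pod's IP, and once that learned flow expires, unicast IP to the pod is dropped rather than flooded, even though the peer's ARP cache may still hold the pod's MAC and so never triggers a (floodable) re-ARP.

# learn-based approach being reverted (illustrative only): pod traffic teaches a
# reverse flow for the pod's IP, which expires after hard_timeout seconds
ovs-ofctl add-flow br0 "table=2,priority=100,ip,in_port=4,nw_src=10.1.1.2,actions=learn(table=7,hard_timeout=300,priority=100,NXM_OF_ETH_TYPE[],NXM_OF_IP_DST[]=NXM_OF_IP_SRC[],output:NXM_OF_IN_PORT[]),goto_table:3"
# hardcoded mapping this PR goes back to: the node installs it directly when the
# pod is created, so it never times out
ovs-ofctl add-flow br0 "table=7,priority=100,ip,nw_dst=10.1.1.2,actions=output:4"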

Also adds some more sanity-checking to the rules to prevent spoofed packets.
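
Concretely, the anti-spoofing checks show up in the flow dumps later in this thread as per-port source checks, so each port may only send from its assigned addresses. For example, these two lines are copied from the single-tenant dump below (in_port=2 looks like the node's tun0 gateway port, in_port=4 a pod port):

table=0, priority=200, arp,in_port=2,arp_spa=10.1.1.1,arp_tpa=10.1.0.0/16 actions=goto_table:5
table=2, priority=100, arp,in_port=4,arp_spa=10.1.1.2,arp_sha=02:42:0a:01:01:02 actions=load:0->NXM_NX_REG0[],goto_table:5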

Closes #231 @openshift/networking

rajatchopra commented 8 years ago

Just to understand this correctly, if we could have an OVS learning timeout which is greater than the ARP table flush timeout of the container, we would never see this problem, right?

danwinship commented 8 years ago

Yes, though of course if the timeout is too long, it ends up not being much different from just having the static table.

dcbw commented 8 years ago

Could we scrape the ARP timeout from somewhere in /proc or /sys and use that as the hard_timeout?
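
(For context, and not something from this PR: the timers in question are presumably the standard kernel neighbor-cache sysctls, e.g. on the host:)

cat /proc/sys/net/ipv4/neigh/default/base_reachable_time_ms   # base reachability timer, in milliseconds
cat /proc/sys/net/ipv4/neigh/default/gc_stale_time            # stale-entry garbage collection, in seconds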

danwinship commented 8 years ago

Each network namespace has its own arp cache... can they have their own timeouts too?

And what if someone does "arp -s" (a static entry that never times out)?

The current system is inherently fragile, and I feel like going back to the old way is better.

rajatchopra commented 8 years ago

A timeout that is too long is still different from a static table, because some (most?) rules will never be generated at all.

danwinship commented 8 years ago

Another issue: if we restart openshift on a node and it has to recreate br0, then we'll start out with no MAC addresses cached, but remote containers will still have them in their ARP caches.
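
A hypothetical way to observe that window under the learn-based rules (the table number and pod IP here are illustrative assumptions, not taken from this repo):

# on the freshly restarted node: the learned mappings are gone along with br0
ovs-ofctl dump-flows br0 table=7
# in a still-running remote container: the local pod's MAC is still cached, so it
# keeps sending unicast IP without re-ARPing, and nothing ever re-teaches the
# restarted node where to deliver those packets
ip neigh show 10.1.1.2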

dcbw commented 8 years ago

@danwinship So the patch LGTM, but could you attach a dump of the flow rules with at least one pod running so I can double-check the final state?
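
(For anyone wanting to reproduce the dumps that follow: they look like ovs-ofctl dump-flows output, i.e. something along the lines of the command below; the exact invocation and the OpenFlow version flag are assumptions on my part.)

ovs-ofctl -O OpenFlow13 dump-flows br0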

danwinship commented 8 years ago

 table=0, priority=200, arp,in_port=1,arp_spa=10.1.0.0/16,arp_tpa=10.1.1.0/24 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:1
 table=0, priority=200, ip,in_port=1,nw_src=10.1.0.0/16,nw_dst=10.1.1.0/24 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:1
 table=0, priority=200, arp,in_port=2,arp_spa=10.1.1.1,arp_tpa=10.1.0.0/16 actions=goto_table:5
 table=0, priority=200, ip,in_port=2,nw_src=10.1.1.1,nw_dst=10.1.0.0/16 actions=goto_table:5
 table=0, priority=200, arp,in_port=3,arp_spa=10.1.1.0/24 actions=goto_table:5
 table=0, priority=200, ip,in_port=3,nw_src=10.1.1.0/24 actions=goto_table:5
 table=0, priority=150, in_port=1 actions=drop
 table=0, priority=150, in_port=2 actions=drop
 table=0, priority=150, in_port=3 actions=drop
 table=0, priority=100, arp actions=goto_table:2
 table=0, priority=100, ip actions=goto_table:2
 table=0, priority=0 actions=drop
 table=1, cookie=0xac110011, priority=100, tun_src=172.17.0.17 actions=goto_table:5
 table=1, cookie=0xac110013, priority=100, tun_src=172.17.0.19 actions=goto_table:5
 table=1, priority=0 actions=drop
 table=2, priority=100, arp,in_port=4,arp_spa=10.1.1.2,arp_sha=02:42:0a:01:01:02 actions=load:0->NXM_NX_REG0[],goto_table:5
 table=2, priority=100, arp,in_port=5,arp_spa=10.1.1.3,arp_sha=02:42:0a:01:01:03 actions=load:0->NXM_NX_REG0[],goto_table:5
 table=2, priority=100, ip,in_port=4,nw_src=10.1.1.2 actions=load:0->NXM_NX_REG0[],goto_table:5
 table=2, priority=100, ip,in_port=5,nw_src=10.1.1.3 actions=load:0->NXM_NX_REG0[],goto_table:5
 table=2, priority=0 actions=drop
 table=3, priority=100, ip,nw_dst=172.30.0.0/16 actions=goto_table:4
 table=3, priority=0 actions=goto_table:5
 table=4, priority=200, reg0=0 actions=output:2
 table=4, priority=0 actions=drop
 table=5, priority=300, arp,arp_tpa=10.1.1.1 actions=output:2
 table=5, priority=300, ip,nw_dst=10.1.1.1 actions=output:2
 table=5, priority=200, arp,arp_tpa=10.1.1.0/24 actions=goto_table:6
 table=5, priority=200, ip,nw_dst=10.1.1.0/24 actions=goto_table:7
 table=5, priority=100, arp,arp_tpa=10.1.0.0/16 actions=goto_table:8
 table=5, priority=100, ip,nw_dst=10.1.0.0/16 actions=goto_table:8
 table=5, priority=0, ip actions=output:2
 table=5, priority=0, arp actions=drop
 table=6, priority=100, arp,arp_tpa=10.1.1.2 actions=output:4
 table=6, priority=100, arp,arp_tpa=10.1.1.3 actions=output:5
 table=6, priority=0 actions=output:3
 table=7, priority=100, ip,nw_dst=10.1.1.2 actions=output:4
 table=7, priority=100, ip,nw_dst=10.1.1.3 actions=output:5
 table=7, priority=0 actions=output:3
 table=8, cookie=0xac110011, priority=100, arp,arp_tpa=10.1.0.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:172.17.0.17->tun_dst,output:1
 table=8, cookie=0xac110011, priority=100, ip,nw_dst=10.1.0.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:172.17.0.17->tun_dst,output:1
 table=8, cookie=0xac110013, priority=100, arp,arp_tpa=10.1.2.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:172.17.0.19->tun_dst,output:1
 table=8, cookie=0xac110013, priority=100, ip,nw_dst=10.1.2.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:172.17.0.19->tun_dst,output:1
 table=8, priority=0 actions=drop
dcbw commented 8 years ago

> table=2, priority=100, arp,in_port=4,arp_spa=10.1.1.2,arp_sha=02:42:0a:01:01:02 actions=load:0->NXM_NX_REG0[],goto_table:5
> table=2, priority=100, arp,in_port=5,arp_spa=10.1.1.3,arp_sha=02:42:0a:01:01:03 actions=load:0->NXM_NX_REG0[],goto_table:5
> table=2, priority=100, ip,in_port=4,nw_src=10.1.1.2 actions=load:0->NXM_NX_REG0[],goto_table:5
> table=2, priority=100, ip,in_port=5,nw_src=10.1.1.3 actions=load:0->NXM_NX_REG0[],goto_table:5

Shouldn't the 'ip' case be goto_table:3 instead of 5?

This is single-tenant/non-isolated, right? Just making sure :)

danwinship commented 8 years ago

> Shouldn't the 'ip' case be goto_table:3 instead of 5?

Yup. Pushed a fix for that.
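
For reference, the corrected shape of the table 2 IP rules, as they now appear in the multitenant dump below:

table=2, priority=100, ip,in_port=4,nw_src=10.1.1.2 actions=load:0xc->NXM_NX_REG0[],goto_table:3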

> This is single-tenant/non-isolated, right? Just making sure :)

Yes. Sample multitenant output below. (While doing this, I noticed that AddServiceOFRules() was adding unnecessary extra rules for netID 0 services, and fixed that as well.)

table=0, priority=200, arp,in_port=1,arp_spa=10.1.0.0/16,arp_tpa=10.1.1.0/24 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:1
table=0, priority=200, ip,in_port=1,nw_src=10.1.0.0/16,nw_dst=10.1.1.0/24 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:1
table=0, priority=200, arp,in_port=2,arp_spa=10.1.1.1,arp_tpa=10.1.0.0/16 actions=goto_table:5
table=0, priority=200, ip,in_port=2,nw_src=10.1.1.1,nw_dst=10.1.0.0/16 actions=goto_table:5
table=0, priority=200, arp,in_port=3,arp_spa=10.1.1.0/24 actions=goto_table:5
table=0, priority=200, ip,in_port=3,nw_src=10.1.1.0/24 actions=goto_table:5
table=0, priority=150, in_port=1 actions=drop
table=0, priority=150, in_port=2 actions=drop
table=0, priority=150, in_port=3 actions=drop
table=0, priority=100, arp actions=goto_table:2
table=0, priority=100, ip actions=goto_table:2
table=0, priority=0 actions=drop
cookie=0xac110001, table=1, priority=100, tun_src=172.17.0.1 actions=goto_table:5
cookie=0xac110003, table=1, priority=100, tun_src=172.17.0.3 actions=goto_table:5
table=1, priority=0 actions=drop
table=2, priority=100, arp,in_port=4,arp_spa=10.1.1.2,arp_sha=02:42:0a:01:01:02 actions=load:0xc->NXM_NX_REG0[],goto_table:5
table=2, priority=100, arp,in_port=5,arp_spa=10.1.1.3,arp_sha=02:42:0a:01:01:03 actions=load:0xd->NXM_NX_REG0[],goto_table:5
table=2, priority=100, ip,in_port=4,nw_src=10.1.1.2 actions=load:0xc->NXM_NX_REG0[],goto_table:3
table=2, priority=100, ip,in_port=5,nw_src=10.1.1.3 actions=load:0xd->NXM_NX_REG0[],goto_table:3
table=2, priority=0 actions=drop
table=3, priority=100, ip,nw_dst=172.30.0.0/16 actions=goto_table:4
table=3, priority=0 actions=goto_table:5
table=4, priority=200, reg0=0 actions=output:2
table=4, priority=100, tcp,reg0=0xc,nw_dst=172.30.127.103,tp_dst=5454 actions=output:2
table=4, priority=100, tcp,reg0=0xc,nw_dst=172.30.127.103,tp_dst=5455 actions=output:2
table=4, priority=0 actions=drop
table=5, priority=300, arp,arp_tpa=10.1.1.1 actions=output:2
table=5, priority=300, ip,nw_dst=10.1.1.1 actions=output:2
table=5, priority=200, arp,arp_tpa=10.1.1.0/24 actions=goto_table:6
table=5, priority=200, ip,nw_dst=10.1.1.0/24 actions=goto_table:7
table=5, priority=100, arp,arp_tpa=10.1.0.0/16 actions=goto_table:8
table=5, priority=100, ip,nw_dst=10.1.0.0/16 actions=goto_table:8
table=5, priority=0, ip actions=output:2
table=5, priority=0, arp actions=drop
table=6, priority=100, arp,arp_tpa=10.1.1.2 actions=output:4
table=6, priority=100, arp,arp_tpa=10.1.1.3 actions=output:5
table=6, priority=0 actions=output:3
table=7, priority=100, ip,reg0=0,nw_dst=10.1.1.2 actions=output:4
table=7, priority=100, ip,reg0=0xc,nw_dst=10.1.1.2 actions=output:4
table=7, priority=100, ip,reg0=0,nw_dst=10.1.1.3 actions=output:5
table=7, priority=100, ip,reg0=0xd,nw_dst=10.1.1.3 actions=output:5
table=7, priority=0 actions=output:3
cookie=0xac110001, table=8, priority=100, arp,arp_tpa=10.1.0.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:172.17.0.1->tun_dst,output:1
cookie=0xac110001, table=8, priority=100, ip,nw_dst=10.1.0.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:172.17.0.1->tun_dst,output:1
cookie=0xac110003, table=8, priority=100, arp,arp_tpa=10.1.2.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:172.17.0.3->tun_dst,output:1
cookie=0xac110003, table=8, priority=100, ip,nw_dst=10.1.2.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:172.17.0.3->tun_dst,output:1
table=8, priority=0 actions=drop
dcbw commented 8 years ago

Updates LGTM