scionproto / scion

SCION Internet Architecture
https://www.scion-architecture.net/
Apache License 2.0

Redundant paths with different segment IDs #93

Closed aznair closed 9 years ago

aznair commented 9 years ago

When requesting paths, sciond is receiving multiple PCBs that are topologically equivalent but have different segment IDs. The daemon sees these as different paths even though they use the same hop sequences, so the host thinks it is using e.g. 4 paths when in fact they are all the same path.

Example PCBs:

---- pcb ---- [PathSegment] Segment ID: b'X\xfe\xf3$\x8a3i\xa8\xe6*\t\xf2\xf4)\x87.\xc5\x94\xe3\xab\xa5\x0bC\x9c\xafoq\xd4\xee\x87\xa9\xc9' [Info OF info: 40, up: False, TS: 1430297228, ISD ID: 1, hops: 3] [TRC OF info: ff, TRCv: 0, IF ID: 290]

[Autonomous Domain] [PCB Marking ad_id: 13] ig_rev_token: b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' eg_rev_token:b'\x1dpJ\x8f\xb6K.)\xe8\x07a(;"\xf3\x08P\x91y\xe7\xc7$\x81\xc1+JQf\xfc\xf0A^' [Support Signature OF cert_chain_version: 0, sig_len: 0, block_size: 96] [Hop OF info: 0, exp_time: 63, ingress if: 0, egress if: 284, mac: 0] [Info OF TD ID: 1, bwalloc_f: 0, bwalloc_r: 0] [Signature: ] [Autonomous Domain] [PCB Marking ad_id: 16] ig_rev_token: b'\nj\x13\x13(\xac:\x08\xc6\xe7\xcf`\x0f\xb7\x92\xcd9\xa3\xd0\t\x07\xfc\x1d\xe9#m\xac\xfeN\xad\xd2x' eg_rev_token:b'\x93\xb7<=$\xc1=\xd7RC;6f\x1f\xc7P#\xcd\x89\xfe\xa1s\n\xde2E\xc6P,A\xfb8' [Support Signature OF cert_chain_version: 0, sig_len: 0, block_size: 184] [Hop OF info: 0, exp_time: 63, ingress if: 284, egress if: 290, mac: 0] [Info OF TD ID: 1, bwalloc_f: 0, bwalloc_r: 0] [Peer Marking ad_id: 15] ig_rev_token: b'\x1b\xf73w\xa1\xa6xC\x0e\xe3\xc3\xe4\xd7\xf8;v\x1f\xb2\xaa\xa7\xb0t\xe9\xb0\xf1\x8e\r\x10\xb1?l\xf5'\eg_rev_token:b'\x93\xb7<=$\xc1=\xd7RC;6f\x1f\xc7P#\xcd\x89\xfe\xa1s\n\xde2E\xc6P,A\xfb8' [Hop OF info: 0, exp_time: 63, ingress if: 286, egress if: 290, mac: 0] [Support Peer OF TD ID: 1, bwalloc_f: 0, bwalloc_r: 0, bw_class: 0] [Signature: ] [Autonomous Domain] [PCB Marking ad_id: 19] ig_rev_token: b'\t\xb5\xf0\xab\xa2w\n;K\xbeq=u{E\x9a\xbb\xb6%c\xbc\x9fV\x8b]\xe6\xcdB7\x15\xad\xeb' eg_rev_token:b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' [Support Signature OF cert_chain_version: 0, sig_len: 0, block_size: 96] [Hop OF info: 0, exp_time: 63, ingress if: 290, egress if: 0, mac: 0] [Info OF TD ID: 1, bwalloc_f: 0, bwalloc_r: 0] [Signature: ]

---- pcb ---- [PathSegment] Segment ID: b'2C\xbfJ\x102\x07\xd7\xaf"\xbb\xc0\xa3\xb9I\x8c[P\xd8\xca\xe3\n\x92]\xf1\xd8\x93\xe7\x13\x9e!$' [Info OF info: 40, up: False, TS: 1430297278, ISD ID: 1, hops: 3] [TRC OF info: ff, TRCv: 0, IF ID: 290]

[Autonomous Domain] [PCB Marking ad_id: 13] ig_rev_token: b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' eg_rev_token:b"\x0fO\xe57l9\x14\xc2\x18\x9f\xc8\xfc\xbb\x17\tC\xdd\x02\x92\xdb\xe3\t[R\xc7\xf4\xae\x82\xbag'v" [Support Signature OF cert_chain_version: 0, sig_len: 0, block_size: 96] [Hop OF info: 0, exp_time: 63, ingress if: 0, egress if: 284, mac: 0] [Info OF TD ID: 1, bwalloc_f: 0, bwalloc_r: 0] [Signature: ] [Autonomous Domain] [PCB Marking ad_id: 16] ig_rev_token: b'=:\xc18;\x80\xd1G;\xbd\xd3\xe0\x01\xcbbYi"fx!\xdc*\x05z\x9eUQ\x94J\x7f\xd2' eg_rev_token:b'\xa5\xf6F(\xb6\xc1:r;\x9d\x1b\xb0-\xd7\xea>^\xa0#\x96\xff\xb0\xa6\xeds\xbb\xfa+\xd7\x0b\x10\x89' [Support Signature OF cert_chain_version: 0, sig_len: 0, block_size: 184] [Hop OF info: 0, exp_time: 63, ingress if: 284, egress if: 290, mac: 0] [Info OF TD ID: 1, bwalloc_f: 0, bwalloc_r: 0] [Peer Marking ad_id: 15] ig_rev_token: b"\x16\xbb\x1c\x03\x8b\xfb\x8c\xd8zL\xba\x08\xa3\x133\xfb\x06'\x94\xd6\xb4F\xb1\x82\x89~\xa0\xb1y\x13\x08C"\eg_rev_token:b'\xa5\xf6F(\xb6\xc1:r;\x9d\x1b\xb0-\xd7\xea>^\xa0#\x96\xff\xb0\xa6\xeds\xbb\xfa+\xd7\x0b\x10\x89' [Hop OF info: 0, exp_time: 63, ingress if: 286, egress if: 290, mac: 0] [Support Peer OF TD ID: 1, bwalloc_f: 0, bwalloc_r: 0, bw_class: 0] [Signature: ] [Autonomous Domain] [PCB Marking ad_id: 19] ig_rev_token: b'\t\xb5\xf0\xab\xa2w\n;K\xbeq=u{E\x9a\xbb\xb6%c\xbc\x9fV\x8b]\xe6\xcdB7\x15\xad\xeb' eg_rev_token:b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' [Support Signature OF cert_chain_version: 0, sig_len: 0, block_size: 96] [Hop OF info: 0, exp_time: 63, ingress if: 290, egress if: 0, mac: 0] [Info OF TD ID: 1, bwalloc_f: 0, bwalloc_r: 0] [Signature: ]

aznair commented 9 years ago

I'm not sure if this is related, but I did notice something a little troubling. The code that generates interface IDs looks like this:

    if_id = str(255 + int(ad_id) + int(nbr_ad_id))

This means the interface between ADs 11-14 and the interface between ADs 12-13 have the same ID (and there are several other cases like this one). I'm not sure if this actually causes a problem somewhere in the code, but conceptually it seems wrong that two completely unrelated interfaces have the same identifier.
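
A quick way to see the collisions is to apply the formula to every pair of ADs and group the pairs by the resulting ID. This is only a sketch: `legacy_if_id` and `find_collisions` are hypothetical helper names, and the code treats every pair of ADs as a potential link, which the real topology may not do.

```python
from collections import defaultdict
from itertools import combinations


def legacy_if_id(ad_id, nbr_ad_id):
    # Same formula as the one quoted above.
    return str(255 + int(ad_id) + int(nbr_ad_id))


def find_collisions(ad_ids):
    """Return {if_id: [AD pairs]} for every ID assigned to more than one pair."""
    by_id = defaultdict(list)
    for a, b in combinations(sorted(ad_ids), 2):
        by_id[legacy_if_id(a, b)].append((a, b))
    return {if_id: pairs for if_id, pairs in by_id.items() if len(pairs) > 1}


if __name__ == "__main__":
    # ADs 11-14 and 12-13 both sum to 25, so both map to "280".
    print(find_collisions([11, 12, 13, 14]))
```

Any formula that depends only on the sum of the two AD IDs will collide like this, since different pairs can share a sum.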

aznair commented 9 years ago

@pszalach

So I think I've got the problem narrowed down to two parts.

  1. Race condition in _get_if_rev_token(). This function is called in both the "BS PCB propagation" thread and the "BS register segments" thread. If we're unlucky, both threads create a new revocation token, leading to two topologically equivalent paths.
  2. PCBs carried over from the previous iteration. When SCION is run multiple times, it keeps a PCB from the previous iteration. So when beacon servers start up, they register this PCB in the path server, then receive a newly created PCB from the core and register that as well, again creating two topologically equivalent paths. While clearing ZooKeeper every time is probably not a good approach, I think we need some way to prevent duplicate paths resulting from this carry-over behavior.
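
One possible way to filter such duplicates, independent of how they arise, is to key segments by their hop sequence instead of by segment ID and keep only the newest one per key. The sketch below assumes a simplified dict representation of a segment (`timestamp`, `hops` with `ad_id`, `ingress_if`, `egress_if`); these field names and the functions `topology_key` and `dedup_segments` are hypothetical, not the actual SCION classes, and peering links are ignored.

```python
def topology_key(segment):
    """Hop sequence as a hashable key; revocation tokens and segment ID are ignored."""
    return tuple(
        (hop["ad_id"], hop["ingress_if"], hop["egress_if"])
        for hop in segment["hops"]
    )


def dedup_segments(segments):
    """Keep only the newest segment for each distinct hop sequence."""
    newest = {}
    for seg in segments:
        key = topology_key(seg)
        kept = newest.get(key)
        if kept is None or seg["timestamp"] > kept["timestamp"]:
            newest[key] = seg
    return list(newest.values())


if __name__ == "__main__":
    # Hop sequence and timestamps taken from the two example PCBs above.
    hops = [
        {"ad_id": 13, "ingress_if": 0, "egress_if": 284},
        {"ad_id": 16, "ingress_if": 284, "egress_if": 290},
        {"ad_id": 19, "ingress_if": 290, "egress_if": 0},
    ]
    pcbs = [
        {"label": "first", "timestamp": 1430297228, "hops": hops},
        {"label": "second", "timestamp": 1430297278, "hops": hops},
    ]
    print([seg["label"] for seg in dedup_segments(pcbs)])
```

This only hides the symptom on the receiving side; the two causes listed above would still need to be fixed at the source.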

pszal commented 9 years ago

@aznair Oh, right, ZooKeeper is probably the main culprit. We should probably also keep (seeds of) revocation tokens in ZK. I'll think about that. Yes, 1. looks like another potential problem. Hopefully that's all, but maybe @shitz noticed such behavior previously?

shitz commented 9 years ago

I have not, but then I haven't looked too closely, to be honest. 1) does indeed sound like a problem, but it should be easily fixable with a lock. 2) sounds more serious, and I can't say much about it, since I haven't looked too closely at the ZK code so far.
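
For 1), a minimal sketch of the lock-based fix could look like the following. It assumes the token is created lazily on first lookup and cached per interface; `RevTokenStore` and its method are hypothetical names, and `os.urandom(32)` merely stands in for whatever the real token generation does.

```python
import os
import threading


class RevTokenStore:
    """Get-or-create cache for interface revocation tokens, guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tokens = {}

    def get_if_rev_token(self, if_id):
        # The check and the creation happen under the same lock, so two
        # threads can never both see "missing" and both create a token.
        with self._lock:
            token = self._tokens.get(if_id)
            if token is None:
                token = os.urandom(32)  # placeholder for the real generator
                self._tokens[if_id] = token
            return token


if __name__ == "__main__":
    store = RevTokenStore()
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(store.get_if_rev_token(284)))
        for _ in range(8)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(set(results)))  # all threads should observe the same token
```

Without the lock, the `get` and the assignment are separate steps, which is exactly the window in which the propagation and registration threads can each create their own token.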

pszal commented 9 years ago

I think a solution for 2. may be to generate (seeds for) revocation tokens in a deterministic way, for example from the AD's secret key and the path's ID (get_hops_hash()).

shitz commented 9 years ago

You have to be careful though, since get_hops_hash() uses interface revocation tokens and you might end up with a circular dependency. I could see something like this working:

Seed for interface rev-token chain: secret-key + interface-ID
Seed for segment rev-token chain: secret-key + get_hops_hash()

However, you will run into problems when a hash chain gets exhausted, so there needs to be some sort of versioning for the hash-chain as well.
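
To make the idea concrete, here is a rough sketch of deterministic seed derivation with a version counter. It is an illustration under assumptions, not the SCION implementation: HMAC-SHA256 is swapped in for the "secret-key + X" combination, and all function names, the label bytes, and the 4-byte version field are hypothetical.

```python
import hashlib
import hmac


def derive_seed(secret_key, label, context, version):
    """Deterministic seed from the AD secret key, a domain label, and a version."""
    msg = label + b"|" + context + b"|" + version.to_bytes(4, "big")
    return hmac.new(secret_key, msg, hashlib.sha256).digest()


def interface_seed(secret_key, if_id, version=0):
    # Depends only on the key and the interface ID, so it can be computed
    # before any hops hash exists (avoids the circular dependency).
    return derive_seed(secret_key, b"if", str(if_id).encode(), version)


def segment_seed(secret_key, hops_hash, version=0):
    # hops_hash stands in for the bytes returned by get_hops_hash().
    return derive_seed(secret_key, b"seg", hops_hash, version)


def hash_chain(seed, length):
    """Chain where each element is the SHA-256 digest of the previous one."""
    chain = [seed]
    for _ in range(length - 1):
        chain.append(hashlib.sha256(chain[-1]).digest())
    return chain


if __name__ == "__main__":
    key = b"example AD secret key"
    # Restarting the process yields the same seed for the same interface.
    print(interface_seed(key, 284) == interface_seed(key, 284))
    # Bumping the version starts a fresh chain once the old one is exhausted.
    print(interface_seed(key, 284, 0) != interface_seed(key, 284, 1))
```

Because the seed is a pure function of the key, the identifier, and the version, a restarted beacon server would regenerate the same chain instead of a new random one, and incrementing the version gives a new chain when the old one runs out.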

pszal commented 9 years ago

Yes, I thought about something like that. We will probably rotate the secret keys (~daily), which would also help with exhausted hash chains.

pszal commented 9 years ago

Please reopen if the issue still exists.