scionproto / scion

SCION Internet Architecture
https://scion.org
Apache License 2.0

certificate server crashes in process_cert_chain_reply #115

Closed kormat closed 9 years ago

kormat commented 9 years ago

2015-05-19 14:07:57,899 INFO cs1-17-1: bound 127.1.17.21:30040
2015-05-19 14:07:57,901 INFO Connecting to Zookeeper
2015-05-19 14:07:57,925 [DEBUG] (ZK state handler) Kazoo old state: startup, new state: CONNECTED
2015-05-19 14:07:57,925 [DEBUG] (ZK state handler) Connection to Zookeeper succeeded
2015-05-19 14:07:57,926 INFO Started: 2015-05-19 14:07:57.926174
2015-05-19 14:07:58,226 [DEBUG] (CS shared certs) Joined party, members are: ['127.1.17.21']
2015-05-19 14:08:03,598 INFO Certificate chain request received.
2015-05-19 14:08:03,598 DEBUG Certificate chain not found.
2015-05-19 14:08:03,600 INFO New certificate chain request sent.
2015-05-19 14:08:03,611 INFO Certificate chain reply received
2015-05-19 14:08:03,612 CRITICAL Exception in main process:
2015-05-19 14:08:03,613 CRITICAL Traceback (most recent call last):
2015-05-19 14:08:03,613 CRITICAL File "cert_server.py", line 420, in <module>
2015-05-19 14:08:03,614 CRITICAL main()
2015-05-19 14:08:03,614 CRITICAL File "cert_server.py", line 416, in main
2015-05-19 14:08:03,614 CRITICAL cert_server.run()
2015-05-19 14:08:03,614 CRITICAL File "cert_server.py", line 399, in run
2015-05-19 14:08:03,614 CRITICAL SCIONElement.run(self)
2015-05-19 14:08:03,614 CRITICAL File "/home/scion/scion.git/infrastructure/scion_elem.py", line 204, in run
2015-05-19 14:08:03,614 CRITICAL self.handle_request(packet, addr, sock == self._local_socket)
2015-05-19 14:08:03,614 CRITICAL File "cert_server.py", line 384, in handle_request
2015-05-19 14:08:03,614 CRITICAL self.process_cert_chain_reply(CertChainReply(packet))
2015-05-19 14:08:03,614 CRITICAL File "cert_server.py", line 132, in process_cert_chain_reply
2015-05-19 14:08:03,614 CRITICAL "-V:" + str(tmp.certs[1].version),
2015-05-19 14:08:03,614 CRITICAL IndexError: list index out of range
2015-05-19 14:08:03,615 CRITICAL
2015-05-19 14:08:03,615 CRITICAL Exiting

@LorenzoBaesso

kormat commented 9 years ago

This happens within about 20s of startup. I'm running this in the docker environment.
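
The traceback fails at `tmp.certs[1].version`, which suggests the reply carried a chain with fewer than two certificates. The following is only a minimal sketch of one way to guard that access so that a short chain is reported instead of crashing the server. The `Cert` tuple and the `cert_version_tag` helper are hypothetical stand-ins for illustration; they are not taken from `cert_server.py`, and this is not necessarily how the issue was actually fixed.

```python
from collections import namedtuple

# Hypothetical stand-in for a certificate object; the real class in the
# SCION code base is not shown in this thread.
Cert = namedtuple("Cert", ["subject", "version"])


def cert_version_tag(certs, index=1):
    """Return "-V:<version>" for certs[index], or None if the chain is too short.

    Checking the length first avoids the IndexError seen in the traceback
    when a reply contains fewer certificates than expected.
    """
    if len(certs) <= index:
        return None
    return "-V:" + str(certs[index].version)


# A chain with a single certificate no longer raises; the caller can log
# the problem and drop the reply.
short_chain = [Cert("ISD:1-AD:17", 0)]
full_chain = [Cert("ISD:1-AD:17", 0), Cert("ISD:1-AD:16", 3)]
print(cert_version_tag(short_chain))
print(cert_version_tag(full_chain))
```

With a guard like this, the caller would treat a `None` result as a malformed reply and log it rather than letting the exception reach the main loop.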

pszal commented 9 years ago

With ./scion.sh it terminates as well, but with ImportError: cannot import name 'KazooTimeoutError'. @LorenzoBaesso, have you tested that? Please take a look.

kormat commented 9 years ago

@pszalach - see #107 for that - you need to update your version of kazoo.

pszal commented 9 years ago

Hm, I'm still getting that. What is the correct version? Mine is: Name: kazoo, Version: 2.0

pszal commented 9 years ago

OK, got it: --user was the culprit.
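
When a package reports one version but the import behaves like another, it can help to check which copy Python actually picks up, for example a copy in the per-user site directory versus a system-wide one. The sketch below is one way to do that on Python 3.8 or newer. The helper name `describe_package` is made up for this example, and it assumes the distribution name and the import name are the same, as they are for kazoo.

```python
import importlib.util
from importlib import metadata


def describe_package(name):
    """Report the installed version and import location of a top-level package.

    Both values are None when the package cannot be found.
    """
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        version = None
    # find_spec locates the module without importing it.
    spec = importlib.util.find_spec(name)
    location = spec.origin if spec is not None else None
    return {"name": name, "version": version, "location": location}


if __name__ == "__main__":
    # The location shows whether the user site or the system site wins.
    print(describe_package("kazoo"))
```

If the reported location is not the one you expected, reinstalling or removing the shadowing copy is usually the next step.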

kormat commented 9 years ago

I've gotten @LorenzoBaesso set up with the docker env, so he can reproduce the original problem now.

pszal commented 9 years ago

Just noticed the same for ./scion.sh. Thanks for helping with that!

LorenzoBaesso commented 9 years ago

Just pushed a fix. Unfortunately, the problem didn't always show up.

kormat commented 9 years ago

(Looks like the fix was 9ef17b1af6e492cb71554143ec4f0b35f96630a0)

pszal commented 9 years ago

@LorenzoBaesso have you tested that? Now I'm getting the following; can someone confirm?
logs/bs2-26-1.log:2015-05-20 11:18:52,463 [WARNING] (BS shared pcbs) Invalid beacon.
logs/bs2-26-1.log:2015-05-20 11:18:52,987 [WARNING] (BS shared pcbs) The certificate is not valid.
logs/bs2-26-1.log:2015-05-20 11:18:52,987 [WARNING] (BS shared pcbs) The certificate chain verification failed.
...
Also, consider switching these from WARNING to ERROR, as such an error breaks route propagation.

LorenzoBaesso commented 9 years ago

I can't replicate the error. Can you give more information please?

pszal commented 9 years ago

@LorenzoBaesso what is the state of this issue? I'm still getting this when running with ./scion.sh, even after ZK is restarted.

LorenzoBaesso commented 9 years ago

I can't replicate it in the docker environment...

pszal commented 9 years ago

OK, I probably need to clean up the Zookeeper directory before every execution... or switch to docker ;-). What happens when a CS has its own local version of a TRC/cert and sees another version in ZK? Does a newcomer CS sync with ZK?

pszal commented 9 years ago

Docker helps (thanks @kormat!); however, I'm still not sure about the last two questions I asked.

pszal commented 9 years ago

You can replicate this bug on docker:
./scion.sh run; sleep 20; ./scion.sh stop; ./scion.sh topology; ./scion.sh run

@LorenzoBaesso IMO the problem is connected with the questions I asked above, i.e., the new certificates are inconsistent with those stored in ZK.
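
To make the suspected inconsistency concrete, here is a rough sketch of how a server could classify its local certificate against a copy found in a shared store. Everything in it is an assumption made for illustration: the function name `compare_with_store`, the integer versions, and the four outcomes are hypothetical and do not describe how the SCION certificate server actually syncs with ZK.

```python
def compare_with_store(local_version, local_bytes, stored_version, stored_bytes):
    """Classify a local certificate against the copy in a shared store.

    Returns one of:
      "publish"  - the store has no copy, or the local copy is newer
      "fetch"    - the stored copy is newer than the local one
      "in_sync"  - same version and identical content
      "conflict" - same version but different content, e.g. certificates
                   regenerated while an old copy is still stored
    """
    if stored_bytes is None:
        return "publish"
    if local_version > stored_version:
        return "publish"
    if local_version < stored_version:
        return "fetch"
    if local_bytes == stored_bytes:
        return "in_sync"
    return "conflict"


# Regenerating the topology would plausibly yield the same version number
# with new key material, which lands in the "conflict" case.
print(compare_with_store(0, b"new-cert", 0, b"old-cert"))
```

Under these assumptions, the "conflict" outcome is the one that version numbers alone cannot resolve, which would fit the failed chain verification seen after rerunning ./scion.sh topology.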

kormat commented 9 years ago

Yep, I can confirm that I can now reproduce the issue by following @pszalach's instructions.

pszal commented 9 years ago

#141 fixes that