termie / nova-migration-demo

Nova is a cloud computing fabric controller (the main part of an IaaS system). It is written in Python.
http://openstack.org/projects/compute/
Apache License 2.0
2 stars 0 forks source link

Timeout from API with 50 Simultaneous Builds #124

Open termie opened 13 years ago

termie commented 13 years ago

Running cactus, I attempted a build of 50 VMs at once via the API and ran into a timeout where 12 of the 50 VMs timed out and are stuck in build. The remaining went active.

for i in seq 1 50; do nova boot test --flavor 1 --image 10 & > /dev/null ; echo $i ; done

API Limits were raised well past this in limits.py for stress testing.

2011-04-26 17:36:51,852 ERROR nova.api.openstack [-] Caught error: timed out (nova.api.openstack): TRACE: Traceback (most recent call last): (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/nova/api/openstack/init.py", line 59, in call (nova.api.openstack): TRACE: return req.get_response(self.application) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/webob/request.py", line 919, in get_response (nova.api.openstack): TRACE: application, catch_exc_info=False) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/webob/request.py", line 887, in call_application (nova.api.openstack): TRACE: app_iter = application(self.environ, start_response) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/webob/dec.py", line 159, in call (nova.api.openstack): TRACE: return resp(environ, start_response) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/webob/dec.py", line 159, in call (nova.api.openstack): TRACE: return resp(environ, start_response) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/webob/dec.py", line 159, in call (nova.api.openstack): TRACE: return resp(environ, start_response) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/routes/middleware.py", line 131, in call (nova.api.openstack): TRACE: response = self.app(environ, start_response) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/webob/dec.py", line 159, in call (nova.api.openstack): TRACE: return resp(environ, start_response) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/webob/dec.py", line 147, in call (nova.api.openstack): TRACE: resp = self.call_func(req, _args, _self.kwargs) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/webob/dec.py", line 208, in call_func (nova.api.openstack): TRACE: return self.func(req, _args, _kwargs) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/nova/wsgi.py", line 356, in call (nova.api.openstack): TRACE: result = method(arg_dict) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/nova/api/openstack/servers.py", line 171, in create (nova.api.openstack): TRACE: injected_files=injected_files) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/nova/compute/api.py", line 256, in create (nova.api.openstack): TRACE: "injected_files": injected_files}}) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/nova/rpc.py", line 394, in cast (nova.api.openstack): TRACE: publisher = TopicPublisher(connection=conn, topic=topic) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/nova/rpc.py", line 241, in init (nova.api.openstack): TRACE: super(TopicPublisher, self).init(connection=connection) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/carrot/messaging.py", line 653, in init (nova.api.openstack): TRACE: self.declare() (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/carrot/messaging.py", line 664, in declare (nova.api.openstack): TRACE: auto_delete=self.auto_delete) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/carrot/backends/pyamqplib.py", line 240, in exchange_declare (nova.api.openstack): TRACE: return self.channel.exchange_declare(exchange=exchange, (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/carrot/backends/pyamqplib.py", line 179, in channel (nova.api.openstack): TRACE: self._channel_ref = weakref.ref(self.connection.get_channel()) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/carrot/connection.py", line 150, in get_channel (nova.api.openstack): TRACE: return self.connection.channel() (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/carrot/connection.py", line 120, in connection (nova.api.openstack): TRACE: self._connection = self._establish_connection() (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/carrot/connection.py", line 133, in _establish_connection (nova.api.openstack): TRACE: return self.create_backend().establish_connection() (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/carrot/backends/pyamqplib.py", line 195, in establish_connection (nova.api.openstack): TRACE: connect_timeout=conninfo.connect_timeout) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/amqplib/client_0_8/connection.py", line 125, in init (nova.api.openstack): TRACE: self.transport = create_transport(host, connect_timeout, ssl) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/amqplib/client_0_8/transport.py", line 220, in create_transport (nova.api.openstack): TRACE: return TCPTransport(host, connect_timeout) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/amqplib/client_0_8/transport.py", line 58, in init (nova.api.openstack): TRACE: self.sock.connect((host, port)) (nova.api.openstack): TRACE: File "/usr/lib/python2.6/dist-packages/eventlet/greenio.py", line 178, in connect (nova.api.openstack): TRACE: timeout_exc=socket.timeout("timed out")) (nova.api.openstack): TRACE: File "/usr/lib/python2.6/dist-packages/eventlet/hubs/init**.py", line 121, in trampoline (nova.api.openstack): TRACE: return hub.switch() (nova.api.openstack): TRACE: File "/usr/lib/python2.6/dist-packages/eventlet/hubs/hub.py", line 177, in switch (nova.api.openstack): TRACE: return self.greenlet.switch() (nova.api.openstack): TRACE: timeout: timed out (nova.api.openstack): TRACE:


Imported from Launchpad using lp2gh.

termie commented 13 years ago

(by soren) I've managed to reproduce this locally by stubbing out quotas, ratelimiting, and the virt driver.

termie commented 13 years ago

(by soren) My current working hypothesis is that something in Nova is blocking. This causes the connect_timeout to trigger (because once eventlet context switches back into the main greenthread (or whatever it is that it does)), the timeout has expired without being cancelled. This would explain the log message in rabbit saying that the connection was abruptly terminated (meaning the connection was actually established, in spite of Nova saying it timed out).

If that's indeed the case, yes, we shouldn't be blocking, but eventlet also should check if the connection has been established before it kills the connection attempt.

termie commented 13 years ago

(by antonym) Full logs from api and scheduler from 50 instance build:

http://paste.openstack.org/show/1271/ - nova-api http://paste.openstack.org/show/1272/ - nova-scheduler

termie commented 13 years ago

(by soren) I'm still blaming eventlet for this. Looking at the logs, it looks like the sheer volume of requests simply makes it take more than 5 seconds before the event loop in eventlet gets back to the attempt to establish a connection. We can optimise the time-per-request, but that'll won't solve anything, it'll just hide the problem until we try 100 or 200 requests at a time. Same if we increase the timeout.

termie commented 13 years ago

(by vishvananda) Wasn't there some idea that this was because nova-api doesn't monkeypatch all of the libraries?

Vish

On May 4, 2011, at 1:58 PM, Soren Hansen wrote:

I'm still blaming eventlet for this. Looking at the logs, it looks like the sheer volume of requests simply makes it take more than 5 seconds before the event loop in eventlet gets back to the attempt to establish a connection. We can optimise the time-per-request, but that'll won't solve anything, it'll just hide the problem until we try 100 or 200 requests at a time. Same if we increase the timeout.

You received this bug notification because you are a member of Nova Bug Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/771512

Title: Timeout from API with 50 Simultaneous Builds

Status in OpenStack Compute (Nova): Confirmed

Bug description: Running cactus, I attempted a build of 50 VMs at once via the API and ran into a timeout where 12 of the 50 VMs timed out and are stuck in build. The remaining went active.

for i in seq 1 50; do nova boot test --flavor 1 --image 10 & > /dev/null ; echo $i ; done

API Limits were raised well past this in limits.py for stress testing.

2011-04-26 17:36:51,852 ERROR nova.api.openstack [-] Caught error: timed out (nova.api.openstack): TRACE: Traceback (most recent call last): (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/nova/api/openstack/init.py", line 59, in call (nova.api.openstack): TRACE: return req.get_response(self.application) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/webob/request.py", line 919, in get_response (nova.api.openstack): TRACE: application, catch_exc_info=False) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/webob/request.py", line 887, in call_application (nova.api.openstack): TRACE: app_iter = application(self.environ, start_response) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/webob/dec.py", line 159, in call (nova.api.openstack): TRACE: return resp(environ, start_response) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/webob/dec.py", line 159, in call (nova.api.openstack): TRACE: return resp(environ, start_response) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/webob/dec.py", line 159, in call (nova.api.openstack): TRACE: return resp(environ, start_response) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/routes/middleware.py", line 131, in call (nova.api.openstack): TRACE: response = self.app(environ, start_response) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/webob/dec.py", line 159, in call (nova.api.openstack): TRACE: return resp(environ, start_response) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/webob/dec.py", line 147, in call (nova.api.openstack): TRACE: resp = self.call_func(req, _args, _self.kwargs) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/webob/dec.py", line 208, in call_func (nova.api.openstack): TRACE: return self.func(req, _args, _kwargs) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/nova/wsgi.py", line 356, in call (nova.api.openstack): TRACE: result = method(arg_dict) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/nova/api/openstack/servers.py", line 171, in create (nova.api.openstack): TRACE: injected_files=injected_files) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/nova/compute/api.py", line 256, in create (nova.api.openstack): TRACE: "injected_files": injected_files}}) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/nova/rpc.py", line 394, in cast (nova.api.openstack): TRACE: publisher = TopicPublisher(connection=conn, topic=topic) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/nova/rpc.py", line 241, in init (nova.api.openstack): TRACE: super(TopicPublisher, self).init(connection=connection) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/carrot/messaging.py", line 653, in init (nova.api.openstack): TRACE: self.declare() (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/carrot/messaging.py", line 664, in declare (nova.api.openstack): TRACE: auto_delete=self.auto_delete) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/carrot/backends/pyamqplib.py", line 240, in exchange_declare (nova.api.openstack): TRACE: return self.channel.exchange_declare(exchange=exchange, (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/carrot/backends/pyamqplib.py", line 179, in channel (nova.api.openstack): TRACE: self._channel_ref = weakref.ref(self.connection.get_channel()) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/carrot/connection.py", line 150, in get_channel (nova.api.openstack): TRACE: return self.connection.channel() (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/carrot/connection.py", line 120, in connection (nova.api.openstack): TRACE: self._connection = self._establish_connection() (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/carrot/connection.py", line 133, in _establish_connection (nova.api.openstack): TRACE: return self.create_backend().establish_connection() (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/carrot/backends/pyamqplib.py", line 195, in establish_connection (nova.api.openstack): TRACE: connect_timeout=conninfo.connect_timeout) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/amqplib/client_0_8/connection.py", line 125, in init (nova.api.openstack): TRACE: self.transport = create_transport(host, connect_timeout, ssl) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/amqplib/client_0_8/transport.py", line 220, in create_transport (nova.api.openstack): TRACE: return TCPTransport(host, connect_timeout) (nova.api.openstack): TRACE: File "/usr/lib/pymodules/python2.6/amqplib/client_0_8/transport.py", line 58, in init (nova.api.openstack): TRACE: self.sock.connect((host, port)) (nova.api.openstack): TRACE: File "/usr/lib/python2.6/dist-packages/eventlet/greenio.py", line 178, in connect (nova.api.openstack): TRACE: timeout_exc=socket.timeout("timed out")) (nova.api.openstack): TRACE: File "/usr/lib/python2.6/dist-packages/eventlet/hubs/init**.py", line 121, in trampoline (nova.api.openstack): TRACE: return hub.switch() (nova.api.openstack): TRACE: File "/usr/lib/python2.6/dist-packages/eventlet/hubs/hub.py", line 177, in switch (nova.api.openstack): TRACE: return self.greenlet.switch() (nova.api.openstack): TRACE: timeout: timed out (nova.api.openstack): TRACE:

termie commented 13 years ago

(by soren) 2011/5/4 Vish Ishaya vishvananda@gmail.com:

Wasn't there some idea that this was because nova-api doesn't monkeypatch all of the libraries?

There was. It wasn't.