openvstorage / framework

The Framework is a set of components and tools which brings the user an interface (GUI / API) to set up, extend and manage an Open vStorage platform.

configure_disk failed IndexError: list index out of range #762

Closed JeffreyDevloo closed 7 years ago

JeffreyDevloo commented 8 years ago

We saw this on a hyperconverged node during role configuration:

2016-08-02 13:27:08 15500 +0200 - cmp03 - 12223/140461375452992 - celery/celery.worker.job - 665 - ERROR - Task ovs.storagerouter.configure_disk[84aff99f-3533-4ce2-a663-5e80dce8fe86] raised unexpected: IndexError('list index out of range',)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/opt/OpenvStorage/ovs/lib/helpers/decorators.py", line 291, in new_function
    first_element = value['values'][0]['timestamp']
IndexError: list index out of range
2016-08-02 13:27:08 53800 +0200 - cmp03 - 12223/140461375452992 - celery/celery.worker.job - 666 - ERROR - Task ovs.storagerouter.configure_disk[4f0106f9-2af5-4e61-8513-412c83eda095] raised unexpected: IndexError('list index out of range',)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/opt/OpenvStorage/ovs/lib/helpers/decorators.py", line 291, in new_function
    first_element = value['values'][0]['timestamp']
IndexError: list index out of range
2016-08-02 13:27:08 54600 +0200 - cmp03 - 12223/140461375452992 - celery/celery.worker.autoscale - 667 - INFO - Scaling down 1 processes.
2016-08-02 13:27:08 72900 +0200 - cmp03 - 12223/140461375452992 - celery/celery.worker.job - 668 - ERROR - Task ovs.storagerouter.configure_disk[64a132bd-eef4-4cc9-86e8-b721c91959d8] raised unexpected: IndexError('list index out of range',)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/opt/OpenvStorage/ovs/lib/helpers/decorators.py", line 291, in new_function
    first_element = value['values'][0]['timestamp']
IndexError: list index out of range
khenderick commented 8 years ago

I wonder why that list is empty in the first place; we need to make sure we understand the use case before we can handle this situation.
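To make the failure mode concrete: the traceback shows `value['values'][0]['timestamp']` failing, which can only happen when the `'values'` list is empty. A minimal sketch of that data shape (the dict layout here is inferred from the traceback, not taken from the ovs code):

```python
# Hypothetical illustration of the data hitting decorators.py line 291.
# If another worker has already consumed or cleared the queued task
# entries, 'values' is an empty list and indexing [0] raises IndexError.
value = {'values': []}

try:
    first_element = value['values'][0]['timestamp']
except IndexError:
    # This is exactly the path the traceback above shows.
    first_element = None
```

Indexing `[0]` on an empty list always raises `IndexError`, so any code that assumes at least one queued entry exists needs an explicit guard.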

kvanhijf commented 7 years ago

The issue is NOT FIXED by the above commit. I could not reproduce the issue, nor could I theoretically find a way to get into the situation where the IndexError is raised. I've added additional logging to be able to find out where the problem is situated if we ever manage to reproduce this.

wimpers commented 7 years ago

@kvanhijf if we can't reproduce the issue, what should QA check to verify your code changes?

kvanhijf commented 7 years ago

@wimpers: good question :) Nothing, I guess, because it's quite impossible to reproduce, if possible at all.

saelbrec commented 7 years ago

But code changes were made, so what for? Should these be reverted?

kvanhijf commented 7 years ago

Code changes have been made to prevent the index out of range; instead, a TimeoutError will be thrown IF we ever get into that path again. This means the initial job will never get launched and should be retried by the customer. To reproduce, the assumption is that it has something to do with starting a configure_disk job, restarting the workers on some node and triggering another configure_disk job. But I didn't manage to trigger the path.
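The defensive change described above could look roughly like the following sketch: poll until the list is populated and raise a TimeoutError when it never is, instead of letting an IndexError escape. This is a hypothetical helper, not the actual ovs code; the function name, parameters and data shape are assumptions for illustration:

```python
import time


def first_timestamp(value, timeout=10, poll_interval=0.5):
    """Return the timestamp of the first queued entry in value['values'].

    Instead of raising IndexError when the list is (still) empty, wait
    up to `timeout` seconds for an entry to appear and raise
    TimeoutError if none does. Hypothetical sketch, not the ovs code.
    """
    deadline = time.time() + timeout
    while True:
        entries = value['values']
        if entries:
            return entries[0]['timestamp']
        if time.time() >= deadline:
            raise TimeoutError('no task entries appeared within %ss' % timeout)
        time.sleep(poll_interval)
```

With this shape, a job that hits the empty-list race fails fast with a clear TimeoutError that the customer can respond to by retrying, rather than dying with an unexplained IndexError deep inside a decorator.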

khenderick commented 7 years ago

@wimpers, @saelbrec, it's a race condition with tasks on Celery, and indeed almost impossible to reproduce. But the code showed a path that could cause the race condition, which is now handled more correctly.

JeffreyDevloo commented 7 years ago

Information

We have not encountered the error during two and a half months of installations and assigning of disk roles on our nightly builds and on our manual setups. Therefore I am inclined to close this, given that it cannot be reproduced.

Packages

Latest packages at the time of writing: