Closed: JeffreyDevloo closed this issue 7 years ago
Removed the similarity as the tickets are not related.
Probably an error occurred after purging but before removal, and it is now causing issues. We should handle this error in more places.
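One hedged sketch of handling this in more places: treat the "unknown osd" failure as "already purged" so the removal path becomes idempotent. All names here (`purge_osd_idempotent`, `run_alba`) are illustrative assumptions, not the actual OpenvStorage code:

```python
# Illustrative sketch only: purge_osd_idempotent and run_alba are
# assumed names, not actual OpenvStorage code.

def purge_osd_idempotent(run_alba, long_id, config):
    """Purge an OSD; treat 'unknown osd' as already purged."""
    try:
        run_alba(['purge-osd', '--long-id', long_id, '--config', config])
    except RuntimeError as ex:
        if 'unknown osd' in str(ex):
            # The OSD is already gone; nothing left to purge.
            return False
        raise
    return True
```

Any other `RuntimeError` is re-raised unchanged, so genuine failures still surface.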
Fixed by #245, packaged in openvstorage-backend-1.7.4-rev.770.ae31d34
alba purge-osd --long-id dGvFDhFFa2vznXT7DinSsYOnNZawjSCW --config arakoon://config/ovs/arakoon/mybackend02-abm/config?ini=%2Fopt%2FOpenvStorage%2Fconfig%2Farakoon_cacc.ini
2016-11-18 16:15:42 20300 +0100 - ovs-node-1 - 18225/139622502233856 - extensions/albacli - 0 - ERROR - Error: (Failure "unknown osd pte1QL4eIRL2V6J0GNY2D7BTGpTmI0Nn")
Traceback (most recent call last):
File "/opt/OpenvStorage/ovs/extensions/plugins/albacli.py", line 102, in run
raise RuntimeError(output['error']['message'])
RuntimeError: (Failure "unknown osd pte1QL4eIRL2V6J0GNY2D7BTGpTmI0Nn")
2016-11-18 16:15:42 20400 +0100 - ovs-node-1 - 18225/139622502233856 - extensions/albacli - 1 - DEBUG - Command: /usr/bin/alba get-disk-safety --config=arakoon://config/ovs/arakoon/mybackend02-abm/config?ini=%2Fopt%2FOpenvStorage%2Fconfig%2Farakoon_cacc.ini --to-json --long-id=pte1QL4eIRL2V6J0GNY2D7BTGpTmI0Nn
2016-11-18 16:15:42 20400 +0100 - ovs-node-1 - 18225/139622502233856 - extensions/albacli - 2 - DEBUG - stderr:
2016-11-18 16:15:42 20400 +0100 - ovs-node-1 - 18225/139622502233856 - extensions/albacli - 3 - DEBUG - stdout: {"success":false,"error":{"message":"(Failure \"unknown osd pte1QL4eIRL2V6J0GNY2D7BTGpTmI0Nn\")","exception_type":"unknown","exception_code":0}}
Should have failed with (Failure "unknown osd pte1QL4eIRL2V6J0GNY2D7BTGpTmI0Nn")
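The traceback above suggests the wrapper parses the `--to-json` output and raises a `RuntimeError` carrying the embedded error message whenever `success` is false. A minimal reconstruction of that behaviour (simplified; not the actual `albacli.py` code):

```python
import json

# Simplified reconstruction of what the traceback implies albacli.run
# does with --to-json output; not the actual implementation.

def parse_alba_json(stdout):
    """Parse alba --to-json stdout; raise when success is false."""
    output = json.loads(stdout)
    if not output.get('success'):
        raise RuntimeError(output['error']['message'])
    return output.get('result')
```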
2016-11-18 16:15:42 20400 +0100 - ovs-node-1 - 18225/139622502233856 - remove/ci_backend_remover - 4 - INFO - Removing asd pte1QL4eIRL2V6J0GNY2D7BTGpTmI0Nn for disk dd734841-1943-484b-94c7-185a6966434f
{u'status': u'PENDING', u'successful': False, u'failed': False, u'result': None, u'ready': False, u'id': u'65ee6a81-759f-4a3e-910c-723f4406426f'}
{u'status': u'PENDING', u'successful': False, u'failed': False, u'result': None, u'ready': False, u'id': u'141477d2-a93f-4fc2-a40d-b494361b8280'}
{u'status': u'STARTED', u'successful': False, u'failed': False, u'result': u"{'hostname': 'celery@ovs-node-3', 'pid': 9416}", u'ready': False, u'id': u'141477d2-a93f-4fc2-a40d-b494361b8280'}
{u'status': u'STARTED', u'successful': False, u'failed': False, u'result': u"{'hostname': 'celery@ovs-node-3', 'pid': 9416}", u'ready': False, u'id': u'141477d2-a93f-4fc2-a40d-b494361b8280'}
{u'status': u'STARTED', u'successful': False, u'failed': False, u'result': u"{'hostname': 'celery@ovs-node-3', 'pid': 9416}", u'ready': False, u'id': u'141477d2-a93f-4fc2-a40d-b494361b8280'}
{u'status': u'STARTED', u'successful': False, u'failed': False, u'result': u"{'hostname': 'celery@ovs-node-3', 'pid': 9416}", u'ready': False, u'id': u'141477d2-a93f-4fc2-a40d-b494361b8280'}
{u'status': u'STARTED', u'successful': False, u'failed': False, u'result': u"{'hostname': 'celery@ovs-node-3', 'pid': 9416}", u'ready': False, u'id': u'141477d2-a93f-4fc2-a40d-b494361b8280'}
{u'status': u'STARTED', u'successful': False, u'failed': False, u'result': u"{'hostname': 'celery@ovs-node-3', 'pid': 9416}", u'ready': False, u'id': u'141477d2-a93f-4fc2-a40d-b494361b8280'}
{u'status': u'STARTED', u'successful': False, u'failed': False, u'result': u"{'hostname': 'celery@ovs-node-3', 'pid': 9416}", u'ready': False, u'id': u'141477d2-a93f-4fc2-a40d-b494361b8280'}
{u'status': u'STARTED', u'successful': False, u'failed': False, u'result': u"{'hostname': 'celery@ovs-node-3', 'pid': 9416}", u'ready': False, u'id': u'141477d2-a93f-4fc2-a40d-b494361b8280'}
{u'status': u'STARTED', u'successful': False, u'failed': False, u'result': u"{'hostname': 'celery@ovs-node-3', 'pid': 9416}", u'ready': False, u'id': u'141477d2-a93f-4fc2-a40d-b494361b8280'}
Test passed.
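The repeated status dicts above are typical of polling a Celery-style task until `ready` flips to true. A small illustrative polling helper (the `fetch_status` callable is an assumption, not part of the framework; it stands in for whatever produces those dicts):

```python
import time

# Illustrative polling loop; fetch_status is an assumed callable that
# returns status dicts shaped like the ones logged above.

def wait_for_task(fetch_status, interval=1.0, timeout=60.0):
    """Poll until the task reports ready; return its 'successful' flag."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = fetch_status()
        if status['ready']:
            return status['successful']
        time.sleep(interval)
    raise RuntimeError('task did not finish within %ss' % timeout)
```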
Problem description
When running a scenario that initializes and claims ASDs multiple times with an increasing number of ASDs, I ran into a strange bug: every calculate_safety call returns the calculate_safety error from an older, failed ASD:
A full traceback log of what happened:
Calculating the safety for any ASD is blocked by this exception.
The disk that is producing the error is now shown as 'faulted' in the GUI, even though the first few entries in the full traceback state that the disk was successfully purged from Alba and removed from the model.
Possible root of the problem
Possibly the erroneous value is being cached after the error, so every subsequent call returns it?
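If that hypothesis holds, a fix would be to avoid caching failed results. A toy illustration of how a cached error could poison every later call, and how caching only successes avoids it (all names here are hypothetical, not from the actual codebase):

```python
# Toy illustration of the suspected bug: caching an error result makes
# every later call replay it. Caching only successes avoids that.
_safety_cache = {}

def get_disk_safety(backend_id, compute):
    """Return safety for backend_id, caching only successful results."""
    result = _safety_cache.get(backend_id)
    if result is None:
        result = compute()
        if result.get('success'):  # never cache a failure
            _safety_cache[backend_id] = result
    return result
```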
Additional information
Setup
Hyperconverged setup
Package information