openvstorage / framework

The Framework is a set of components and tools which brings the user an interface (GUI / API) to setup, extend and manage an Open vStorage platform.
Other
27 stars 23 forks source link

claim asd failed but gui alert is claimed #1667

Closed jeroenmaelbrancke closed 7 years ago

jeroenmaelbrancke commented 7 years ago

Unable to claim an asd but gui shows claimed. After refreshing the page the asds is still available to claim.

claimed

Error in the logging:

Jun 12 07:20:53 NY1SRV0019 celery[13127]: 2017-06-12 07:20:53 13200 -0400 - NY1SRV0019 - 14977/139952705304320 - celery/celery.redirected - 182759 - WARNING - 2017-06-12 07:20:53 13200 -0400 - NY1SRV0019 - 14977/139952705304320 - extensions/alba-cli - 182758 - DEBUG - stdout: {"success":false,"error":{"message":"Albamgr exception(Albamgr_protocol.Protocol.Error.Unknown,(Failure \"Albamgr: too many retries for operation MarkOsdClaimed\"))","exception_type":"albamgr_exn","exception_code":1}}
Jun 12 07:20:53 NY1SRV0019 celery[13127]: 2017-06-12 07:20:53 25100 -0400 - NY1SRV0019 - 14977/139952705304320 - celery/celery.redirected - 182761 - WARNING - 2017-06-12 07:20:53 25000 -0400 - NY1SRV0019 - 14977/139952705304320 - extensions/alba-cli - 182760 - ERROR - Error: Albamgr exception(Albamgr_protocol.Protocol.Error.Unknown,(Failure "Albamgr: too many retries for operation MarkOsdClaimed"))
Jun 12 07:20:53 NY1SRV0019 celery[13127]: Traceback (most recent call last):
Jun 12 07:20:53 NY1SRV0019 celery[13127]:   File "/opt/OpenvStorage/ovs/extensions/plugins/albacli.py", line 139, in run
Jun 12 07:20:53 NY1SRV0019 celery[13127]:     raise AlbaError(output['error']['message'], output['error']['exception_code'])
Jun 12 07:20:53 NY1SRV0019 celery[13127]: AlbaError: Albamgr exception(Albamgr_protocol.Protocol.Error.Unknown,(Failure "Albamgr: too many retries for operation MarkOsdClaimed"))
Jun 12 07:20:53 NY1SRV0019 celery[13127]: 2017-06-12 07:20:53 25200 -0400 - NY1SRV0019 - 14977/139952705304320 - celery/celery.redirected - 182763 - WARNING - 2017-06-12 07:20:53 25200 -0400 - NY1SRV0019 - 14977/139952705304320 - extensions/alba-cli - 182762 - DEBUG - Command: /usr/bin/alba claim-osd --to-json --config=arakoon://config/ovs/arakoon/ny1-ssdbackend01-abm/config?ini=%2Fopt%2FOpenvStorage%2Fconfig%2Farakoon_cacc.ini --long-id=H5H5Nh0qINuZuHR11gCJDhHFFx7nxAbN

unable to delete the disk: unable_to_delete

asd is not registered in the ABM:

root@NY1SRV0001:~# alba list-osds --config arakoon://config/ovs/arakoon/ny1-ssdbackend01-abm/config?ini=%2Fmnt%2Fssd1%2Farakoon%2Fexternal_arakoon_cacc.ini --to-json | grep H5H
2017-06-12 09:47:37 237048 -0400 - NY1SRV0001 - 2441/0000 - alba/cli - 0 - info - Albamgr_client.make_client :ny1-ssdbackend01-abm
2017-06-12 09:47:37 238037 -0400 - NY1SRV0001 - 2441/0000 - alba/cli - 1 - info - connect_with : 172.17.16.5 26406 None Net_fd.TCP (fd:3)
2017-06-12 09:47:37 265195 -0400 - NY1SRV0001 - 2441/0000 - alba/cli - 2 - info - closing (fd:3)
root@NY1SRV0001:~# 
khenderick commented 7 years ago

@jeroenmaelbrancke can you re-validate once the new Alba package is available?

openvstorage-ci commented 7 years ago

This issue was moved to openvstorage/alba_ee#52

jeroenmaelbrancke commented 7 years ago

@khenderick what with the green pop up message? This should be red with the alert from alba.

khenderick commented 7 years ago

@jeroenmaelbrancke Do you have some more logging context so I can see where exactly the error was shown?

jeroenmaelbrancke commented 7 years ago

The framework translate the ERROR into a WARNING state... Maybe this is the issue why the pop up is not red?

Jun 12 07:20:53 NY1SRV0019 celery[13127]: 2017-06-12 07:20:53 13000 -0400 - NY1SRV0019 - 14977/139952705304320 - celery/celery.redirected - 182753 - WARNING - 2017-06-12 07:20:53 12900 -0400 - NY1SRV0019 - 14977/139952705304320 - extensio
ns/alba-cli - 182752 - ERROR - Error: Albamgr exception(Albamgr_protocol.Protocol.Error.Unknown,(Failure "Albamgr: too many retries for operation MarkOsdClaimed"))

complete logging:

Jun 12 07:20:01 NY1SRV0019 celery[13127]: 2017-06-12 07:20:01 68800 -0400 - NY1SRV0019 - 13127/139952705304320 - celery/celery.worker.job - 182785 - DEBUG - Task accepted: ovs.storagerouter.ping[1e187cf6-fed0-4ce6-ae04-edaebfe9ba22] pid:1
9253
Jun 12 07:20:01 NY1SRV0019 celery[13127]: 2017-06-12 07:20:01 71500 -0400 - NY1SRV0019 - 13127/139952705304320 - celery/celery.worker.job - 182786 - INFO - Task ovs.storagerouter.ping[1e187cf6-fed0-4ce6-ae04-edaebfe9ba22] succeeded in 0.0
275813881308s: None
Jun 12 07:20:16 NY1SRV0019 celery[13127]: 2017-06-12 07:20:16 64100 -0400 - NY1SRV0019 - 14955/139952705304320 - celery/celery.redirected - 182744 - WARNING - 2017-06-12 07:20:16 64100 -0400 - NY1SRV0019 - 14955/139952705304320 - extensio
ns/asdmanagerclient - 182743 - INFO - Request "add_disk" took 46.34 seconds (internal duration 46.33 seconds)
Jun 12 07:20:17 NY1SRV0019 celery[13127]: 2017-06-12 07:20:17 84300 -0400 - NY1SRV0019 - 14955/139952705304320 - celery/celery.redirected - 182746 - WARNING - 2017-06-12 07:20:17 84200 -0400 - NY1SRV0019 - 14955/139952705304320 - extensio
ns/asdmanagerclient - 182745 - INFO - Request "add_asd" took 1.19 seconds (internal duration 1.19 seconds)
Jun 12 07:20:17 NY1SRV0019 celery[13127]: 2017-06-12 07:20:17 85400 -0400 - NY1SRV0019 - 13127/139952705304320 - celery/celery.worker.job - 182787 - INFO - Task albanode.initialize_disk[a46ee11d-4a78-4976-9cb8-2e7e9f915387] succeeded in 4
8.3459227551s: {}
Jun 12 07:20:52 NY1SRV0019 celery[13127]: 2017-06-12 07:20:52 90800 -0400 - NY1SRV0019 - 13127/139952705304320 - celery/celery.worker.strategy - 182788 - INFO - Received task: alba.add_units[fda13ec8-d115-443c-a550-946985e29793]
Jun 12 07:20:52 NY1SRV0019 celery[13127]: 2017-06-12 07:20:52 90900 -0400 - NY1SRV0019 - 13127/139952705304320 - celery/celery.worker.autoscale - 182789 - INFO - Scaling down 2 processes.
Jun 12 07:20:52 NY1SRV0019 celery[13127]: 2017-06-12 07:20:52 90900 -0400 - NY1SRV0019 - 13127/139952705304320 - celery/celery.pool - 182790 - DEBUG - TaskPool: Apply <function _fast_trace_task at 0x7f49422a3ed8> (args:('alba.add_units', 
'fda13ec8-d115-443c-a550-946985e29793', ('f9599945-f1f5-44d3-8497-0c462ede4ef9', {u'oevWiaqWhOeG1Ji31igxjZTFEoWmNFsu': u'5adf4ed9-d2a5-4a74-a983-ca46eac7c48b', u'H5H5Nh0qINuZuHR11gCJDhHFFx7nxAbN': u'00fc3c11-8c72-48b3-9a3d-1d5c6e855661'})
, {}, {'utc': True, u'is_eager': False, 'chord': None, u'group': None, 'args': ('f9599945-f1f5-44d3-8497-0c462ede4ef9', {u'oevWiaqWhOeG1Ji31igxjZTFEoWmNFsu': u'5adf4ed9-d2a5-4a74-a983-ca46eac7c48b', u'H5H5Nh0qINuZuHR11gCJDhHFFx7nxAbN': u'
00fc3c11-8c72-48b3-9a3d-1d5c6e855661'}), 'retries': 0, u'delivery_info': {u'priority': None, u'redelivered': False, u'routing_key': u'masters.#', u'exchange': u'generic'}, 'expires': None, u'hostname': 'celery@NY1SRV0019', 'task': 'alba.a
dd_units', 'callbacks': None, u'correlation_id': u'fda13ec8-d115-443c-a550-946985e29793', 'errbacks': None, 'timelimit': (None, None), 'taskset': None, 'kwargs': {}, 'eta': None, u'reply_to': u'e3ec9a9e-8933-3825-b5b4-b551830e9732', 'id':
 'fda13ec8-d115-443c-a550-946985e29793', u'headers':... kwargs:{})
Jun 12 07:20:52 NY1SRV0019 celery[13127]: 2017-06-12 07:20:52 94500 -0400 - NY1SRV0019 - 13127/139952705304320 - celery/celery.worker.job - 182791 - DEBUG - Task accepted: alba.add_units[fda13ec8-d115-443c-a550-946985e29793] pid:14977
Jun 12 07:20:53 NY1SRV0019 celery[13127]: 2017-06-12 07:20:53 13000 -0400 - NY1SRV0019 - 14977/139952705304320 - celery/celery.redirected - 182753 - WARNING - 2017-06-12 07:20:53 12900 -0400 - NY1SRV0019 - 14977/139952705304320 - extensio
ns/alba-cli - 182752 - ERROR - Error: Albamgr exception(Albamgr_protocol.Protocol.Error.Unknown,(Failure "Albamgr: too many retries for operation MarkOsdClaimed"))
Jun 12 07:20:53 NY1SRV0019 celery[13127]: Traceback (most recent call last):
Jun 12 07:20:53 NY1SRV0019 celery[13127]:   File "/opt/OpenvStorage/ovs/extensions/plugins/albacli.py", line 139, in run
Jun 12 07:20:53 NY1SRV0019 celery[13127]:     raise AlbaError(output['error']['message'], output['error']['exception_code'])
Jun 12 07:20:53 NY1SRV0019 celery[13127]: AlbaError: Albamgr exception(Albamgr_protocol.Protocol.Error.Unknown,(Failure "Albamgr: too many retries for operation MarkOsdClaimed"))
Jun 12 07:20:53 NY1SRV0019 celery[13127]: 2017-06-12 07:20:53 13100 -0400 - NY1SRV0019 - 14977/139952705304320 - celery/celery.redirected - 182755 - WARNING - 2017-06-12 07:20:53 13100 -0400 - NY1SRV0019 - 14977/139952705304320 - extensio
ns/alba-cli - 182754 - DEBUG - Command: /usr/bin/alba claim-osd --to-json --config=arakoon://config/ovs/arakoon/ny1-ssdbackend01-abm/config?ini=%2Fopt%2FOpenvStorage%2Fconfig%2Farakoon_cacc.ini --long-id=oevWiaqWhOeG1Ji31igxjZTFEoWmNFsu
Jun 12 07:20:53 NY1SRV0019 celery[13127]: 2017-06-12 07:20:53 13100 -0400 - NY1SRV0019 - 14977/139952705304320 - celery/celery.redirected - 182757 - WARNING - 2017-06-12 07:20:53 13100 -0400 - NY1SRV0019 - 14977/139952705304320 - extensio
ns/alba-cli - 182756 - DEBUG - stderr: 2017-06-12 07:20:53 052676 -0400 - NY1SRV0019 - 19580/0000 - alba/cli - 0 - info - Albamgr_client.make_client :ny1-ssdbackend01-abm
Jun 12 07:20:53 NY1SRV0019 celery[13127]: 2017-06-12 07:20:53 053181 -0400 - NY1SRV0019 - 19580/0000 - alba/cli - 1 - info - Albamgr_client.make_client :ny1-ssdbackend01-abm
Jun 12 07:20:53 NY1SRV0019 celery[13127]: 2017-06-12 07:20:53 053266 -0400 - NY1SRV0019 - 19580/0000 - alba/cli - 2 - info - Albamgr_client.make_client :ny1-ssdbackend01-abm
Jun 12 07:20:53 NY1SRV0019 celery[13127]: 2017-06-12 07:20:53 053774 -0400 - NY1SRV0019 - 19580/0000 - alba/cli - 3 - info - connect_with : 172.17.16.6 26406 None Net_fd.TCP (fd:11)
Jun 12 07:20:53 NY1SRV0019 celery[13127]: 2017-06-12 07:20:53 053872 -0400 - NY1SRV0019 - 19580/0000 - alba/cli - 4 - info - connect_with : 172.17.16.6 26406 None Net_fd.TCP (fd:63)
Jun 12 07:20:53 NY1SRV0019 celery[13127]: 2017-06-12 07:20:53 053920 -0400 - NY1SRV0019 - 19580/0000 - alba/cli - 5 - info - connect_with : 172.17.16.6 26406 None Net_fd.TCP (fd:66)
Jun 12 07:20:53 NY1SRV0019 celery[13127]: 2017-06-12 07:20:53 057153 -0400 - NY1SRV0019 - 19580/0000 - alba/cli - 6 - info - connect_with : 172.17.0.64 8601 None Net_fd.TCP (fd:67)
Jun 12 07:20:53 NY1SRV0019 celery[13127]: 2017-06-12 07:20:53 058570 -0400 - NY1SRV0019 - 19580/0000 - alba/cli - 7 - info - closing (fd:67)
Jun 12 07:20:53 NY1SRV0019 celery[13127]: 2017-06-12 07:20:53 125910 -0400 - NY1SRV0019 - 19580/0000 - alba/cli - 8 - info - Dropping abm connection pool after error: (Albamgr_protocol.Protocol.Error.Unknown,(Failure "Albamgr: too many re
tries for operation MarkOsdClaimed"))
Jun 12 07:20:53 NY1SRV0019 celery[13127]: 2017-06-12 07:20:53 125971 -0400 - NY1SRV0019 - 19580/0000 - alba/cli - 9 - info - closing (fd:66)
Jun 12 07:20:53 NY1SRV0019 celery[13127]: 2017-06-12 07:20:53 126007 -0400 - NY1SRV0019 - 19580/0000 - alba/cli - 10 - info - closing (fd:63)
Jun 12 07:20:53 NY1SRV0019 celery[13127]: 2017-06-12 07:20:53 126084 -0400 - NY1SRV0019 - 19580/0000 - alba/cli - 11 - info - closing (fd:11)
Jun 12 07:20:53 NY1SRV0019 celery[13127]: 2017-06-12 07:20:53 13200 -0400 - NY1SRV0019 - 14977/139952705304320 - celery/celery.redirected - 182759 - WARNING - 2017-06-12 07:20:53 13200 -0400 - NY1SRV0019 - 14977/139952705304320 - extensio
ns/alba-cli - 182758 - DEBUG - stdout: {"success":false,"error":{"message":"Albamgr exception(Albamgr_protocol.Protocol.Error.Unknown,(Failure \"Albamgr: too many retries for operation MarkOsdClaimed\"))","exception_type":"albamgr_exn","e
xception_code":1}}
Jun 12 07:20:53 NY1SRV0019 celery[13127]: 2017-06-12 07:20:53 25100 -0400 - NY1SRV0019 - 14977/139952705304320 - celery/celery.redirected - 182761 - WARNING - 2017-06-12 07:20:53 25000 -0400 - NY1SRV0019 - 14977/139952705304320 - extensio
ns/alba-cli - 182760 - ERROR - Error: Albamgr exception(Albamgr_protocol.Protocol.Error.Unknown,(Failure "Albamgr: too many retries for operation MarkOsdClaimed"))
Jun 12 07:20:53 NY1SRV0019 celery[13127]: Traceback (most recent call last):
Jun 12 07:20:53 NY1SRV0019 celery[13127]:   File "/opt/OpenvStorage/ovs/extensions/plugins/albacli.py", line 139, in run
Jun 12 07:20:53 NY1SRV0019 celery[13127]:     raise AlbaError(output['error']['message'], output['error']['exception_code'])
Jun 12 07:20:53 NY1SRV0019 celery[13127]: AlbaError: Albamgr exception(Albamgr_protocol.Protocol.Error.Unknown,(Failure "Albamgr: too many retries for operation MarkOsdClaimed"))
Jun 12 07:20:53 NY1SRV0019 celery[13127]: 2017-06-12 07:20:53 25200 -0400 - NY1SRV0019 - 14977/139952705304320 - celery/celery.redirected - 182763 - WARNING - 2017-06-12 07:20:53 25200 -0400 - NY1SRV0019 - 14977/139952705304320 - extensions/alba-cli - 182762 - DEBUG - Command: /usr/bin/alba claim-osd --to-json --config=arakoon://config/ovs/arakoon/ny1-ssdbackend01-abm/config?ini=%2Fopt%2FOpenvStorage%2Fconfig%2Farakoon_cacc.ini --long-id=H5H5Nh0qINuZuHR11gCJDhHFFx7nxAbN
Jun 12 07:20:53 NY1SRV0019 celery[13127]: 2017-06-12 07:20:53 25200 -0400 - NY1SRV0019 - 14977/139952705304320 - celery/celery.redirected - 182765 - WARNING - 2017-06-12 07:20:53 25200 -0400 - NY1SRV0019 - 14977/139952705304320 - extensions/alba-cli - 182764 - DEBUG - stderr: 2017-06-12 07:20:53 172679 -0400 - NY1SRV0019 - 19583/0000 - alba/cli - 0 - info - Albamgr_client.make_client :ny1-ssdbackend01-abm
khenderick commented 7 years ago

Indeed, that's a valid point.

khenderick commented 7 years ago

Seems we're swallowing failed claim-osd calls. We handle "already claimed" errors, but instead of raising the rest, we ignore them.

khenderick commented 7 years ago

Fixed by

JeffreyDevloo commented 7 years ago

Packaged in openvstorage-backend_1.9.2-1_amd64.deb