Closed jeroenmaelbrancke closed 7 years ago
Check if the healthcheck volume is still present in the model and, if needed, delete the volume.
What is the root cause of the delete not working as intended? Fixing the root cause would be the best solution. It would also make sense for the healthcheck to verify, after deleting the disk, that everything was actually removed.
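The verify-after-delete idea could look roughly like this. This is a minimal sketch, not the framework's actual API: `model_lookup` and the `.raw` path are hypothetical stand-ins for the real model query and vPool mount location.

```python
import os

def verify_deleted(volume_name, model_lookup, raw_path):
    """Return a list of leftovers a delete pass failed to clean up.

    model_lookup: callable returning True if the vDisk is still modelled.
    raw_path: expected location of the healthcheck .raw file.
    """
    leftovers = []
    if model_lookup(volume_name):   # vDisk entry still in the framework model
        leftovers.append('model entry')
    if os.path.exists(raw_path):    # .raw file still on disk
        leftovers.append('.raw file')
    return leftovers

# Example with stub inputs: the model no longer knows the volume and the
# .raw path does not exist, so nothing is reported as left over.
result = verify_deleted('ovs-healthcheck-test-XYZ.raw',
                        lambda name: False,
                        '/mnt/nonexistent/ovs-healthcheck-test-XYZ.raw')
print(result)
```

A delete that reports its own leftovers would have surfaced this issue immediately instead of leaving a stale `.raw` behind silently.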
ovs-volumedriver:
Jan 02 15:36:09 stor-01.cl-g8-uk1 volumedriver_fs.sh[12576]: 2017-01-02 15:36:09 068905 +0100 - stor-01.cl-g8-uk1 - 12576/0x00007fe9167f0700 - volumedriverfs/MetaDataServerTable - 00000000001379c0 - info - ~Table: b5009c7d-4517-41b8-950e-bba3d5113205: bye
Jan 02 15:36:09 stor-01.cl-g8-uk1 volumedriver_fs.sh[12576]: 2017-01-02 15:36:09 069558 +0100 - stor-01.cl-g8-uk1 - 12576/0x00007fe9167f0700 - volumedriverfs/RocksLogger - 00000000001379c1 - info - /mnt/ssd2/vmstor_db_mds_1: EVENT_LOG_v1 {"time_micros": 1483367769069541, "job": 0, "event": "table_file_deletion", "file_number": 9204}
Jan 02 15:36:09 stor-01.cl-g8-uk1 volumedriver_fs.sh[12576]: 2017-01-02 15:36:09 069649 +0100 - stor-01.cl-g8-uk1 - 12576/0x00007fe9167f0700 - volumedriverfs/DataStoreNG - 00000000001379c2 - info - destroy: b5009c7d-4517-41b8-950e-bba3d5113205: destroying DataStore, DeleteLocalData::T
Jan 02 15:36:09 stor-01.cl-g8-uk1 volumedriver_fs.sh[12576]: 2017-01-02 15:36:09 069800 +0100 - stor-01.cl-g8-uk1 - 12576/0x00007fe9167f0700 - volumedriverfs/SCOCacheMountPoint - 00000000001379c3 - info - removeNamespace: "/mnt/ssd1/vmstor_write_sco_1": removing namespace b5009c7d-4517-41b8-950e-bba3d5113205 from mountpoint
Jan 02 15:36:09 stor-01.cl-g8-uk1 volumedriver_fs.sh[12576]: 2017-01-02 15:36:09 069981 +0100 - stor-01.cl-g8-uk1 - 12576/0x00007fe9167f0700 - volumedriverfs/Volume - 00000000001379c4 - info - destroy: b5009c7d-4517-41b8-950e-bba3d5113205: Unregistering volume from ClusterCache
Jan 02 15:36:09 stor-01.cl-g8-uk1 volumedriver_fs.sh[12576]: 2017-01-02 15:36:09 070038 +0100 - stor-01.cl-g8-uk1 - 12576/0x00007fe9167f0700 - volumedriverfs/BackendConnectionInterfaceLogger - 00000000001379c5 - info - Logger: Entering deleteNamespace b5009c7d-4517-41b8-950e-bba3d5113205
Jan 02 15:36:09 stor-01.cl-g8-uk1 volumedriver_fs.sh[12576]: 2017-01-02 15:36:09 077976 +0100 - stor-01.cl-g8-uk1 - 12576/0x00007fe9167f0700 - volumedriverfs/BackendConnectionInterfaceLogger - 00000000001379c6 - info - ~Logger: Exiting deleteNamespace for b5009c7d-4517-41b8-950e-bba3d5113205
Jan 02 15:36:09 stor-01.cl-g8-uk1 volumedriver_fs.sh[12576]: 2017-01-02 15:36:09 078017 +0100 - stor-01.cl-g8-uk1 - 12576/0x00007fe9167f0700 - volumedriverfs/VolManager - 00000000001379c7 - notice - Destroy Volume, VolumeId: b5009c7d-4517-41b8-950e-bba3d5113205, delete local data: DeleteLocalData::T, remove volume completely RemoveVolumeCompletely::T, delete namespace DeleteVolumeNamespace::T, force deletion ForceVolumeDeletion::F, FINISHED
ovs-workers:
Jan 02 15:36:09 stor-01.cl-g8-uk1 celery[32628]: 2017-01-02 15:36:09 08100 +0100 - stor-01.cl-g8-uk1 - 28603/140015761745664 - celery/celery.redirected - 187059 - WARNING - 2017-01-02 15:36:09 08100 +0100 - stor-01.cl-g8-uk1 - 28603/140015761745664 - log/volumedriver_task - 187058 - INFO - [ovs.lib.vdisk.delete_from_voldrv] - ["b5009c7d-4517-41b8-950e-bba3d5113205"] - {} - {}
In the lib.log file I see a lot of "Connection refused" errors for healthcheck volumes:
2017-01-02 15:36:04 50900 +0100 - stor-01.cl-g8-uk1 - 29717/140184351565568 - lib/mds - 178 - DEBUG - MDS safety: vDisk 64720c77-4fc1-4f31-b0b6-09dec7b5663b: Start checkup for virtual disk ovs-healthcheck-test-mSq1IJZLFbBzbDCQ.raw
2017-01-02 15:36:04 55200 +0100 - stor-01.cl-g8-uk1 - 29717/140184351565568 - lib/mds - 179 - DEBUG - MDS safety: vDisk 64720c77-4fc1-4f31-b0b6-09dec7b5663b: Reconfiguration required. Reasons:
2017-01-02 15:36:04 55300 +0100 - stor-01.cl-g8-uk1 - 29717/140184351565568 - lib/mds - 180 - DEBUG - MDS safety: vDisk 64720c77-4fc1-4f31-b0b6-09dec7b5663b: * Not enough safety
2017-01-02 15:36:04 55300 +0100 - stor-01.cl-g8-uk1 - 29717/140184351565568 - lib/mds - 181 - DEBUG - MDS safety: vDisk 64720c77-4fc1-4f31-b0b6-09dec7b5663b: * Not enough services in use in primary domain
2017-01-02 15:36:05 16600 +0100 - stor-01.cl-g8-uk1 - 29717/140184351565568 - lib/vdisk - 182 - ERROR - Got failure during (re)configuration of vDisk ovs-healthcheck-test-mSq1IJZLFbBzbDCQ.raw
Traceback (most recent call last):
  File "/opt/OpenvStorage/ovs/lib/vdisk.py", line 645, in create_new
    MDSServiceController.ensure_safety(new_vdisk)
  File "/opt/OpenvStorage/ovs/lib/mdsservice.py", line 616, in ensure_safety
    client.create_namespace(str(vdisk.volume_id))
RuntimeError: Connection refused
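The `RuntimeError: Connection refused` raised from `create_namespace` looks transient (the MDS endpoint may simply be unreachable at that moment), so one defensive mitigation is to retry with backoff. This is a sketch with a hypothetical flaky client standing in for the real volumedriver client, not the framework's actual retry mechanism:

```python
import time

def with_retries(call, attempts=3, delay=0.5, backoff=2.0):
    """Retry `call` on RuntimeError, multiplying the delay between attempts."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except RuntimeError:
            if attempt == attempts:
                raise              # out of attempts: propagate the error
            time.sleep(delay)
            delay *= backoff

# Stub client that refuses the first two connections, then succeeds.
class FlakyClient:
    def __init__(self):
        self.calls = 0
    def create_namespace(self, volume_id):
        self.calls += 1
        if self.calls < 3:
            raise RuntimeError('Connection refused')
        return 'namespace %s created' % volume_id

client = FlakyClient()
result = with_retries(lambda: client.create_namespace('b5009c7d'), delay=0.01)
print(result)
```

Retrying only papers over the symptom, of course; if the MDS is persistently down, the error should still propagate so the safety checkup can flag it.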
Is this still an issue? I believe we removed the volumedriver check that works by creating volumes.
We are currently working around the problem by disabling the volumedriver test. However, the core issue should still be investigated, and that's why this ticket should remain open.
https://github.com/openvstorage/framework/issues/1390 - giving every disk a unique ID has proven to no longer produce these issues. However, the core issue is still being worked on (see the linked ticket).
Issue no longer present. Root cause fix in linked ticket.
Problem description
When the .raw file from the healthcheck has not been cleaned up in the framework, we receive a
got fault response unlink
exception.
Possible root of the problem
The .raw file is still present in the framework even though the volume has already been deleted from the volumedriver.
Possible solution
Check if the healthcheck volume is still present in the model and, if needed, delete the volume. I know the healthcheck is not a self-healing product, but this would be a good solution (only for the volumes created by the healthcheck).
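A minimal sketch of that idea, using a plain dict as a stand-in for the framework model (the real OVS DAL API differs) and matching only volumes that carry the healthcheck name prefix, so customer vDisks are never touched:

```python
HC_PREFIX = 'ovs-healthcheck-test-'

def cleanup_stale_healthcheck_vdisks(model, volumedriver_volumes):
    """Remove healthcheck vDisks that the model still knows about
    but that no longer exist on the volumedriver."""
    removed = []
    for name in list(model):
        if name.startswith(HC_PREFIX) and name not in volumedriver_volumes:
            del model[name]            # drop the stale entry from the model
            removed.append(name)
    return removed

# Example: the model still holds a healthcheck volume that the
# volumedriver has already deleted, plus a normal customer vDisk.
model = {'ovs-healthcheck-test-mSq1IJZLFbBzbDCQ.raw': {},
         'customer-vm.raw': {}}
removed = cleanup_stale_healthcheck_vdisks(model, volumedriver_volumes=set())
print(removed)
```

Restricting the cleanup to the `ovs-healthcheck-test-` prefix keeps it safe even though it is, strictly speaking, a self-healing action.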
Temporary solution
Delete the vdisk from the model
Additional information
Setup
Packages