openvstorage / alba

Open vStorage ALBA (alternate backend) creates a replicated or flexible network-RAIDed object storage backend out of Seagate Kinetic drives and local disks, supporting compression and encryption.

Maintenance tasks increase when an ASD is down (CleanupOsdNamespace) #761

Open jeroenmaelbrancke opened 7 years ago

jeroenmaelbrancke commented 7 years ago

I know it is normal that the number of tasks on the maintenance agent increases, but after 15 minutes the chunks on this ASD will be recreated on other ASDs. So if an ASD has been down for x hours, the maintenance agent should ignore these tasks.

In my example, OSDs 14 and 15 have been down for 3 days and the amount of work on the maintenance agent still increases, even though the auto-repair timeout is 900 seconds.

Maintenance config = {
  "enable_auto_repair": true,
  "auto_repair_timeout_seconds": 900.0,
  "auto_repair_disabled_nodes": [],
  "enable_rebalance": true,
  "cache_eviction_prefix_preset_pairs": {},
  "redis_lru_cache_eviction": {
    "host": "172.17.16.22",
    "port": 6379,
    "key": "alba_lru_56f58646-419d-4236-a868-e3b79ac8784d"
  }
}
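The rule proposed above can be sketched as a check the maintenance agent could run before spending effort on a cleanup item. This is a minimal illustration under stated assumptions, not ALBA's implementation: the `osd_down_since` map, the `should_defer` helper and the `CleanupOsdNamespace` dataclass are hypothetical, and only the `enable_auto_repair` and `auto_repair_timeout_seconds` keys come from the config above. Deferring only skips the item for the current round; it does not remove it from the queue.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical mirror of the maintenance config above; only these two keys are used.
MAINTENANCE_CONFIG = {
    "enable_auto_repair": True,
    "auto_repair_timeout_seconds": 900.0,
}


@dataclass(frozen=True)
class CleanupOsdNamespace:
    """Hypothetical stand-in for the ABM work item of the same name."""
    osd_id: int
    namespace_id: int


def should_defer(item: CleanupOsdNamespace,
                 osd_down_since: Dict[int, float],
                 config: dict,
                 now: Optional[float] = None) -> bool:
    """True if the proposed rule would skip `item` this round.

    `osd_down_since` maps an OSD id to the UNIX time at which it was first
    seen as down (hypothetical bookkeeping). OSDs not in the map count as up.
    The item is only skipped; it stays in the queue.
    """
    if not config.get("enable_auto_repair", False):
        return False  # no auto-repair, so the proposal's premise does not hold
    down_since = osd_down_since.get(item.osd_id)
    if down_since is None:
        return False  # OSD is up: the cleanup can simply run
    now = time.time() if now is None else now
    return now - down_since > config["auto_repair_timeout_seconds"]


if __name__ == "__main__":
    three_days = 3 * 24 * 3600.0
    down = {14: 0.0, 15: 0.0}
    print(should_defer(CleanupOsdNamespace(15, 1004145), down,
                       MAINTENANCE_CONFIG, now=three_days))  # True
```

With OSDs 14 and 15 down for three days, any of the items listed below would be deferred, while an item for an OSD that has been down for less than 900 seconds would still be attempted.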

work items:

54158935 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004123L))"
54159096 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004127L))"
54159149 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004126L))"
54159150 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (14L, 1004126L))"
54159271 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004134L))"
54159324 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004132L))"
54159377 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004129L))"
54159430 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004135L))"
54159483 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004128L))"
54159536 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004131L))"
54159537 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (14L, 1004131L))"
54159589 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004138L))"
54159590 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (14L, 1004138L))"
54159642 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004136L))"
54159704 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004143L))"
54159757 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004145L))"
54159863 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004137L))"
54159916 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004142L))"
54159917 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (14L, 1004142L))"
54159969 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004144L))"
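To see how the pending items are distributed over OSDs, the listing can be parsed and counted per OSD. A minimal sketch, assuming every row has exactly the `<work id> | "(...)"` shape shown above; the `L` suffix on the numbers is OCaml's int64 literal syntax. For the 20 rows above it yields 16 items for OSD 15 and 4 for OSD 14.

```python
import re
from collections import Counter
from typing import Iterator, Tuple

# One row of the listing: <work id> | "(...CleanupOsdNamespace (<osd>L, <namespace>L))"
ROW = re.compile(
    r'^\s*(\d+)\s*\|\s*"\(Albamgr_protocol\.Protocol\.Work\.CleanupOsdNamespace '
    r'\((\d+)L, (\d+)L\)\)"\s*$'
)


def parse_rows(text: str) -> Iterator[Tuple[int, int, int]]:
    """Yield (work_id, osd_id, namespace_id) for each CleanupOsdNamespace row."""
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            work_id, osd_id, namespace_id = (int(g) for g in match.groups())
            yield work_id, osd_id, namespace_id


def items_per_osd(text: str) -> Counter:
    """Count pending cleanup items per OSD id."""
    return Counter(osd_id for _, osd_id, _ in parse_rows(text))
```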

amount of work items: [image]

toolslive commented 7 years ago
54159757 | "(Albamgr_protocol.Protocol.Work.CleanupOsdNamespace (15L, 1004145L))"

That item means: delete everything that is left on OSD 15L from namespace 1004145L. The namespace was deleted while the OSD was down, so the work item is kept in the work queue in the ABM and retried, without success, by the maintenance processes.
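A minimal model of why such items pile up, assuming (as described above) that an item only leaves the queue once its cleanup succeeds and that the cleanup needs the OSD to be reachable. `run_round` and `simulate` are hypothetical helpers, not part of ALBA: each deleted namespace enqueues one item per OSD that held data for it, and items for unreachable OSDs are put back for the next round.

```python
from collections import deque
from typing import Deque, Iterable, List, Set, Tuple

Item = Tuple[int, int]  # (osd_id, namespace_id), like CleanupOsdNamespace (15L, 1004145L)


def run_round(queue: Deque[Item], reachable_osds: Set[int]) -> Deque[Item]:
    """One hypothetical maintenance pass: keep every item whose cleanup failed."""
    remaining: Deque[Item] = deque()
    while queue:
        osd_id, namespace_id = queue.popleft()
        if osd_id in reachable_osds:
            continue  # cleanup succeeded, so the item leaves the queue
        remaining.append((osd_id, namespace_id))  # OSD unreachable: retry later
    return remaining


def simulate(deleted_namespaces: Iterable[int],
             osds_holding_data: Iterable[int],
             reachable_osds: Set[int]) -> List[int]:
    """Queue length after each namespace deletion plus one maintenance pass."""
    osds = list(osds_holding_data)
    queue: Deque[Item] = deque()
    lengths = []
    for ns in deleted_namespaces:
        # Deleting a namespace schedules one cleanup per OSD that held its data.
        queue.extend((osd, ns) for osd in osds)
        queue = run_round(queue, reachable_osds)
        lengths.append(len(queue))
    return lengths
```

For example, with a hypothetical healthy OSD 3 next to the down OSDs 14 and 15, `simulate(range(1004123, 1004128), (3, 14, 15), {3})` returns `[2, 4, 6, 8, 10]`: the items for OSD 3 complete at once, while every deletion leaves two more items that can never complete.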

wimpers commented 6 years ago

@toolslive

toolslive commented 6 years ago

If the OSD is purged, the CleanupOsdNamespace items will complete without problems. The maintenance agent that handles them will log

   "UnknownOsd(%Li) => no cleanup to be done anymore"

at info level.
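The three outcomes described in this thread (OSD up, OSD down, OSD purged) can be summarised as a small decision function. This is a sketch, not the maintenance agent's code: `OsdState` and `handle_cleanup` are hypothetical names, and the log line reuses the quoted message with Python's `%d` in place of OCaml's `%Li` int64 conversion.

```python
import logging
from enum import Enum

log = logging.getLogger("maintenance-sketch")


class OsdState(Enum):
    UP = "up"
    DOWN = "down"      # still registered in the ABM, but unreachable
    PURGED = "purged"  # removed from the backend, so unknown to the ABM


def handle_cleanup(osd_id: int, namespace_id: int, state: OsdState) -> bool:
    """Return True if a CleanupOsdNamespace item can be marked as done."""
    if state is OsdState.PURGED:
        # Same message as the one quoted above, at info level.
        log.info("UnknownOsd(%d) => no cleanup to be done anymore", osd_id)
        return True
    if state is OsdState.DOWN:
        log.debug("OSD %d unreachable, keeping cleanup of namespace %d",
                  osd_id, namespace_id)
        return False  # stays in the work queue and is retried later
    # OSD is up: deleting the namespace's leftover data is not modelled here.
    return True
```

In this model, purging OSDs 14 and 15 turns every one of their pending items into an immediate completion, which matches the behaviour described in the reply above.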