domsj closed this issue 7 years ago
The maintenance process distribution was designed as "foreach backend, run a maintenance process on all storage nodes until the required number of maintenance processes is reached". The code doesn't know about datacenters, so it doesn't take placement into account.
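Roughly, the current behaviour could be sketched like this (all names are hypothetical and only illustrate the datacenter-unaware iteration described above, not the actual framework code):

```python
def deploy_maintenance_processes(backend, storage_nodes, required_amount):
    """Naive placement: walk over all storage nodes in arbitrary order and
    'deploy' until the required number of processes is reached.
    No datacenter/domain information is consulted at any point."""
    deployed = []
    for node in storage_nodes:
        if len(deployed) >= required_amount:
            break
        deployed.append(node)  # stand-in for actually starting a process here
    return deployed
```

Because the node order carries no locality information, all processes for a backend can end up in a single datacenter.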
A solution would be to prefer nodes that already have disks claimed by the backend.
When do you deploy maintenance processes? Probably immediately when setting up a backend, at which point it doesn't have any disks yet. So you would either have to delay deploying the maintenance processes, or move them around in some sort of periodic checkup once disks have been claimed.
Valid point. We currently have a function that validates the current maintenance processes and removes/adds some if required. We can extend it so it can basically move them around, and call it more frequently (now it's only called when you set up a backend or add a new node), e.g. every day.
Then we could have an implementation like this:
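A minimal sketch of such a periodic checkup, assuming hypothetical node dicts with a `claimed_by` list (the real framework models this differently): it prefers nodes whose disks are claimed by the backend and reports which processes to add and which to remove so the placement converges.

```python
def checkup_maintenance(backend, storage_nodes, current, required_amount):
    """Periodically re-evaluate maintenance process placement for a backend.
    Nodes with disks claimed by the backend are preferred; processes on
    non-preferred nodes are moved once a preferred node becomes available.
    All names here are illustrative, not the actual framework API."""
    preferred = [n for n in storage_nodes if backend in n.get('claimed_by', [])]
    others = [n for n in storage_nodes if n not in preferred]
    # Desired placement: fill the required slots with preferred nodes first.
    desired = (preferred + others)[:required_amount]
    to_add = [n for n in desired if n not in current]
    to_remove = [n for n in current if n not in desired]
    return to_add, to_remove
```

Running this on a schedule (e.g. daily) would let the initial, disk-less deployment be corrected automatically after disks have been claimed.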
After some discussion with @domsj: we could take it further and start attaching meaning to the domain tags assigned to a backend. We can add these domains to nodes as well, and prefer to run maintenance processes in the same domain. We could even start assigning roles to nodes.
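The domain-tag preference above could be sketched as a simple ranking, assuming hypothetical node dicts with a `domains` list (the real tagging model may differ): nodes sharing more domain tags with the backend sort first, so maintenance processes land in the same datacenter as the backend's data.

```python
def rank_nodes(backend_domains, nodes):
    """Rank candidate nodes for maintenance processes by domain affinity.
    A node sharing more domain tags with the backend scores higher and is
    chosen first. Node structure here is a hypothetical sketch."""
    def score(node):
        shared = len(set(node.get('domains', [])) & set(backend_domains))
        return -shared  # more shared domains sorts earlier
    return sorted(nodes, key=score)
```

Picking the first `required_amount` entries of this ranking would then replace the arbitrary node order used today.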
@wimpers, this ticket was suddenly qualified; which option do you want implemented?
After discussion with @khenderick:
As a side effect https://github.com/openvstorage/framework/issues/569 should be fixed.
For a global backend using only remote backends this is an issue, as there are no local nodes on which to run the maintenance processes.
Fixed by #202, packaged in openvstorage-backend-1.7.3-rev.694.f9f958c
Will verify this next week during the reinstall/upgrade of the OVH environment.
On OVH we can indeed see that the maintenance agents are placed in their respective datacenters, although we have only been able to observe this across two sites.
I noticed on the OVH environment that maintenance for the hdd-grav backend was running on some of the roubaix nodes (and no maintenance process on any node of the gravelin datacenter). With such a setup you lose local repair: all data has to travel back and forth between the datacenters...