rposudnevskiy / RBDSR

RBDSR - XenServer/XCP-ng Storage Manager plugin for CEPH
GNU Lesser General Public License v2.1

Upgrade #63

Open Emmenemoi opened 6 years ago

Emmenemoi commented 6 years ago

Hi, what upgrade path would you suggest (e.g. XenServer 7 -> 7.2)? The problem is the different RBDSR versions (within the v1 branch). Should RBDSR be upgraded everywhere first?

rposudnevskiy commented 6 years ago

Hi, I'm not sure I understood the question correctly. RBDSR is not tied to a particular XenServer version, so you can perform the upgrade to 7.2. Please note that /etc/xapi.conf may be rewritten during the upgrade, in which case XenServer will not load the RBDSR plugin afterwards. The upgrade process may also damage the installed RBDSR plugin in some other way, because RBDSR is not an official plugin. In light of the above, you may need to reinstall RBDSR after the XenServer upgrade.

Also, I did not quite understand which versions you meant in the v1 branch. At the moment there are two versions, v1 and v2. They are not compatible with each other, and there isn't an easy way to migrate from v1 to v2. Besides, v2 is not finished yet: it doesn't have a garbage collector implemented, so if you use snapshots actively you end up with very long image chains and an inoperable system. RBDSR v2 uses a technique like VHD-chaining in all three modes (VHD, DMP, RBD), so the presence of a garbage collector is vital.

Emmenemoi commented 6 years ago

Sorry, it was late at night. What I mean is: I installed an old RBDSR version (from the time I was contributing), which now needs to be updated. I upgraded only the master to 7.2 and, as a consequence, installed a fresh RBDSR v1 on it (in practice I first installed v2, then replaced it with v1 to try it, which explains my recent modifications of the install scripts; the problems may also come from this). There were several errors (missing "_unmap", missing "_dev_name", etc.) which prevented the SR from being usable. I tried to put my old version back, but the VDIs got detached from VHD and everything crashed (production cluster :( ). I managed to revert to the backup and everything came back up (on 7.0).

Now I'm trying to find out how to upgrade to 7.2 without a crash... Maybe the best option is to shut down all VMs, upgrade all nodes, install RBDSR from scratch, plug the PBDs, and restart all VMs.
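To make the ordering of that offline procedure explicit, here is a minimal Python sketch that only builds the list of steps and executes nothing. The function name `offline_upgrade_plan` and the placeholder UUIDs are hypothetical; the node upgrade and the RBDSR installation are left as manual steps because the thread does not give commands for them.

```python
def offline_upgrade_plan(vm_uuids, pbd_uuids):
    """Return the ordered steps as strings; nothing is executed here."""
    # 1. Shut down every VM before touching the hosts.
    steps = [f"xe vm-shutdown uuid={vm}" for vm in vm_uuids]
    # 2. and 3. Manual steps, not scripted in this sketch.
    steps.append("# MANUAL: upgrade every node to 7.2")
    steps.append("# MANUAL: install RBDSR from scratch on every node")
    # 4. Re-plug the PBDs so the SR is attached again.
    steps += [f"xe pbd-plug uuid={pbd}" for pbd in pbd_uuids]
    # 5. Start the VMs only after the storage is back.
    steps += [f"xe vm-start uuid={vm}" for vm in vm_uuids]
    return steps


if __name__ == "__main__":
    # Placeholder UUIDs, for illustration only.
    for step in offline_upgrade_plan(["<vm-uuid-1>", "<vm-uuid-2>"], ["<pbd-uuid-1>"]):
        print(step)
```

Printing the plan first and running the commands by hand keeps a human in the loop, which seems sensible on a production cluster.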

(well noted about v2, and thanks for the good job)

Emmenemoi commented 6 years ago

Another possibility is live migration between pools (cannot be done on 7.3 free): a rolling live migration of the VMs to a fresh 7.2 cluster with the latest RBDSR v1. Start by moving the VMs from the 7.0 slaves to 7.2 and continue until the VMs on the 7.0 master have been moved to 7.2, then shut down the 7.0 master and do a fresh 7.2 install on it (it is now a slave), replacing the old 7.0 master.

This could be a recommended method to safely update RBDSR.
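The ordering in this rolling approach (slaves first, master last) can be written down as a small Python sketch. The function name `evacuation_order` and the host names are hypothetical, and the sketch only computes the order; the migrations themselves are not scripted.

```python
def evacuation_order(hosts):
    """hosts: list of (name, is_master) tuples describing the old 7.0 pool.

    Returns the host names in the order they should be emptied of VMs:
    every slave first, the pool master last.
    """
    slaves = [name for name, is_master in hosts if not is_master]
    masters = [name for name, is_master in hosts if is_master]
    if len(masters) != 1:
        raise ValueError("expected exactly one pool master")
    return slaves + masters


if __name__ == "__main__":
    # Hypothetical 4-node pool.
    old_pool = [("xs1", True), ("xs2", False), ("xs3", False), ("xs4", False)]
    for name in evacuation_order(old_pool):
        print(f"migrate VMs off {name}, then reinstall it as a 7.2 host")
```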

rposudnevskiy commented 6 years ago

Hi, I think the last option, migrating the VMs to a new 7.2 cluster, should definitely work, as should the option of shutting down all VMs, upgrading all nodes and installing RBDSR from scratch. These are the best options at the moment because, as I mentioned above, any upgrade may damage the installed RBDSR, as it's not officially supported by XenServer.

But I think you could try to upgrade your XenServer cluster to 7.2 first without touching RBDSR. To avoid any damage to the VMs, you should do it with all VMs shut down. Then check the installed RBDSR: check /etc/xapi.conf, and also check /sbin/tap-ctl, /bin/vhd-tool and /usr/libexec/xapi/sparse_dd, as they will likely be rewritten during the upgrade; if you need a working SXM you should move them to *-orig and replace them with the wrappers from RBDSR. If you can start the VMs after all that, then as the next step you can plan the upgrade of RBDSR.
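As a rough aid for that post-upgrade check, here is a minimal Python sketch that reports, for each of the three binaries named above, whether the file exists and whether a "-orig" copy sits next to it. The function name `wrapper_status` and the status strings are hypothetical. This is only a heuristic: an "-orig" file can survive an upgrade that overwrote the wrapper itself, so the files still need to be inspected by hand.

```python
import os

# Paths named in the thread; the "-orig" suffix follows the "*-orig" convention mentioned there.
WRAPPED_BINARIES = ["/sbin/tap-ctl", "/bin/vhd-tool", "/usr/libexec/xapi/sparse_dd"]


def wrapper_status(root="/"):
    """Return {path: status} for each binary, looked up under `root`.

    Status is "missing" if the binary itself is absent, "orig-present" if a
    "<path>-orig" copy exists beside it, and "no-orig" otherwise.
    """
    report = {}
    for path in WRAPPED_BINARIES:
        target = os.path.join(root, path.lstrip("/"))
        if not os.path.exists(target):
            report[path] = "missing"
        elif os.path.exists(target + "-orig"):
            report[path] = "orig-present"
        else:
            report[path] = "no-orig"
    return report


if __name__ == "__main__":
    for path, status in wrapper_status().items():
        print(f"{path}: {status}")
```

A "no-orig" result after the upgrade would suggest the wrapper setup has to be redone for that binary; the `root` parameter exists only so the check can be tried against a scratch directory.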

Another option is also possible: shut down all VMs, uninstall RBDSR, upgrade to 7.2, install the new RBDSR, start the VMs.

Anyway, at the moment there is no way to upgrade the cluster with the VMs running, except for the option with another cluster as in your last proposal.

Thank you very much for your participation in the project. I think we can prepare a document with recommendations for upgrading based on your experience. It will be a very useful document.

PS: Just wondering. :-) Do you use RBDSR in production? How is it? Is it a big installation? I thought RBDSR was too raw for use in production.

Thank you

Emmenemoi commented 6 years ago

I'll upgrade the cluster and write something. But the most secure approach IMHO is the from-scratch 7.2 cluster with rolling node installs + live migrations.

I've had a small production cluster (4 nodes) for more than 2 years, and it works pretty well apart from snapshots :). But I'm using another script for backing up everything (VM pause + rbd snap + snap copy to a backup cluster: https://github.com/Emmenemoi/cephbackup). This creates small conflicts in RBDSR's snapshot detection, but without consequences: they are just ignored. It's set up using the kernel driver; NBD was disconnecting / hanging randomly.

Emmenemoi commented 6 years ago

@rposudnevskiy Would it be possible to have the same Ceph pool (and therefore the same SR uuid) in 2 different XS pools? It seems to cause trouble with SR locks, but I'm not sure. (I'm trying to export => import the VM metadata only and then reuse the data from the same SR in the new, clean, upgraded pool. This would result in nearly zero downtime for VM migration after shutdown. But I couldn't make it work.)