olopez32 / ganeti

Automatically exported from code.google.com/p/ganeti

DRBD Volume above 4TB not working #256

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What software version are you running? Please provide the output of "gnt-cluster --version" and "gnt-cluster version".

gnt-cluster (ganeti v2.5.2) 2.5.2
Software version: 2.5.2
Internode protocol: 2050000
Configuration format: 2050000
OS api version: 20
Export interface: 0

What distribution are you using?

Ubuntu 12.04, DRBD 8.3.11 (api:88)

What steps will reproduce the problem?
1. Create a VM with disk type DRBD above 4TB

What is the expected output? What do you see instead?

Instance creation fails with:
/var/log/ganeti/node-daemon.log:2012-08-08 16:27:44,552: ganeti-noded pid=4148 ERROR Can't assemble device after creation, unusual event: drbd2: can't attach local disk: /dev/drbd2: Failure: (111) Low.dev. smaller than requested DRBD-dev. size.

Please provide any additional information below.

Even if you specify more than 128MB as the DRBD meta volume size, it fails. This is due to DRBD's external meta-disk handling: when the meta-disk is created as an index, it is always fixed at 128MB, which results in a maximum volume size somewhere below 4TB.
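
For reference (not part of the original report), a back-of-the-envelope check of that ceiling in Python, using the external-metadata size estimate quoted later in this thread (roughly 1MB of metadata per 32GB of data, plus 1MB):

    # Ceiling implied by a fixed 128MB external metadata volume, using the
    # Linbit estimate meta_mb ~= data_mb / 32768 + 1 (quoted further down).
    META_MB = 128
    max_data_mb = (META_MB - 1) * 32768
    print(max_data_mb, "MB, i.e. about", round(max_data_mb / 1024 ** 2, 2), "TiB")
    # -> 4161536 MB, i.e. about 3.97 TiB -- "somewhere below 4TB"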

Creating a flexible meta-disk without an index, or using internal metadata, would solve this issue. There is no need for indexed meta-disks, as Ganeti creates one meta-disk LVM volume for each VM anyway.

Here is a statement from Florian Haas of Linbit (the DRBD creators) regarding internal and external meta-disks.

http://lists.linbit.com/pipermail/drbd-user/2008-June/009628.html

best regards,

Adrian

Original issue reported on code.google.com by adrian.s...@gmail.com on 9 Aug 2012 at 9:18

GoogleCodeExporter commented 9 years ago
Same issue on Ganeti 2.6.x on Gentoo Linux.

This issue could be solved if we had a way to tell Ganeti to use DRBD with internal metadata. That should remove this 4TB limitation at disk creation time.

Original comment by zen2dr...@gmail.com on 2 Feb 2013 at 2:18

GoogleCodeExporter commented 9 years ago
The problem with internal metadata is that it makes disk resizes and conversions impossible, since the DRBD volume is then not a "raw" image. This is the reason why we have not implemented internal metadata so far.

It would be possible to use internal metadata, but then this would have to be tracked, and all the above operations disallowed. Hmm…

Original comment by ius...@google.com on 4 Feb 2013 at 8:42

GoogleCodeExporter commented 9 years ago
Which would be possible if we used a disk parameter and documented that enabling it removes support for the resize and conversion operations. (Of course, code-wise this shouldn't be hardcoded then, but asked by cmdlib of each DRBD device, to avoid a layering violation.)

I think if someone contributed this patch we could accept it. What do you think?
Thanks,

Guido

Original comment by ultrot...@google.com on 15 Feb 2013 at 2:48

GoogleCodeExporter commented 9 years ago
Yes, sure. But I don't think it'll be a trivial patch…

Original comment by ius...@google.com on 15 Feb 2013 at 2:54

GoogleCodeExporter commented 9 years ago
As discussed yesterday on IRC, an alternative would be to use flexible external metadata, which allows a metadata size proportional to the data size instead of exactly 128MB, and thus scales beyond 4TB. What do you think?

Original comment by ultrot...@gmail.com on 23 Mar 2013 at 7:29

GoogleCodeExporter commented 9 years ago

Original comment by ultrot...@google.com on 3 May 2013 at 9:27

GoogleCodeExporter commented 9 years ago
With DRBD 8.4 (supported as of Ganeti 2.9), Ganeti uses flexible external metadata by default. It does not, however, compute the metadata size based on the actual disk size, so there is still work to do in order to support disks > 4TB.

I'm unassigning myself from this bug, as I'm not actively working on it.

Original comment by thoma...@google.com on 28 Jun 2013 at 11:47

GoogleCodeExporter commented 9 years ago
Moving non-critical bugs scheduled for 2.8 or 2.9 to 2.11, as only critical bug fixes will be integrated into those versions.

Original comment by thoma...@google.com on 30 Oct 2013 at 9:48

GoogleCodeExporter commented 9 years ago
Is there any current workaround for this? I ran into this issue today. :)

Original comment by hei...@googlemail.com on 6 Feb 2014 at 7:36

GoogleCodeExporter commented 9 years ago
A quick and dirty workaround is to change the value of the DRBD_META_SIZE 
constant in constants.py (or _constants.py on newer Ganeti versions) to 
something above 128MB (double it for 8TB, for example). Make sure to change it 
to the same value on all nodes in your cluster.
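
For illustration only, a sketch of that edit (the exact path differs between versions and installations; the value shown, 256MB, is just the doubling suggested above for disks up to roughly 8TB):

    # constants.py (or _constants.py on newer Ganeti versions), on every node:
    # the shipped default is 128 (MB); doubling it allows disks up to ~8TB.
    DRBD_META_SIZE = 256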

While this would allow you to create disks bigger than 4TB, I'm not quite sure 
which problems you could run into with existing disks. Make sure to test such a 
workaround before using it in production.

Original comment by thoma...@google.com on 6 Feb 2014 at 9:12

GoogleCodeExporter commented 9 years ago
Another dirty workaround we use is to create several 4TB virtual disks for the same instance and merge them inside the instance with LVM.

This has been working great so far with 12TB+ devices, without any modification to the Ganeti code.

We are still waiting for this issue to be handled, as >4TB devices are getting more and more common these days.

Original comment by cyril.bo...@isvtec.com on 6 Feb 2014 at 9:18

GoogleCodeExporter commented 9 years ago
I'm changing the DRBD_META_SIZE value in /usr/share/ganeti/2.11/ganeti/constants.py each time I need a volume > 4TB and reverting the change afterwards.

It's really annoying. I think the simple way is to calculate DRBD_META_SIZE based on the size of the disk to be created, with a minimum of 128MB as currently fixed.

if DRBD_DISK_SIZE > 4TB then DRBD_META_SIZE = DRBD_DISK_SIZE/32

This way we could have volumes of any size.

I even wonder whether we could use DRBD_DISK_SIZE/32 for any DRBD disk size, to reduce the metadata space overhead for small volumes.

Original comment by zen2dr...@gmail.com on 3 Dec 2014 at 5:48

GoogleCodeExporter commented 9 years ago
I've looked into what the right size for the metadata disk is, and it's really well explained in the Linbit DRBD documentation:
http://drbd.linbit.com/users-guide-8.4/ch-internals.html#s-meta-data-size

Their estimation formula is accurate, even though the exact calculation could be used instead; the estimate overshoots by less than 1MB at most.

Their estimate is that a volume of X megabytes needs a metadata disk of Y megabytes, where Y = X/32768 + 1.

Some example results (with a 1MB minimum implied by the estimate):
1 GB => 1 MB
10 GB => 1 MB
100 GB => 4 MB
1 TB => 33 MB
4 TB => 129 MB
10 TB => 321 MB
16 TB => 513 MB

So Ganeti currently uses a fixed 128MB metadata disk size, which limits the data disk size to ~4TB and wastes a lot of metadata space on the small disks that are common in real usage.

A good patch would therefore use this formula to calculate the metadata disk size instead of the fixed 128MB, which both wastes disk storage and limits disk size to ~4TB.
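
For illustration, a minimal Python sketch of such a calculation (this is not Ganeti's actual code; the function name and the 128MB floor for compatibility with existing disks are assumptions):

    def drbd_meta_size_mb(data_mb, floor_mb=128):
        """Estimate the external metadata size (MB) for a DRBD data device,
        using the Linbit estimate meta = data / 32768 + 1. The optional floor
        keeps the current 128MB size for existing sub-4TB disks."""
        return max(data_mb // 32768 + 1, floor_mb)

    # Matches the table above when the floor is disabled:
    # drbd_meta_size_mb(4 * 1024 * 1024, floor_mb=1)   -> 129
    # drbd_meta_size_mb(16 * 1024 * 1024, floor_mb=1)  -> 513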

Original comment by zen2dr...@gmail.com on 13 Feb 2015 at 2:54

GoogleCodeExporter commented 9 years ago
I agree that we need to make the change in a way that doesn't require altering all existing disks, or upgrading will be a huge burden. The other issue is: how does this formula interact with increasing disk sizes? We'll then need to make sure to enlarge the metadata disks as well so that growing works (or not even try if the requested new disk size would exceed what the current metadata size allows).
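
For illustration, a sketch of that last check (again not Ganeti code; it simply inverts the estimate used above):

    def grow_fits_metadata(new_data_mb, meta_mb):
        """True if an external metadata volume of meta_mb MB can still describe
        a data device grown to new_data_mb MB (inverse of meta = data/32768 + 1)."""
        return new_data_mb <= (meta_mb - 1) * 32768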

As long as these two issues are addressed, we'd be happy to accept patches in the directions proposed in this thread.

Thanks,

Guido

Original comment by ultrot...@google.com on 15 Apr 2015 at 1:03

GoogleCodeExporter commented 9 years ago
The Linbit DRBD documentation describes the grow operation:
https://drbd.linbit.com/users-guide/s-resizing.html

It's more complex in the case of internal metadata (the metadata needs to be moved to the new end of the volume).

In the case of external metadata, growing offline seems trivial:
"When the backing block devices on both nodes are grown while DRBD is inactive, 
and the DRBD resource is using external meta data, then the new size is 
recognized automatically. No administrative intervention is necessary. The DRBD 
device will have the new size after the next activation of DRBD on both nodes 
and a successful establishment of a network connection."

For growing online, a simple DRBD resize seems to do it, so it's not that complex.

Original comment by zen2dr...@gmail.com on 22 May 2015 at 5:06