raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

Feature request: LVM on standard install #2400

Closed rkarlsba closed 6 years ago

rkarlsba commented 6 years ago

Hi all

It would be nice to have the standard install in some future version (tomorrow? ;) ) use LVM by default instead of plain partitions. It would give the user/admin more flexibility, for obvious reasons. The overhead added is negligible anyway. Red Hat/CentOS have been using LVM by default for a decade or so, and it works well on a Pi. Please consider making this change.

roy

pelwell commented 6 years ago

Assume we know nothing about LVM and care intensely about any performance change - slowdowns, extra memory usage etc. - and then persuade us this change is beneficial.

rkarlsba commented 6 years ago

I assume you, as in the developers and users of Raspbian, actually know what LVM is. It's been around for two decades or so and is widely used on all sorts of Linux machines.

pelwell commented 6 years ago

These requests are read by many third parties. If you can't be bothered to write something...

Ferroin commented 6 years ago

Wikipedia does a reasonably good job of explaining the basics, so I won't waste space here quoting them: https://en.wikipedia.org/wiki/Logical_Volume_Manager_(Linux)

The short version is that LVM is a block layer that sits between the filesystems and physical hardware and allows arbitrary block mappings and various complex transformations to be applied to the data as it's transferred to the device, with the most notable features being:

The biggest potential benefit I can see to having it enabled by default on the Pi is that it would allow much easier conversion from a regular Raspbian image to an alternative boot setup using a USB device for the root filesystem (having LVM involved would mean this could be done online with zero down time using the pvmove command to migrate the root volume to the flash drive).
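As a rough sketch of the online migration described above: the volume group name and device paths passed in are hypothetical placeholders, and the `run` helper only prints each command unless `DRY_RUN=0` is set. It assumes the root filesystem already lives on a logical volume and does not touch the separate boot partition.

```shell
#!/bin/sh
# Sketch only: VG name and device paths are hypothetical placeholders.
# Commands are printed rather than executed unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# migrate_vg VG OLD_PV NEW_PV
# Moves all allocated extents from OLD_PV to NEW_PV while the system runs.
migrate_vg() {
    vg=$1
    old_pv=$2
    new_pv=$3
    # Initialise the USB partition as a physical volume and add it to the VG.
    run pvcreate "$new_pv" || return 1
    run vgextend "$vg" "$new_pv" || return 1
    # Copy every allocated extent off the SD card partition while it is in use.
    run pvmove "$old_pv" "$new_pv" || return 1
    # Remove the now-empty SD card partition from the volume group.
    run vgreduce "$vg" "$old_pv"
}
```

Calling `migrate_vg pivg /dev/mmcblk0p2 /dev/sda1` in the default dry-run mode prints the four commands in order, which makes it easy to review the plan before running it for real.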

As far as overhead goes, the dm-linear target (the backend used by LVM for regular logical volumes that just sit on a single device without any special stuff like thin provisioning) has very low overhead in most cases, though I can't comment on its exact performance implications on a Pi.

Personally however, I'm actually against adding it to the default images because:

Instead, I think a better option is to make it easier to create Raspbian images with arbitrary storage configurations, which would in theory allow people who actually need LVM (or BTRFS, or F2FS, or bcache, or some other configuration) to more easily create such images themselves.

pelwell commented 6 years ago

Thank you @Ferroin for a very complete, clear and persuasive post.

rkarlsba commented 6 years ago

Well, it doesn't matter much, since it can be hidden easily just like today's root expansion. But again, it can help users with a little know-how to do their things more easily.

Again, if the user doesn't know about filesystems, {s,}he won't mind if LVM's there.

Can you point to one single incident where LVM failed on simple storage (that is, not RAID)? I've been using it for over a decade and haven't seen that yet.

But then, to create more images will probably be quite good. If someone is mad enough to use BTRFS, go on.

Oh - and btw, bcache, like flashcache and dm-cache, isn't a filesystem, it's a caching layer, and mostly superseded by lvmcache, which obviously involves lvm ;)

roy

Ferroin commented 6 years ago

> Well, it doesn't matter much, since it can be hidden easily just like today's root expansion. But again, it can help users with a little know-how to do their things more easily.

No, you really can't 'hide' LVM. It's either there or it isn't. The root expansion thing is a one-shot that runs the first time the image boots and then deletes itself, so it's not any kind of persistent thing like LVM is. And, in fact, LVM would make that more complex: right now, it just resizes the partition and the filesystem; with LVM it would have to resize the partition, the physical volume, the logical volume, and the filesystem.

> Again, if the user doesn't know about filesystems, {s,}he won't mind if LVM's there

Until it breaks, at which point it's harder to fix.

> Can you point to one single incident where LVM failed on simple storage (that is, not RAID)? I've been using it for over a decade and haven't seen that yet.

If you lose your volume group metadata or it becomes corrupted (which is absolutely possible with a single device), LVM breaks. It does at least support having multiple copies (though it defaults to one per PV), so it's a bit better in that respect than the old DOS partition tables we're stuck using for the SD card. Overall, this isn't very likely on systems using SSDs or hard drives because they're generally very reliable (though I have seen bad sectors in the first metadata block in a PV cause LVM to freak out), but there have been issues in the past with certain combinations of SD cards and power supplies with the Pi causing random data corruption on the SD card.
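For illustration, the metadata protections mentioned above might look roughly like this. The function names, the volume group name and the paths are made up for the example, and commands are only printed unless `DRY_RUN=0` is set. LVM normally also keeps automatic backups under /etc/lvm/backup, but on a Pi those would sit on the same SD card.

```shell
#!/bin/sh
# Sketch only: names and paths are hypothetical placeholders.
# Commands are printed rather than executed unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# new_pv_two_copies DEV
# Creates a physical volume with two metadata areas instead of the default one.
new_pv_two_copies() {
    run pvcreate --pvmetadatacopies 2 "$1"
}

# backup_metadata VG FILE
# Writes the volume group metadata to FILE, ideally on a different device.
backup_metadata() {
    run vgcfgbackup -f "$2" "$1"
}

# restore_metadata VG FILE
# Restores volume group metadata from a previously saved FILE.
restore_metadata() {
    run vgcfgrestore -f "$2" "$1"
}
```

A backup like `backup_metadata pivg /mnt/usb/pivg.vg` only helps if the file is stored somewhere other than the card that might get corrupted.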

> But then, to create more images will probably be quite good. If someone is mad enough to use BTRFS, go on.

I wouldn't say 'mad' so much as 'doesn't care about performance'. BTRFS runs perfectly fine on a Pi, it's just slow (although it's not too bad if you're not writing much data to it, and most of the problem is how slow storage on the Pi is to begin with).

> Oh - and btw, bcache, like flashcache and dm-cache, isn't a filesystem, it's a caching layer, and mostly superseded by lvmcache, which obviously involves lvm ;)

I didn't mean using bcache as a filesystem, I just meant having it in the storage layer. FWIW though, bcache almost is a filesystem internally, and there has been talk in the past about possibly adding a VFS interface to it.

rkarlsba commented 6 years ago

Anyway - I don't know why some people hate LVM - it just works and has done so for years, and it really helps a lot for those of us who want to separate root and data without fiddling around with partition tables designed three decades ago. LVM adds flexibility, not complexity. Try to make a newbie resize a partition without destroying his or her data.

Ferroin commented 6 years ago

I'm not saying I hate it. In fact, I use it on almost all of my systems, largely because of the flexibility it offers (I use BTRFS on top of it, so together I can literally reprovision the entire system online with zero downtime). The small number of systems I don't use it on are all cases where it just doesn't make sense to use LVM because I'm more likely to need to rebuild the system from scratch than change partition sizes (altogether, it's a half dozen minimalistic VMs, two VPS nodes, and a handful of Pis that are all being used for IoT applications).

What I am saying is that I don't think it's the best idea to just have it by default as part of the system image.

As far as it supposedly not adding complexity, that's dependent on where you look. It does greatly simplify the process of reprovisioning a system, but only in very specific cases (namely, you aren't changing the size of any of the physical volumes at all, and you don't need to handle certain perfectly reasonable operations like reducing the size of a thin storage pool). It does however add complexity in a number of ways:

  1. Every single block I/O request has another layer of indirection it has to pass through. In the case of what are likely to be the most common configurations on the Pi, this doesn't add much complexity, but it is still complexity.
  2. A user who wants to change their storage layout has to go through more documentation to figure out how to do it, and there are actually more ways it can go wrong (because there are more components involved that could fail).
  3. It significantly complicates the process of resizing the root volume automatically on the first boot. Right now, the process is (mostly) just a call to sfdisk to adjust the partition table, and then a call to resize2fs to resize the filesystem. With LVM, that would need to also include a pvresize command and a lvresize command. On top of that, the script has to handle the possibility of LVM not being involved (adding further complexity).
  4. More modules would have to be built into the kernel, which increases the complexity of building a custom kernel and adds yet another way things could break.
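To make point 3 concrete, here is a hedged sketch of what a first-boot resize would have to do in both cases. `grow_partition` is a hypothetical stand-in for the existing sfdisk step, the device paths are placeholders, and commands are only printed unless `DRY_RUN=0` is set. Note that percentage sizes go with `-l` (extents), not `-L`.

```shell
#!/bin/sh
# Sketch only: device paths are placeholders and grow_partition is a
# hypothetical stand-in for the image's existing sfdisk step.
# Commands are printed rather than executed unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run one step and report which step failed, so a half-finished resize
# is never silently ignored.
step() {
    run "$@" || { echo "resize failed at: $*" >&2; return 1; }
}

grow_partition() {
    echo "# existing sfdisk step grows $1"
}

# resize_root ROOT_TYPE PART_DEV ROOT_DEV
#   ROOT_TYPE is "lvm" when the root filesystem sits on a logical volume;
#   anything else is treated as a plain partition.
resize_root() {
    root_type=$1
    part_dev=$2
    root_dev=$3
    grow_partition "$part_dev" || return 1
    if [ "$root_type" = "lvm" ]; then
        # Two extra steps, each of which can fail independently.
        step pvresize "$part_dev" || return 1
        step lvextend -l +100%FREE "$root_dev" || return 1
    fi
    step resize2fs "$root_dev"
}
```

The root type could be detected with something like `lsblk -no TYPE "$root_dev"`, which reports `lvm` for logical volumes. `lvextend` also has a `--resizefs` option that folds the last step in, at the cost of hiding which stage failed.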

rkarlsba commented 6 years ago

  1. The complexity is minimal, and the code has been well tested over a very long time. The small number of instructions needed for this takes perhaps one millionth of the time spent waiting for I/O.
  2. That documentation can be written on half a page of low-resolution HTML.
  3. Adding pvresize and lvresize isn't significant; it's minor: two lines of bash code.
  4. Those modules are in there already.

asavah commented 6 years ago

  1. Yes, it's well tested, but do you have benchmark data about the performance impact on the Pi? Edit: think about the slow Pi 1 and Zero and slow, cheap SD cards.
  2. LVM can be scary for new users; the documentation about all the bells and whistles could fill a book, not half a page.
  3. Agree.
  4. Nope. In order to boot from LVM you need the module(s) either in an initramfs or built into the kernel, which would make the kernel image bigger.
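One way to check point 4 on a given kernel is to look at how the core device-mapper option (`CONFIG_BLK_DEV_DM`) is set. The helper below is a small sketch with a made-up function name; it reads a kernel config from standard input and reports one of three states.

```shell
#!/bin/sh
# dm_support: reads a kernel config on stdin and reports how the core
# device-mapper driver (CONFIG_BLK_DEV_DM) is configured.
dm_support() {
    state=$(sed -n 's/^CONFIG_BLK_DEV_DM=\(.\)$/\1/p')
    case $state in
        y) echo "built-in" ;;
        m) echo "module" ;;
        *) echo "disabled" ;;
    esac
}
```

On a running system this could be fed with `zcat /proc/config.gz | dm_support`, assuming the kernel exposes its config there (on some kernels that requires loading the `configs` module first). A result of `module` means an initramfs would be needed to mount an LVM root.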

While I think LVM is an awesome tool which I have been using for years on servers I don't think it's good for the pi. Just my 5c.

JamesH65 commented 6 years ago

I think what we would need to consider, notwithstanding any technical pros and cons as already discussed, prior to adding this is:

  1. Specific figures for performance impact
  2. Some idea of any increased maintenance burden. We have limited staff, so any increase in maintenance needs to be carefully considered

rkarlsba commented 6 years ago

  1. Something like https://www.redhat.com/archives/linux-lvm/2006-July/msg00092.html answers this rather easily. It's negligible.
  2. It's a few lines in the resize script. IMHO it's minor. If we had spent the time on switching to LVM instead of arguing, it would be finished already.

pelwell commented 6 years ago

I'm surprised the hordes of people clamouring for LVM support haven't pooled their resources and created a build with LVM enabled.

rkarlsba commented 6 years ago

Well, I'm not a developer, I'm just suggesting things to try to help things getting better.

Ferroin commented 6 years ago

Regarding the resize script, I'm not worried as much about the overhead (though it would make the resize take longer), I'm worried about there being two more places things can go wrong that need to have errors sanely handled. And yes, I know it's 'reliable', but that doesn't mean you shouldn't have proper error handling.

As far as just general performance overhead, it's small enough for simple linear mappings on at least x86 that you need to use the low-level kernel tracing to measure it. IIRC, it ends up being a couple function calls, a table lookup (to figure out the exact mapping for the required blocks) and some basic math. The memory overhead is likely to be the bigger issue (even linear mappings add measurable memory overhead), though that's harder to quantify exactly (because it's a lot more dependent on the exact layout of things on-disk).
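The "table lookup and some basic math" can be illustrated outside the kernel. The sketch below is not the kernel's code; it just reproduces the arithmetic on table lines in the format `dmsetup table` prints for linear targets (`<start> <length> linear <device> <offset>`, all in sectors). The function name is made up.

```shell
#!/bin/sh
# map_sector SECTOR
# Reads linear table lines on stdin and prints "<device> <physical sector>"
# for the given logical sector. Returns non-zero if no segment covers it.
map_sector() {
    awk -v s="$1" '
        # Find the segment whose [start, start + length) range holds s,
        # then add the distance into the segment to its device offset.
        $3 == "linear" && s >= $1 && s < $1 + $2 {
            print $4, $5 + (s - $1)
            found = 1
            exit
        }
        END { if (!found) exit 1 }
    '
}
```

For a two-segment volume such as `0 8192 linear 179:2 2048` followed by `8192 4096 linear 8:1 2048` (device numbers invented for the example), logical sector 9000 falls 808 sectors into the second segment, so it maps to sector 2856 on device 8:1.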

rkarlsba commented 6 years ago

pvresize /dev/blah
lvresize -L +100% /dev/blah

those two will be finished in times measured in milliseconds, maximum two digits

As for overhead, it didn't have much impact back when we had pentium 3 processors clocked at 500MHz. Yes, you can measure it, but then, you can measure a lot with a microscope.

I wonder why you are so afraid of using modern things like LVM. It's not a new filesystem, like btrfs, which is rather unstable in certain circumstances - it's a volume manager, and it's dead stable.

pelwell commented 6 years ago

Have you considered a different distribution, one not aimed at education (hence the emphasis on simplicity, or at least minimising avoidable complexity)? Ubuntu Mate, openSUSE, etc. One of them must have LVM support.

XECDesign commented 6 years ago

To avoid anybody wasting time measuring performance impact and all that, for the sake of argument, let's grant that it's likely to be negligible and that there are some advanced use cases where LVM is more convenient. At this point, I still have no intention of switching the Raspbian builds to LVM.

There is absolutely no benefit to the target user base and great inconvenience to everybody who has gotten used to MBR. This would affect a ton of tutorials, books and forum posts. The reasons to switch have to be clear and to the benefit of the majority of the target user base. Even small changes result in a lot of questions on the forum and accusations that things are changing for the sake of changing.

Instead, I would propose you write a script that takes an existing image and creates a copy based on LVM. If it turns out that that script becomes widely used, then we can revisit it.

Ferroin commented 6 years ago

@rkarlsba Have you been intentionally ignoring the actual content of my comments?

> pvresize /dev/blah
> lvresize -L +100% /dev/blah
>
> those two will be finished in times measured in milliseconds, maximum two digits

I already said I wasn't worried about the overhead of the command itself running:

> Regarding the resize script, I'm not worried as much about the overhead (though it would make the resize take longer), I'm worried about there being two more places things can go wrong that need to have errors sanely handled. And yes, I know it's 'reliable', but that doesn't mean you shouldn't have proper error handling.

And then this:

> As for overhead, it didn't have much impact back when we had pentium 3 processors clocked at 500MHz. Yes, you can measure it, but then, you can measure a lot with a microscope.

Just reiterated what I said here about performance impact:

> As far as just general performance overhead, it's small enough for simple linear mappings on at least x86 that you need to use the low-level kernel tracing to measure it. IIRC, it ends up being a couple function calls, a table lookup (to figure out the exact mapping for the required blocks) and some basic math.

And completely ignored my (perfectly valid) comment later in the same paragraph about runtime memory usage (and yes, I know it's at most double digit kB, but that is still significant when you have only one GB of RAM).

> I wonder why you are so afraid of using modern things like LVM. It's not a new filesystem, like btrfs, which is rather unstable in certain circumstances - it's a volume manager, and it's dead stable.

Which directly contradicts a comment I made much earlier:

> I'm not saying I hate it. In fact, I use it on almost all of my systems, largely because of the flexibility it offers (I use BTRFS on top of it, so together I can literally reprovision the entire system online with zero downtime). The small number of systems I don't use it on are all cases where it just doesn't make sense to use LVM because I'm more likely to need to rebuild the system from scratch than change partition sizes (altogether, it's a half dozen minimalistic VMs, two VPS nodes, and a handful of Pis that are all being used for IoT applications).

Now, I will admit I may not have made my argument quite as plain as I could have, but most of it is pretty much the same as the argument @XECDesign just made for not switching things. It doesn't benefit the intended user base for Raspbian while having a significant and likely negative impact on them.

rkarlsba commented 6 years ago

"It doesn't benefit the intended user base for Raspbian while having a significant and likely negative impact on them." It benefits the ones that know Linux, and the ones to come to learn more, and it does not have a "significant and likely negative impact" on the other users. It's rock stable, as you may know if you're using it.

JamesH65 commented 6 years ago

Just read through all of this issue again. I think it is agreed that LVM:

  1. Is stable.
  2. Probably has minimal impact on performance, although I note that no-one has actually provided any figures for CPU and memory impact.
  3. Requires a bit of work to implement.

What hasn't been shown is the amount of impact and consequent work required to update all the documentation, and as the person responsible for that, with limited time, this request borders on change for change's sake. The work required to implement it pales into insignificance compared to that required elsewhere.

So, for the moment, it seems unlikely we will be going down this route. So closing.

sdo101 commented 6 years ago

If Raspbian is meant to be a stepping stone towards the real world of Linux (and hi-grade Unix) in true business production and commercial must-not-fail, cannot-fail environments - i.e. the real world... then surely Raspbian has to catch up and deploy with LVM2 as standard. It's just one layer of abstraction. Pretty much any real install of Linux uses a decent volume manager of some sort.

Ferroin commented 6 years ago

@sdo101 You come from an enterprise and/or Fedora background, don't you? I hate to burst your bubble, but there's a whole lot of use cases that don't need or even want volume management. Not counting Android (yes, it is Linux, it's just a different userspace than you're used to), you've still got IoT devices (it's literally just overhead there, and any overhead is significant when you're running on a dinky little Cortex-M4 powered by a small battery), more conventional purpose-built devices (streaming devices, digital signage, etc., all cases where you have no expectation of ever reprovisioning), containers (they shouldn't be touching block storage directly at all, let alone doing volume management), properly handled virtual machines (you should be doing the volume management on the host system, not in the guest, and definitely not both places), industrial control systems (again, tightly embedded, any overhead is bad, plus one extra potential point of failure), systems for classic technophobe users (who will get exactly zero practical benefit from it, just like they get exactly zero practical benefit from Windows Storage Spaces), and a whole slew of other cases beyond that.

Realistically, there are far more Linux systems out there that do not use volume management than ones that do.

jeffmccune commented 5 years ago

You should reconsider using LVM by default. LVM is the better solution because it supports rollback through snapshots. Rollback is particularly useful when experimenting. If LVM is not the default, users will learn about an inferior solution to volume management.
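A minimal sketch of the snapshot-and-rollback workflow, assuming a hypothetical volume group with free extents left over for the snapshot (an image that fills the whole card would have none). The function names and volume names are placeholders, and commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: volume names and sizes are hypothetical placeholders.
# Commands are printed rather than executed unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# take_snapshot ORIGIN_LV SNAP_NAME SIZE
# Creates a copy-on-write snapshot of ORIGIN_LV before experimenting.
take_snapshot() {
    run lvcreate --snapshot --size "$3" --name "$2" "$1"
}

# rollback SNAP_LV
# Merges the snapshot back into its origin, discarding later changes.
rollback() {
    run lvconvert --merge "$1"
}
```

For a mounted root volume the merge cannot start while the origin is in use, so in practice the rollback takes effect on the next activation, typically after a reboot.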

rkarlsba commented 5 years ago

> You should reconsider using LVM by default. LVM is the better solution because it supports rollback through snapshots. Rollback is particularly useful when experimenting. If LVM is not the default, users will learn about an inferior solution to volume management.

I fully agree. Using filesystems directly on partitions is old-school these days. LVM is well documented and has minimal overhead, so low it's hardly possible to measure. It is, as mentioned, a well-known solution that many distros rely on as the default (such as RHEL and CentOS). Understanding LVM will only help beginners, as it offers a lot of possibilities not available if using filesystems directly on partitions or disks.