vatesfr / xen-orchestra

The global orchestration solution to manage and backup XCP-ng and XenServer.
https://xen-orchestra.com

Show allocated space on SR #3915

Open etlweather opened 5 years ago

etlweather commented 5 years ago

I think it would be a nice addition in the SR view to show not only how much disk space is used on the SR but also how much has been allocated, particularly for thin provisioning.

Say you have an SR with 200GB of space. You have 3 virtual disks of 50GB each, and each are filled with 10GB of files.

Currently, from my understanding, the graphic will show 30GB used (3x10GB). And it will show that you have 170GB of free space.

And while that is true, you may run into trouble if you create, say, a new virtual disk of 100GB, as you would then be oversubscribed.

It seems like it would be possible to show the allocated disk size versus how much they use currently.
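The arithmetic in the example above can be sketched as follows. This is only an illustration: the `Vdi` class, its field names and `sr_summary` are hypothetical, not part of Xen Orchestra or XAPI, and snapshots are ignored.

```python
from dataclasses import dataclass

GB = 10**9  # decimal gigabyte, matching the "GB" figures in the example


@dataclass
class Vdi:
    """Hypothetical record for one virtual disk on a thin-provisioned SR."""
    virtual_size: int  # size promised to the VM, in bytes
    used: int          # space actually written, in bytes


def sr_summary(sr_size: int, vdis: list[Vdi]) -> dict[str, int]:
    """Return used, free, allocated and overcommitted space for an SR, in bytes."""
    used = sum(v.used for v in vdis)
    allocated = sum(v.virtual_size for v in vdis)
    return {
        "used": used,
        "free": sr_size - used,
        "allocated": allocated,
        # Positive only when more space is promised than the SR physically has.
        "overcommitted": max(0, allocated - sr_size),
    }


disks = [Vdi(50 * GB, 10 * GB) for _ in range(3)]
before = sr_summary(200 * GB, disks)
after = sr_summary(200 * GB, disks + [Vdi(100 * GB, 0)])
```

With these numbers, `before` reports 30 GB used, 170 GB free and 150 GB allocated. Adding the empty 100 GB disk leaves free space at 170 GB but raises the allocation to 250 GB, which is 50 GB more than the SR can hold.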

Snapshots also factor into this so should be shown. I guess the size of the virtual disk would be shown against the snapshot, not the base-copy (at least it's my understanding that only the currently live snapshot will expand, the base-copy won't change until there is a merge operation after a deletion of a snapshot).

I don't know the feasibility of all this. But if it was feasible, it would make managing a thin-provisioned SR much easier. Particularly for young Sys Admins who can be a bit too eager in creating more VMs when they see they still have disk space "left".

olivierlambert commented 5 years ago

It's not a simple thing. I mean, in your example, there is no problem at all to use a new 100GiB disk. In a thin pro scenario, having the sum of all virtual disk size larger than the SR isn't a problem at all.

The only problem is when you start to use (for real) near to 100%. And when you'll reach 100%, no more writes will be possible. That's why we implemented alerts in a plugin, to help the admin to be… alerted if a SR is reaching more than a certain level.

So in the end, is showing the "allocated" space relevant/useful in a thin case? I don't know, I'm not really sure, but it's just my point of view :)

etlweather commented 5 years ago

I can see your point.

My concept is for planning. I can't see right now how much disk space I promised to the VMs. Sure if they aren't using most of it, the promise does not matter ... but the users of these VMs all "know" they can use the space and if one day they all decide to use it...

In some cases, I can totally see that it's not a problem, you may be well in control of the disk usage and you know you allocated more than you have and you know that's fine and an alert at "X% free" is sufficient.

But in other scenarios, where you aren't totally in control and maybe you want to make sure you never allocate more, being able to visualize it would be great. Right now, I have to pull up the calculator and go through the disks to get an idea of how much I allocated, so I can decide whether it is a good idea to put more disks on the storage or not.

olivierlambert commented 5 years ago

So a kind of "total allocated space" number in the SR general info view?

etlweather commented 5 years ago

Yeah. It would be nice as a "bar graph" like you have the disk usage, maybe right above/below.

olivierlambert commented 5 years ago

The problem with a bar graph like this is that allocated space will likely be more than 100%, so how do we draw this :wink: A number would actually be easier, I think. And it makes more sense (e.g. on a 2TiB thin SR, having "2.5TiB allocated (125%)").
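A number with a percentage is easy to produce. Here is a minimal sketch, assuming sizes are available in bytes; `allocation_label` is a hypothetical helper, not an existing XO function. For reference, 2.5 TiB allocated on a 2 TiB SR works out to 125%.

```python
TIB = 2**40  # binary tebibyte


def allocation_label(allocated: int, sr_size: int) -> str:
    """Format allocated space as '<x>TiB allocated (<p>%)'."""
    percent = 100 * allocated / sr_size
    return f"{allocated / TIB:.1f}TiB allocated ({percent:.0f}%)"


label = allocation_label(int(2.5 * TIB), 2 * TIB)
```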

etlweather commented 5 years ago

Good point... I guess in my case where I'm trying to not over subscribe, it would work but I can see a bar graph not working well for 115% used!

A number would work just as well, particularly with a percent too.

th0ma7 commented 5 years ago

I was about to create a similar bug on this! XenCenter does provide the total allocated storage. So in some cases you end up over-allocating storage using thin provisioning, which is fine in theory, although it can be a major issue if real free space is limited. One thing we discovered is that in some cases coalescence jobs will not be able to complete due to the over-allocation of storage using thin provisioning. This in turn ended up with us losing a VM while doing a migration (corrupted drive).

So all in all what's missing in the SR view are:

olivierlambert commented 5 years ago

  1. First, there's no way to know if there is enough space to coalesce, and then I'm not sure this has anything to do with the allocated space.
  2. We can always put a warning/info on the sum of allocated space, yes.
  3. About UI changes, please provide a wireframe/draft example, I'm having a hard time understanding what you meant.
  4. Units: making them configurable because you get confused when mixing with another tool is probably not enough to convince us, but we're open ;)

th0ma7 commented 5 years ago

[image: SR-info-export-mode]

Here is a really rough idea of what that could end up being. Breaking out the info of all VM VDIs from it with arrows would be really helpful.

Also, for your point 1) I tend to disagree. Simply put, I have 300GiB free, and a 2TiB VDI plus a few snapshots associated with this VDI. There is no way coalescence can run, as it will require at least 2TiB of free space plus the associated VDI it has to merge. We end up with a hidden problem that is invisible unless we have more info related to SRs and their real usage.

olivierlambert commented 5 years ago

That's not how coalescing works on thin pro. It's impossible to predict a coalesce outcome from the base copy file size; it all depends on the merge result (which is not known beforehand). E.g. whether those "to merge" blocks replace existing ones or not: the outcome will be very different. I'm not aware of any API we could query to know the space needed for coalescing. There's absolutely no API on the coalesce process whatsoever.

Regarding the UI, how does it work? On hover? for how long? In a web app it's really not trivial to do arrows like this without using very custom styles/components :/

th0ma7 commented 5 years ago

We did encounter issues where coalescing couldn't work due to lack of space. I'm not too sure how Xen weighs real free space vs. total allocated space (considering thin) and a potential lack of real space. But somehow in the equation this resulted in a failure to complete the coalescence job on that VM... The magic is unknown to me but the end result was: VM corruption post migration. Potential Xen bug? A VM that was already in bad shape for another reason? Unknown.

That said, having a graphical representation that helps in understanding the exact state of an SR (such as the quick example I did), clearly showing total allocated space vs. real used and free, would be more than welcome :)

Color coding running, stopped, orphaned (or unmanaged) would also be great.

Jarli01 commented 5 years ago

I think it might make the most sense to simply display the total SR allocation, using the above screenshot (I snagged it and made a few changes.)

If you have a 4TiB SR (thin provisioned) and you've allocated 4.39TiB on that SR, your SR should simply show like this.

If you have free space on the SR (aren't over provisioned) the free space shows up at the end as clear space (maybe color it if needed). If you are over provisioned by any amount that amount is displayed on the SR in red.

58261275-f8174200-7d45-11e9-97b1-6fc034b543e4

Jarli01 commented 5 years ago

I don't believe the SR needs to have a graphical display of the running, stopped and orphaned details for that SR. I get the value add, but the most important thing for the visual representation is simply to determine:

If I put this VM on here, how much space is here, and how over-provisioned is this SR (if at all).

olivierlambert commented 5 years ago

This alternative might be interesting but I'd like more opinions on that, especially on UI side. Pinging @pdonias

pdonias commented 5 years ago

Hi everyone, thanks for all the feedback! I'm going to try and go through it :)

Let me know if you think I misunderstood something or if I missed your point and thanks again for the much appreciated feedback.

th0ma7 commented 5 years ago

Hi all,

  1. insufficient space for coalescence to complete: indeed, perhaps technically not obvious to find. Although SR output logs showing that coalescence didn't complete could be an initial pointer that there is an underlying problem, thus extracting that info could be really helpful.
  2. agreed, different usages being represented all at once could become confusing. Perhaps having a "toggle" switch to activate one or another in order to analyse the situation would be best? In the options, on/off VMs could be of interest as well: if 50% of the VMs are off, perhaps there is an issue? If there are tons of orphaned VDIs, perhaps someone is doing a wrong manipulation and it needs to be revised, etc.
  3. overprovisioned: the mandatory values for the whole SR are: real used, real free, total allocated and, if overprovisioned, by how much (simple math here). How to graphically represent that is another story: somehow showing that the bar is "virtually" longer than the real size (I used hashes in my representation, @Jarli01 used red, a dotted line was also suggested)... the options are multiple.
  4. Arrows pointing to VM: indeed it may end up being way too crowded. Perhaps click to make the arrow to appear? In some situations there are really small VMs that become hard to mouse over on top of: how to address these? Perhaps having an "auto-zoom" to enlarge that section on mouse-over? Just to name a few ideas.
  5. storage units: agreed, the most obvious ones are being used. But XenCenter uses one thing, and when comparing free to real used between my SAN's data and XenCenter... well, we have to convert units and double-check, as units differ all over. Having the ability to select the units, like: most convenient (current), or specifically GB or GiB, etc., would be helpful when doing analysis against multiple systems.

th0ma7 commented 5 years ago

and now, the big point, showing the allocated space:

a. XenCenter provides me with that info, XOA doesn't, and that is the point of this bug entry in the first place.

b. indeed, finding a simple graphical way to show that would be awesome (to me, essential for my operations).

c. showing allocated space for each disk: again, perhaps having a sort of mouse-over zooming capability and/or point/click with bubble info where you actually see that, dunno?

d. showing overprovisioned space: no need for it to be red, I agree, this is not an issue in itself. My SANs actually make use of that to tell you that you are saving storage, because you are exactly in that situation, and provide you with the % savings you're currently making. I do not agree that the bar must only show the size of the SR. But I do agree that providing too much info at a glance can become confusing. Although overprovisioned space (with total allocated) are equally important for doing analysis and projecting in the future.

e. showing the allocated size as a number is the bare minimum, and indeed, a small extra is to highlight that allocated is higher than the total SR size.

Totally outside of the scope of this bug entry but somewhat related:

pdonias commented 5 years ago

  1. That would be interesting and probably doable but it's starting to be unrelated to this issue so feel free to open a new one explaining what to look for and what behaviour you'd expect
  2. I like the toggle idea better, even though it would add an element (the switch) to the UI. Anyway, showing the allocated size in a whole new graph seems more reasonable as it's very different information from the physical utilisation. Also, I don't understand the part about on/off VMs. What does it change for the SR if the VM is ON or OFF? Agreed about the orphaned VDIs.
  3. We could show the allocated sizes in the new graph, then
  4. click to make the arrow to appear

    So you're trading a hover for a click?

    there are really small VMs that become hard to mouse over on top of

    That's because the point of this graph is to get a general view of your SR. What about the "Others" part where you don't get the information at all? Don't you mind it?

  5. OK, I get your point but 1) I'm curious to know what XenCenter uses, 2) we can't know what all the other software uses and 3) if we add the possibility to "force" the unit, how do you imagine it? Would it affect all of XO? Would it persist through a reload? Would it be a user preference? It's a bit of a weird feature so I don't really know what you'd expect.

and now, the big point:

a/b. I'm not saying we shouldn't show the allocated size, just that mixing allocation and utilisation doesn't seem easily doable to me

c. We could indeed show the allocated size in the tooltip. I don't know how well it would fit in the new version of the graph we're implementing but we'll keep that in mind

d. What do you mean by "space you're saving"? Do you consider the overprovisioned space as saved?

I do not agree that the bar must only show the size of the SR

Maybe you don't because you'd already know how to read the graph but the goal is to not have to learn how to read it to understand it. Which means no legend, a simple and clear graph and the same behaviour for thick and thin-provisioned SRs.

overprovisioned space (with total allocated) are equally important for doing analysis and projecting in the future

That's some useful feedback, thanks :)

e. Showing it as a number is trivial so we might do that as a first step and then think of a new "allocation graph" that would show up for thin-provisioned SRs only.

Outside the scope: if you have precise ideas of what you'd want, I'll let you open other issues but I'll avoid discussing them here to try and keep it focused on one subject.

th0ma7 commented 5 years ago

Hi @pdonias

  1. yes, the toggle idea could be extended to other VM "states", such as which ones are currently ON or OFF, or even VDIs associated with VM tags, for instance.
  2. hover or click, well, perhaps both, with the click leaving them displayed as needed, which gives the ability to click a few VDIs of interest to track things, for instance.
  3. XenCenter only prints the total allocated info. Quite basic, and using yet another unit ;) Unit related... I just don't know. I guess a global config option that can be set? Or the ability to click on the value and force-select the unit temporarily for that specific data in the current view... Or even mouse over and see the value in different units (my SAN uses blocks, XenCenter GB or MB and XOA GiB or TiB... sigh, it's just a pain.)
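To illustrate why mixing units is painful, here is a small sketch that expresses one byte count in both decimal and binary units. `to_units` is a hypothetical helper; which tool displays which unit is taken from the comment above and has not been verified.

```python
def to_units(size_bytes: int) -> dict[str, float]:
    """Express a byte count in decimal (GB, TB) and binary (GiB, TiB) units."""
    return {
        "GB": size_bytes / 10**9,
        "TB": size_bytes / 10**12,
        "GiB": size_bytes / 2**30,
        "TiB": size_bytes / 2**40,
    }


two_tib = to_units(2 * 2**40)
```

The same 2 TiB SR shows up as roughly 2.2 TB in a tool that uses decimal units, a difference of about 10%.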

d. space saving... yeah, that's vendor sales-pitch info more than anything. You've allocated a 2TB drive to a VM but the VM only uses 100GB of it... hey, you saved 1.9TB (i.e. 95% storage savings). The value of that is somewhere between really interesting and somewhat limited, and more aimed at selling things... As you are selling your product, showing how much storage a client is saving might be an argument to use ;)
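The "savings" figure is simple arithmetic. A sketch with a hypothetical `thin_savings` helper, using the decimal units from the example:

```python
def thin_savings(allocated: int, used: int) -> tuple[int, float]:
    """Return (bytes allocated but not consumed, that amount as a % of the allocation)."""
    saved = allocated - used
    return saved, 100 * saved / allocated


GB = 10**9
saved, percent = thin_savings(2000 * GB, 100 * GB)
```

For a 2TB disk holding 100GB, this gives 1.9TB "saved", or 95% of the allocation.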

e. baby steps are probably best indeed :) I clearly have more ideas than you have time to implement them lol.

thnx!

vit-zikmund commented 3 years ago

Hey there folks, couple years later and we're still at the same spot. I hope this didn't get overengineered to oblivion :) @th0ma7 makes a couple good points, but let's focus on the baby steps, right? I just managed to grind my VMs to a halt because I thought (silly me) that what I'm seeing in the bar graph is the virtual-allocation :facepalm: My bad, but I wanna help prevent this from happening.

While overprovisioning might be a good selling point, it's nothing that should let a sysadmin sleep well at night. I think one needs to make an educated decision (and set up proper alerting) before willingly stepping into the waters of space savings by overprovisioning :slightly_smiling_face: My own reason for using thin provisioning is faster migration, I really don't care about the space saving, for instance. But I digress...

Since @olivierlambert suggested this feature might land in XO 6, I'd like to bring my take to the picture, additive, sorted by preference:

  1. An "allocated" string along with the value of the SR's `virtual-allocation`, perhaps in the parentheses here.
  2. An optional "overprovisioned by" string followed by `virtual-allocation` - `physical-size`, when that value is > 0.
  3. A separate bar graph that would look pretty much like the one that's there, but would show each VDI's `virtual-size`. To make the bars the same length, I'd make its 100% the max of the SR's `virtual-allocation` and `physical-size`. When overprovisioned, a bar would mark the `physical-size` and everything after it would be marked as a kind of warning, a yellow-ish overlay perhaps?

Thanks for your consideration!

PS: Terms in code-format follow the parameter names from xe sr-list and xe vdi-list.
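The three items above could be sketched roughly like this. It is only an illustration of the proposal, not XO code: the function names are hypothetical, the arguments mirror the `virtual-allocation` and `physical-size` parameters mentioned above, sizes are assumed to be in bytes, and `physical_size` is assumed to be greater than zero.

```python
GIB = 2**30


def allocation_strings(virtual_allocation: int, physical_size: int) -> list[str]:
    """Build the 'allocated' and optional 'overprovisioned by' strings (items 1 and 2)."""
    parts = [f"{virtual_allocation / GIB:.0f} GiB allocated"]
    over = virtual_allocation - physical_size
    if over > 0:  # only shown when the SR is actually overprovisioned
        parts.append(f"overprovisioned by {over / GIB:.0f} GiB")
    return parts


def bar_layout(virtual_allocation: int, physical_size: int) -> dict[str, float]:
    """Scale for the separate allocation bar (item 3).

    100% of the bar is the larger of the two values. 'physical_mark' is the
    fraction of the bar where the physical size ends and where the warning
    overlay would begin.
    """
    full = max(virtual_allocation, physical_size)
    return {
        "allocated_fraction": virtual_allocation / full,
        "physical_mark": physical_size / full,
    }


over_strings = allocation_strings(300 * GIB, 200 * GIB)
over_bar = bar_layout(300 * GIB, 200 * GIB)
under_strings = allocation_strings(100 * GIB, 200 * GIB)
under_bar = bar_layout(100 * GIB, 200 * GIB)
```

For a 200 GiB SR with 300 GiB allocated, the bar is fully filled and the physical-size marker sits at two thirds of its length. With only 100 GiB allocated, the bar is half filled, the marker sits at the end, and no "overprovisioned by" string is produced.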

olivierlambert commented 3 years ago

@marcungeschikts do we have a card or something to remember to do this in XO 6? I didn't find an XO 6 milestone anymore here.

marcungeschikts commented 3 years ago

@marcungeschikts do we have a card or something to remember to do this in XO 6? I didn't find an XO 6 milestone anymore here.

Yes, we do: Wekan#220 (internal only). I added this issue to the v6 milestone.