eponerine opened 6 days ago
Yes, spreading the gold image to the workload CSVs first is worth considering.
At the risk of over-summarizing, there are three buckets of blocks in those VHDXs: static OS content, the OS's live working set, and the load files.
The first set, I'd say, could make all sorts of sense: sharing those blocks would effectively accelerate actual deduplication (new in 2025) via the copyfile cloning. VM Fleet does not currently control for this, which is a gap that may need addressing.
The second set is harder. There's already a major chunk of this, depending on how "old" your gold image is. If it was only run briefly - say, just long enough to set the admin password in the actual from-the-build "gold" image - there's quite a lot of early-lifetime work that may not have happened yet, including the initial JIT of .NET assemblies. My own personal gold images suffer a bit from this, and I need to be careful to let that JITting start/complete before I do anything with my fleets. It's also costly CPU-wise (especially on 1-vCPU VMs). Beyond that is the normal activity of the OS services, which we try to pass off as "small" but is very real. Starting from cloned extents, the overhead of reallocating those would add on top of all this and could become noticeable early on.
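One way to take the first-boot JIT cost off the table is to drain the queued .NET native-image work inside the gold image before sealing it. A minimal sketch, assuming .NET Framework 4.x at its default install paths (adjust for whatever the image actually has):

```powershell
# Run inside the gold image before sealing it, so the queued
# .NET native-image (NGEN) compilation doesn't burn CPU on the
# first boot of every fleet VM instead.
$ngenPaths = @(
    "$env:windir\Microsoft.NET\Framework64\v4.0.30319\ngen.exe",
    "$env:windir\Microsoft.NET\Framework\v4.0.30319\ngen.exe"
)
foreach ($ngen in $ngenPaths) {
    if (Test-Path $ngen) {
        # Compile everything currently queued for background JIT.
        & $ngen executeQueuedItems | Out-Null
    }
}
```

This only addresses the JIT portion of the early-lifetime work; the rest of the OS settling activity still happens on first boot.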
The third set is quite tricky. Measure-FleetCoreWorkload internally takes care of seasoning the load files used in the VMs, but "core" VM Fleet leaves this to the analyst's control. If we clone-copied the VHDX, all the blocks are shared at time zero, and the overhead of splitting - on write - would confound early analysis of system behavior. It could be exactly what we want in some cases! But it would be there. On read, yes, the sharing could focus load in strange ways until the writes cause things to split (hopefully your loads have writes and run in a consistent order ... all of this complexity would factor in).
Copying (volume to volume, i.e. collect to node CSV) sidesteps all of this, at the cost of the copy time.
Now, admittedly, those errors could be mitigated by a specific time-zero process of seasoning the initial load files in the VMs; i.e., light all the VMs up and rewrite the load files 2x with random data. This might even be a good idea. We'd still have the second set of blocks (the OS live working set), but that might just be acceptable. And a specific seasoning capability would be good methodology to lay in.
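That seasoning pass could be sketched as below - run inside each VM, it overwrites the load file twice with random data so every block gets reallocated (splitting any cloned extents) before measurement begins. The load-file path and block size here are illustrative assumptions, not VM Fleet conventions:

```powershell
# Hypothetical time-zero seasoning: rewrite the load file 2x with
# random data so no cloned extents survive into the measured run.
$loadFile  = 'C:\run\testfile.dat'   # assumed load-file location
$blockSize = 1MB
$rng    = [System.Security.Cryptography.RandomNumberGenerator]::Create()
$buffer = New-Object byte[] $blockSize

for ($pass = 1; $pass -le 2; $pass++) {
    $fs = [System.IO.File]::Open($loadFile, 'Open', 'Write')
    try {
        $remaining = $fs.Length
        while ($remaining -gt 0) {
            $rng.GetBytes($buffer)
            # Last chunk may be smaller than the buffer.
            $chunk = [Math]::Min($remaining, $buffer.Length)
            $fs.Write($buffer, 0, $chunk)
            $remaining -= $chunk
        }
    }
    finally { $fs.Close() }
}
```

In practice you'd likely drive this with DISKSPD at full write intensity instead, but the effect - every block rewritten, twice - is the same.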
Advocate to your favorite MVP PM :-) But for now, I'd roughly assume that the copy time might only be somewhat greater than the time it would take to reseason the fleet, and we don't have to think a lot about #1/2 above.
TL;DR - Are there performance ramifications for leveraging ReFS Block Cloning for the VM/VHDX deployments? Will those reads all referencing the same blocks skew results in a negative way?
=====
For the sake of keeping my question simple, let's assume Single Node S2D with a single workload CSV.
If you place your GOLD.VHDX on the C:\ClusterStorage\collect volume, it takes an eternity to spawn and deploy all your VMFleet VMs, especially if your VHDX is 100+ GB. It has to copy n times to C:\ClusterStorage\nodeName.
However, if you place your GOLD.VHDX on C:\ClusterStorage\nodeName - the CSV you're ultimately testing - those VMs will deploy far faster due to the magic of ReFS block cloning.

I feel like this is a bad idea though, because the blocks essentially only exist a single time, with pointers back to the legit physical disks. The n VMs then are potentially not "spreading the read load" out to more physical disks?

Is there a way to have New-Fleet first copy the GOLD.VHDX "locally" to each workload CSV and then reference that copy as each VM's disk source? Because that would dramatically improve VM deployment time.
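The flow being asked about could be sketched roughly as follows - one full cross-volume copy per workload CSV, then per-VM copies within the volume, where ReFS can clone blocks instead of duplicating data. This assumes the FailoverClusters module for node enumeration; the VM count and naming are purely illustrative:

```powershell
# Sketch: copy GOLD.VHDX once per workload CSV (the only slow,
# cross-volume copies), then fan out each VM's disk from the local
# copy so the same-volume copies can use ReFS block cloning.
$gold  = 'C:\ClusterStorage\collect\GOLD.VHDX'
$nodes = Get-ClusterNode | Select-Object -ExpandProperty Name
foreach ($node in $nodes) {
    $local = "C:\ClusterStorage\$node\GOLD.VHDX"
    Copy-Item $gold $local            # one full copy per CSV
    1..20 | ForEach-Object {
        # Same-volume copy: ReFS clones extents rather than moving data.
        Copy-Item $local "C:\ClusterStorage\$node\vm-$_.vhdx"
    }
}
```

Note this still leaves every VM's disk sharing extents with its siblings on the same CSV - which is exactly the read-skew/split-on-write question raised above.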