itoleck / WindowsPerformance

Various Windows Performance files, scripts, settings and documents
MIT License
30 stars 9 forks source link

Answer key for trace; SlowBoot2.etl #5

Open itoleck opened 2 years ago

itoleck commented 2 years ago

Put your charcoal face mask on, sit back and enjoy, this is an ugly Windows system boot up trace.

Scenario:

In this case we see what I hope you never have to experience, a 26+ minute boot up!

The boot process is slow during the entire trace period. The only boot phase that is within an acceptable time is the Pre Session Init at 4.48s. Open the Other panel -> Boot Phases graph and see that Session Init, Winlogon Init and Explorer Init are all slow.

Boot Phases

Whenever you see slowness during the entire boot process it is a good hypothesis that it is hardware related. A few hardware resources may be to blame; CPU, storage and memory. Let's dive into each resource to understand if it is the cause of the slowness.

Analysis:

CPU

The first thing to verify in a trace is the number of logical CPUs that the machine had when the trace was taken. Open the Menu -> Trace -> System Configuration. Then open the General tab. The Number of Processors is 8 and the Processor Speed is 3312MHz. This should be fine for most cases and should not cause a Windows machine to boot in 26+ minutes.

The CPU usage should be verified next to make sure that there was not a process/es using all of the CPU during the trace. You should notice right away that the CPU usage during the trace is low, very low in fact that the CPU is idling for > 99% of the time. This quickly rules out CPU resources as a bottleneck.

Memory

Verify the amount of RAM installed by Opening the Menu -> Trace -> System Configuration. Then open the General tab. The machine had 1004MB of RAM installed. This seems low for Windows 10 and if you check the system requirements for Windows 10 you will find that for 64 bit the requirement is 2GB of RAM. Could it be that simple? Let's verify how much memory the system was using during the boot.

Open the Memory panel - Memory utilization graph. Sort by the column Size (MB) Max descending. You will see that the Total Physical Memory was 1004MB and the Commit (Total) was 856.762MB which is less than the RAM. Even though the requirements were not met, the system did have a small amount of memory left and should not have caused such a long boot process.

Storage

Understanding the storage that the system had when it booted is the next step. Open the Menu -> Trace -> System Configuration. Then open the Storage tab. See that there were 2 disks with 1 partition each, formatted as C: and E:. These disks were virtualized disks and there is no way in the trace to understand what storage subsystem was used to store the virtual hard disks.

We can however check the performance of the storage. Open the Storage panel -> Disk Usage graph. Use the View Editor(gear icon) to add the following columns in order.

Priority | Disk | IO Type | Path Tree | GOLDENBAR | Process | Size (B) Sum | Disk Service Time (us) Avg | Global IO Time (us) Avg | IO Time (us) Avg | Global IO Time (us) Sum | IO Time (us) Sum | QD/I | Complete Time (s) | BLUEBAR | Disk Service Time (us) Sum

Disk Usage Columns

I like to select the Normal priority and right-click Filter To Selection to remove any Low and Very Low IO. These IOs which were introduced in Windows Vista are low priority and only serviced when there is spare time, only normal and high priority IO are important usually in troubleshooting disk performance.

If you look at the Disk Service Time (us) Avg for all Normal IO it shows that the average latency of the disks were 63.617284ms which is pretty bad. That is about the same as a very bad performing 5,400RPM or less HDD. If you then open the disk 0 and 1 to the IO Types you see that for disk 0 the writes are much worse than the reads; write average is 450.907036ms and reads 4.377482ms. The write latency is almost as bad as a floppy disk while reads are like SSD speed.

Disk 1 averages are super good, NVME SSD speeds, < 1ms average. Open the disk 1 and view the Path Tree information. You should see that E:\pagefile.sys is the highest used file.

Open the disk 0 Path Tree and see that all locations on the disk are very bad performing; this indicates there is an issue with the disk. You can calculate the average bandwidth by dividing the Size (B) by the (Disk Service Time (us) /1,000,000). In this case for the disk 0 Normal IO it would be 1,725,725,184 / 1,212.892997595 = 1,422,817.336254621 MBs. About 1.5 MB per second. That is almost 80x slower than a normal 7,200RPM HDD.

Remediation:

Do not use a cheap flash drive as the storage for your virtual machines.

MagicAndre1981 commented 2 years ago

Here is also an example where disk slows down boot a lot. After some years, the 5400rpm drives are getting extremely slow for random disk operations, but copying larger files is ok. so that they can still be used as external USB drives