itoleck / WindowsPerformance

Various Windows Performance files, scripts, settings and documents
MIT License

Answer key for trace: DiskIOHigh.etl #2

Open itoleck opened 2 years ago

itoleck commented 2 years ago

Scenario:

Windows Server Backup was running, backing up spanned volume SSD disks 2, 3, 4, 5 to HD disk 6.

Analysis:

  1. Find the current disk usage. Open the trace in WPA and add the Disk Usage graph.

  2. Order the columns similar to the following, where GOLDBAR and BLUEBAR are the gold and blue divider bars in the WPA table: Disk | Priority | IO Type | Process | IO Init Stack |GOLDBAR| Disk Service Time(us) Avg | Size | Count |BLUEBAR| Disk Service Time(us) Sum

  3. Sort by the Disk Service Time(us) Avg column to show the average latency for each disk. Disks 1-5 are within normal latency levels (1.294 ms or less), but disk 6 averages 108.9 ms, well above the 15-25 ms that is normal for a 7200 RPM hard drive.

DiskIOHigh1
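The per-disk average from the sort step can also be reproduced outside WPA once the Disk Usage rows are exported. The following is a minimal sketch, assuming each row is available as a record with hypothetical `disk` and `service_time_us` fields. The sample values are illustrative and are not taken from the trace, and the 25 ms threshold is simply the upper end of the hard drive range quoted above.

```python
from collections import defaultdict

# Assumed upper bound for "normal" 7200 RPM hard drive latency (25 ms), in microseconds.
HDD_NORMAL_MAX_US = 25_000


def average_service_time_us(rows):
    """Return {disk: average Disk Service Time in microseconds}."""
    totals = defaultdict(lambda: [0, 0])  # disk -> [sum_us, count]
    for row in rows:
        entry = totals[row["disk"]]
        entry[0] += row["service_time_us"]
        entry[1] += 1
    return {disk: total / count for disk, (total, count) in totals.items()}


def slow_disks(rows, threshold_us=HDD_NORMAL_MAX_US):
    """Return (disk, average_us) pairs above the threshold, slowest first."""
    averages = average_service_time_us(rows)
    slow = [(disk, avg) for disk, avg in averages.items() if avg > threshold_us]
    return sorted(slow, key=lambda item: item[1], reverse=True)


# Illustrative sample rows only; real values come from the trace.
sample_rows = [
    {"disk": 2, "service_time_us": 900},
    {"disk": 2, "service_time_us": 1_300},
    {"disk": 6, "service_time_us": 95_000},
    {"disk": 6, "service_time_us": 120_000},
]

print(slow_disks(sample_rows))
```

With the sample rows, only disk 6 is reported, which mirrors what sorting the Avg column shows in WPA.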

  1. Long Disk Service Time (high latency) is sometimes caused by the IO having a low priority. In this trace, only disk 0 has IO at a priority other than Normal. To remove low-priority IO from the view, move the Priority column to the first slot, select the Low and Very Low IO priorities, then right-click and select Filter Out Selection. Move the Disk column back to the first slot after filtering.

DiskIOHigh2
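The priority filter has a simple equivalent when working with exported rows. This is a sketch under the assumption that each row carries a hypothetical `priority` field holding the same labels WPA displays; the sample rows are made up for illustration.

```python
# Priority labels to drop, matching the Low and Very Low selection in the step above.
EXCLUDED_PRIORITIES = {"Low", "Very Low"}


def filter_out_priorities(rows, excluded=EXCLUDED_PRIORITIES):
    """Return only the rows whose IO priority is not in the excluded set."""
    return [row for row in rows if row["priority"] not in excluded]


# Illustrative sample rows only.
sample_rows = [
    {"disk": 0, "priority": "Very Low", "service_time_us": 40_000},
    {"disk": 0, "priority": "Low", "service_time_us": 30_000},
    {"disk": 0, "priority": "Normal", "service_time_us": 800},
    {"disk": 6, "priority": "Normal", "service_time_us": 110_000},
]

remaining = filter_out_priorities(sample_rows)
print([(row["disk"], row["priority"]) for row in remaining])
```

Averaging latency over the remaining rows then excludes IO that was expected to be slow because of its priority.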

  1. Expand Disk 6, priority Normal, and view the IO Type. In this case most of the IO is writes. To find out which process is writing, expand the Process column. You should find that the wbengine.exe (10856) process is responsible for most of the writes.

  2. Expand the IO Init Stack column all the way to the end for the wbengine.exe process. There are 69 writes of exactly 32MB each, and most of them have high latency (> 100ms).

DiskIOHigh3
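The drill-down by process and IO size can be sketched the same way, along with the total volume those writes represent. This assumes hypothetical `process`, `io_type`, and `size_bytes` fields on each exported row, and that "32MB" means 32 MiB (33,554,432 bytes). The sample rows are synthetic and only reproduce the count and size reported above.

```python
from collections import Counter

MIB = 1024 * 1024


def writes_by_process_and_size(rows):
    """Count write IOs per (process, size_bytes) pair."""
    counts = Counter()
    for row in rows:
        if row["io_type"] == "Write":
            counts[(row["process"], row["size_bytes"])] += 1
    return counts


# Synthetic rows: 69 writes of 32 MiB from wbengine.exe plus one unrelated read.
sample_rows = [
    {"process": "wbengine.exe (10856)", "io_type": "Write", "size_bytes": 32 * MIB}
    for _ in range(69)
]
sample_rows.append(
    {"process": "System (4)", "io_type": "Read", "size_bytes": 4096}
)

counts = writes_by_process_and_size(sample_rows)
(process, size_bytes), count = counts.most_common(1)[0]
total_mib = count * size_bytes // MIB

print(process, count, "writes,", total_mib, "MiB total")
```

Under these assumptions, the 69 writes add up to 69 x 32 = 2208 MiB sent to disk 6 in large sequential chunks.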

Remediation:

Back up to faster backup storage, or split the backups across different hard drives to spread the load.