microsoft / azure-vhd-utils

Azure VHD utilities.

MD5 checksum computing taking an unexpectedly long time #15

Open colemickens opened 8 years ago

colemickens commented 8 years ago

I have created a VHD that is ~500GB but it's less than 1GB on disk currently.

$ du -hs disk.vhd
943M disk.vhd

$ azure-vhd-utils-for-go inspect footer --path disk.vhd
[...]
PhysicalSize      : 536879692800 bytes
VirtualSize       : 536879692800 bytes
[...]

This is what I'm seeing right now as it's computing the MD5 Checksum...

Computing MD5 Checksum..
Completed:  10% RemainingTime: 00h:14m:34s Throughput: 4197 MB/sec
536879 MB / 4197 MB/s ≈ 128 s

I'm not really sure what's going on with this. Is the throughput value wrong? Is my math wrong?

anuchandy commented 8 years ago

Looks like the throughput calculation is wrong.

The MD5 calculation is usually slow because only one goroutine can read the VHD file. We also cannot write to the MD5 writer ("crypto/md5") from multiple goroutines.

For uploads, we use multiple goroutines to upload the pages, though only one goroutine reads the VHD file.