Open larytet opened 7 years ago
On Linux there is debugfs (https://linux.die.net/man/8/debugfs). I could read the drive sector by sector, map sectors to files, feed the data to the SHA routines and, eventually, get a SHA for every file on the disk without doing open-read-close. Or so it appears. What am I missing?
What about Windows?
My guess is that you will be I/O bound. This is a WAG, however, and not based on any information specific to your system. I also believe, again without any evidence to support it, that you will spend more time writing and debugging a system that reads sector by sector and then reconstructs files than you would spend just reading the files the regular way.
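For reference, here is a minimal sketch of "the regular way" with timing, so the I/O-bound guess can be checked on a real machine. It assumes Python's standard `hashlib`; the names `hash_file`, `hash_tree` and the 1 MiB `CHUNK_SIZE` are illustrative choices, not anything taken from an existing tool.

```python
import hashlib
import os
import time

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an assumed value, tune it for your storage


def hash_file(path, algorithm="sha256", chunk_size=CHUNK_SIZE):
    """Hash one file with plain open-read-close, streaming fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root, algorithm="sha256"):
    """Walk root; return ({path: hex digest}, bytes hashed, elapsed seconds)."""
    results = {}
    total_bytes = 0
    start = time.perf_counter()
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(directory, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                results[path] = hash_file(path, algorithm)
                total_bytes += os.path.getsize(path)
            except OSError:
                # Unreadable file: leave it out rather than abort the walk.
                continue
    elapsed = time.perf_counter() - start
    return results, total_bytes, elapsed
```

Dividing the returned byte count by the elapsed time gives an end-to-end rate. If that rate is close to what the drive can deliver on sequential reads, the run is I/O bound and a sector-level reader has little left to win. Note that a second run over the same tree may be served from the page cache and look much faster than the disk really is.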
The goal is to run on hundreds of thousands of machines and VMs. In my case performance is critical; development effort is not.
If you have the time, you're welcome to go for it. Please let me know how it goes!
@jessek I also have to hash whole drives a lot on Linux, for example three 3.7 TB drives full of mixed types of data, which takes a really long time with md5.
How about implementing a very fast algorithm such as xxHash for the case where the goal is purely to check data integrity?
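Whether a faster algorithm helps depends on how far md5 is from the disk's read rate, which is easy to measure. The sketch below times a few hashes on an in-memory buffer using only the standard library. xxHash is not in the Python standard library, so `zlib.crc32` stands in here as an example of a non-cryptographic checksum (an assumption made for illustration, not a claim that the two perform alike); `benchmark` and `throughput_mb_per_s` are made-up helper names.

```python
import hashlib
import time
import zlib


def throughput_mb_per_s(update, payload, rounds=20):
    """Feed payload to update() rounds times and return the rate in MB/s."""
    start = time.perf_counter()
    for _ in range(rounds):
        update(payload)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    return (len(payload) * rounds) / elapsed / 1e6


def benchmark(payload_size=8 * 1024 * 1024):
    """Return {algorithm name: MB/s} measured on a zero-filled buffer."""
    payload = bytes(payload_size)
    results = {}
    for name in ("md5", "sha1", "sha256", "blake2b"):
        digest = hashlib.new(name)
        results[name] = throughput_mb_per_s(digest.update, payload)
    # Non-cryptographic stand-in; swap in an xxHash binding if one is installed.
    results["crc32"] = throughput_mb_per_s(zlib.crc32, payload)
    return results


if __name__ == "__main__":
    ranked = sorted(benchmark().items(), key=lambda item: item[1], reverse=True)
    for name, rate in ranked:
        print(f"{name:8s} {rate:10.0f} MB/s")
```

If md5 already runs faster than the drives can be read, switching algorithms will not shorten the run; if it is slower, a faster checksum or hashing several files in parallel should help.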
My goal is to hash all files on an HDD in the 0.5-1 TB range. What is my bottleneck going to be: CPU or I/O? Does it make sense to read and hash the physical sectors on the hard disk, and map the hashes to files at the end of the process using a tool like debugfs?
If my drive is a high-end SSD, does that change the equation?
Thanks
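One rough way to frame the CPU-versus-I/O question: if reading and hashing overlap, the slower of the two rates sets the total time. The helper below is only back-of-the-envelope arithmetic. `estimate_hours` is a hypothetical name, and both rates are inputs that have to be measured on the actual hardware, not known figures.

```python
def estimate_hours(total_bytes, read_mb_per_s, hash_mb_per_s):
    """Return (hours, bottleneck) assuming reads and hashing overlap fully.

    read_mb_per_s: measured sequential read rate of the drive, in MB/s.
    hash_mb_per_s: measured hash rate of the chosen algorithm, in MB/s.
    """
    effective = min(read_mb_per_s, hash_mb_per_s)  # the slower stage wins
    bottleneck = "I/O" if read_mb_per_s <= hash_mb_per_s else "CPU"
    seconds = total_bytes / (effective * 1e6)
    return seconds / 3600, bottleneck


if __name__ == "__main__":
    # Placeholder rates for illustration only; substitute measured values.
    hours, limit = estimate_hours(1e12, read_mb_per_s=100, hash_mb_per_s=400)
    print(f"{hours:.2f} h, limited by {limit}")
```

With a faster drive the read rate goes up while the hash rate stays the same, so the bottleneck can move from I/O to CPU. This ignores seek overhead from many small files, which on an HDD can dominate.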