rndmendes / SyncManager

PowerShell Module to Synchronize files and folders
MIT License

Feature idea: non-cryptographic hash (for performance) #1

Open LM1LC3N7 opened 4 years ago

LM1LC3N7 commented 4 years ago

Hello,

I have also created a script to sync files and folders, because the project I was using before is no longer maintained. Your script looks good, so I will give it a try 😀

I would like to discuss how to speed up your script (and mine).

I found that there are non-cryptographic hash functions that can be used to compare data up to 10x faster than MD5.

For example, there is murmur3: https://github.com/PeterScott/murmur3

Sadly, there are no non-cryptographic hash functions built into PowerShell, but maybe a portable executable could be used as a dependency.
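To make the idea concrete, here is a minimal sketch in Python rather than PowerShell, comparing two files with a non-cryptographic checksum. It uses CRC-32 from the standard `zlib` module as a stand-in for murmur3, which is an assumption made only because murmur3 is not in the standard library. The function names `crc32_of`, `md5_of` and `same_content` are hypothetical and are not part of SyncManager.

```python
import hashlib
import zlib
from pathlib import Path

CHUNK = 1 << 20  # read 1 MiB at a time so large files are never loaded whole


def crc32_of(path):
    """Non-cryptographic checksum (CRC-32) of a file, streamed in chunks."""
    crc = 0
    with open(path, "rb") as fh:
        while chunk := fh.read(CHUNK):
            crc = zlib.crc32(chunk, crc)  # continue from the running value
    return crc


def md5_of(path):
    """Cryptographic digest (MD5) of a file, for comparison with crc32_of."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        while chunk := fh.read(CHUNK):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(a, b, digest=crc32_of):
    """Compare sizes first (cheap), then digests (reads both files)."""
    a, b = Path(a), Path(b)
    if a.stat().st_size != b.stat().st_size:
        return False
    return digest(a) == digest(b)
```

A 32-bit checksum collides far more easily than a 128-bit hash, so a sync tool relying on it would probably want the size check above, or a wider non-cryptographic hash, before treating two files as identical.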

What do you think about it?

Regards,

Louis

rndmendes commented 4 years ago

Hi Louis,

Yes, I confess that I never tried my project with huge structures. I tried it with small and medium structures and it worked just fine, with acceptable speed. Nevertheless, I will do some testing with Measure-Command and try the new "-Parallel" option of the ForEach-Object cmdlet. I think it will speed things up a little. Regarding the hash, as you may have seen, I'm using the default cryptographic hash functions supported by PowerShell. In fact, SHA1 and MD5 are deprecated and not advisable according to PowerShell team members, and SHA256, SHA384 and SHA512 are now preferred.

Surely, a non-cryptographic hash would speed things up performance-wise. I'll give it a try and see what I can come up with.
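The kind of experiment described here, timing a sequential hashing pass against a parallel one, can be sketched as follows. This is a Python analogue and not the module's PowerShell code: the helpers `sha256_of`, `hash_all` and `timed` are hypothetical names, and the thread pool only plays the role that a parallel loop would play in the real script.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor


def sha256_of(path):
    """SHA-256 hex digest of a file, streamed in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_all(paths, workers=1):
    """Return {path: digest}; uses a thread pool when workers > 1."""
    paths = list(paths)
    if workers <= 1:
        return {p: sha256_of(p) for p in paths}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each path correctly
        return dict(zip(paths, pool.map(sha256_of, paths)))


def timed(fn, *args, **kwargs):
    """Call fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

Calling `timed(hash_all, paths, workers=1)` and `timed(hash_all, paths, workers=4)` on the same file list gives two timings to compare. Whether the parallel run wins depends on disk speed and file sizes, so it is worth measuring on the real folder structure rather than assuming a gain.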

Regards, Ricardo

LM1LC3N7 commented 4 years ago

In fact, MD5 and SHA1 are deprecated for security purposes (hashing passwords, for example). But in your script, hashes are used only for file comparison, which is why non-cryptographic hashes are an option here.

On my side, I have a lot of files (medium and small), so MD5 is too slow and uses too many resources. It was my solution too, as it is the only "fast way" to hash that is included in PowerShell.

LM1LC3N7 commented 4 years ago

I just tried your script on a lot of files and folders and it was quite fast (faster than mine). Good job!

Of course, a non-cryptographic hash would be even faster. 😄

rndmendes commented 4 years ago

Hey Louis! Good that it worked well!! Meanwhile, I had to tweak it a little bit because previously I didn't take into account that the file hash doesn't change when one changes the file name, but only when one changes the content of the file... :) So, I'm now doing some changes to take that "new" factor into account. Let's hope for the best!! :)
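One way to account for that factor is to match files by content hash when their paths differ. The sketch below is in Python and is only an illustration of the idea, not the change actually made in SyncManager. It assumes each side has already been scanned into a `{relative_path: content_hash}` dictionary, and `find_renames` is a hypothetical name.

```python
from collections import defaultdict


def find_renames(source, target):
    """Pair target-only paths with source-only paths that share a hash.

    source, target: {relative_path: content_hash}
    Returns a list of (old_path_in_target, new_path_in_source).
    """
    # Index files that exist only in the source by their content hash.
    only_source = defaultdict(list)
    for path, digest in source.items():
        if path not in target:
            only_source[digest].append(path)

    renames = []
    for old_path, digest in sorted(target.items()):
        if old_path in source:
            continue  # same path on both sides: not a rename
        candidates = only_source.get(digest)
        if candidates:
            # Same content under a new name: treat it as a rename.
            renames.append((old_path, candidates.pop(0)))
    return renames


source = {"a.txt": "h1", "new.txt": "h2"}
target = {"a.txt": "h1", "old.txt": "h2", "gone.txt": "h3"}
print(find_renames(source, target))  # [('old.txt', 'new.txt')]
```

Files that keep their path but change content are a separate case (the hashes differ at the same path) and are not handled by this function.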