smartbugs / smartbugs-wild

This repository contains 47,398 smart contracts extracted from the Ethereum network.
Apache License 2.0
153 stars 39 forks source link

How were duplicates found out? #6

Closed abhinandanudupa closed 1 year ago

abhinandanudupa commented 1 year ago

Hello! I an interested in a large scale analysis as you have done. Can you tell me about the tools or scripts that you have used to figure out if the source code is a duplicate of some other contract?

tdurieux commented 1 year ago

Hello @abhinandanudupa,

I do not find the code that I used to do that but if I correctly remember I remove all spaces (and maybe comments) from the solidity file and then use md5 hash algo to identify duplicates.

abhinandanudupa commented 1 year ago

It's alright, I am also using a similar strategy but I don't remove the spaces and the comments. The strategy has resulted in a reduction of approximately 75%. Maybe I should also try removing the comments and spaces.

Thank you!