Closed: mwkaufman closed this issue 3 years ago
We currently publish a lot of duplicate revisions. Before uploading, we could hash the data we're preparing to publish and, if it matches the hash of the most current dataset, skip creating the duplicate revision.
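A minimal sketch of the skip logic, assuming we keep the hash of the latest revision somewhere we can read it back (the helper name `should_publish` and the stored `latest_hash` are illustrative, not part of our pipeline):

```python
import hashlib
from typing import Optional

def should_publish(data: bytes, latest_hash: Optional[str]) -> bool:
    """True only when the prepared data differs from the latest revision's hash."""
    return latest_hash is None or hashlib.md5(data).hexdigest() != latest_hash

# Hypothetical usage: latest_hash would be stored alongside the current revision.
prepared = b"col_a,col_b\n1,2\n"
if should_publish(prepared, latest_hash=None):
    print("data changed (or no prior revision): create a new revision")
else:
    print("identical to the current dataset: skip this revision")
```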
@AyushSVarma https://stackoverflow.com/questions/1775816/how-to-get-the-md5sum-of-a-file-on-amazons-s3
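Per that thread, S3 exposes an object's MD5 as its ETag, but only for single-part uploads without SSE-KMS; multipart ETags are an MD5-of-part-MD5s with a `-<parts>` suffix, so they can't be compared directly. A rough boto3 sketch under those assumptions (bucket and key names are placeholders):

```python
import hashlib
from typing import Optional

import boto3

s3 = boto3.client("s3")

def s3_object_md5(bucket: str, key: str) -> Optional[str]:
    """Return the object's MD5 via its ETag, or None when the ETag
    is a multipart checksum (contains a '-<parts>' suffix)."""
    etag = s3.head_object(Bucket=bucket, Key=key)["ETag"].strip('"')
    return None if "-" in etag else etag

def is_duplicate(data: bytes, bucket: str, key: str) -> bool:
    """Compare the prepared data's MD5 against the current S3 object's."""
    remote_md5 = s3_object_md5(bucket, key)
    return remote_md5 is not None and hashlib.md5(data).hexdigest() == remote_md5
```

If the ETag turns out not to be a plain MD5 (multipart upload), we'd likely need a fallback, e.g. writing our own hash into the object's metadata at upload time and reading that back instead.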
We want to improve the quality of the datasets; this check could also feed into the dashboard.