Closed: mwkaufman closed this issue 3 years ago
We currently publish a lot of duplicate revisions. Before uploading, we could hash the data we're preparing to publish and, if it matches the hash of the most current dataset, skip creating the duplicate revision.
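A minimal sketch of the skip logic, assuming we keep the hash of the latest revision somewhere we can read it back (the helper name `should_publish` and the stored `latest_hash` are illustrative, not part of our pipeline):

```python
import hashlib
from typing import Optional

def should_publish(data: bytes, latest_hash: Optional[str]) -> bool:
    """True only when the prepared data differs from the latest revision's hash."""
    return latest_hash is None or hashlib.md5(data).hexdigest() != latest_hash

# Hypothetical usage: latest_hash would be stored alongside the current revision.
prepared = b"col_a,col_b\n1,2\n"
if should_publish(prepared, latest_hash=None):
    print("data changed (or no prior revision): create a new revision")
else:
    print("identical to the current dataset: skip this revision")
```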
@AyushSVarma https://stackoverflow.com/questions/1775816/how-to-get-the-md5sum-of-a-file-on-amazons-s3
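Per that thread, S3 exposes an object's MD5 as its ETag, but only for single-part uploads without SSE-KMS; multipart ETags are an MD5-of-part-MD5s with a `-<parts>` suffix, so they can't be compared directly. A rough boto3 sketch under those assumptions (bucket and key names are placeholders):

```python
import hashlib
from typing import Optional

import boto3

s3 = boto3.client("s3")

def s3_object_md5(bucket: str, key: str) -> Optional[str]:
    """Return the object's MD5 via its ETag, or None when the ETag
    is a multipart checksum (contains a '-<parts>' suffix)."""
    etag = s3.head_object(Bucket=bucket, Key=key)["ETag"].strip('"')
    return None if "-" in etag else etag

def is_duplicate(data: bytes, bucket: str, key: str) -> bool:
    """Compare the prepared data's MD5 against the current S3 object's."""
    remote_md5 = s3_object_md5(bucket, key)
    return remote_md5 is not None and hashlib.md5(data).hexdigest() == remote_md5
```

If the ETag turns out not to be a plain MD5 (multipart upload), we'd likely need a fallback, e.g. writing our own hash into the object's metadata at upload time and reading that back instead.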
We want to improve the quality of the datasets; this check could also feed into the dashboard.