internetarchive / warcprox

WARC writing MITM HTTP/S proxy

Require dedup-bucket in Warcprox-Meta to perform dedup #90

Closed vbanos closed 6 years ago

vbanos commented 6 years ago

Rename Warcprox-Meta key captures-bucket to dedup-bucket.

Update unit tests.

Require dedup-bucket to perform dedup.

DedupableMixin.should_dedup is a very convenient place to implement this feature. Only a minimal code change is required.
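
For illustration, a client that opts in to dedup under the renamed key might build its request header as follows. This is a minimal sketch: it assumes Warcprox-Meta is sent as a JSON-encoded HTTP request header, and the helper name `warcprox_meta_header` and the bucket name `my-crawl` are hypothetical.

```python
import json


def warcprox_meta_header(bucket):
    """Build request headers that carry a dedup-bucket in Warcprox-Meta.

    Hypothetical helper for illustration only; it is not part of warcprox.
    """
    # The header value is assumed to be a JSON object keyed by "dedup-bucket".
    return {"Warcprox-Meta": json.dumps({"dedup-bucket": bucket})}


headers = warcprox_meta_header("my-crawl")
print(headers["Warcprox-Meta"])  # prints {"dedup-bucket": "my-crawl"}
```

A request sent through the proxy without this key would carry no dedup-bucket.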

nlevitt commented 6 years ago

Requiring dedup-bucket should be optional...

vbanos commented 6 years ago

Oh, I got a bit confused. It is ready now: I've made it optional with a hidden CLI option, --dedup-only-with-bucket.
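
The optional behaviour discussed above could be sketched roughly as below. This is not the code from the pull request. The class `BucketAwareDedup`, the constructor argument `dedup_only_with_bucket`, and the use of a plain dict for the parsed Warcprox-Meta are assumptions made for illustration. Only the method name should_dedup, the dedup-bucket key and the --dedup-only-with-bucket option come from the thread.

```python
class BucketAwareDedup:
    """Illustrative stand-in for a should_dedup-style check (hypothetical)."""

    def __init__(self, dedup_only_with_bucket=False):
        # Assumed to mirror the --dedup-only-with-bucket CLI option.
        self.dedup_only_with_bucket = dedup_only_with_bucket

    def should_dedup(self, warcprox_meta):
        """Return False when a bucket is required but the request has none."""
        meta = warcprox_meta or {}
        if self.dedup_only_with_bucket and "dedup-bucket" not in meta:
            return False
        # Any other criteria the real method applies are omitted here.
        return True


default = BucketAwareDedup()
strict = BucketAwareDedup(dedup_only_with_bucket=True)

print(default.should_dedup({}))                          # True
print(strict.should_dedup({}))                           # False
print(strict.should_dedup({"dedup-bucket": "my-crawl"}))  # True
```

With the option left off, the check does not depend on dedup-bucket at all, which keeps the requirement opt-in as requested in the review.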