iipc / webarchive-commons

Common web archive utility code.
Apache License 2.0

Use CharsetDetector to guess encoding of HTML documents #68

Closed: sebastian-nagel closed this issue 7 years ago

sebastian-nagel commented 7 years ago

See commoncrawl/ia-web-commons#4