iipc / webarchive-commons

Common web archive utility code.
Apache License 2.0

Non-ascii mimetypes #60

Open ghost opened 8 years ago

ghost commented 8 years ago

I'm finding Korean sites that serve mimetypes written in Korean script. I'm sure this is invalid according to one of the many mimetype RFCs, but that doesn't seem like a good reason to abort processing of an otherwise OK record (in ARCMetaDataParser.parse()). It seems reasonable to either accept the value as-is or replace a non-ASCII mimetype with DEFAULT_MIME.
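
A rough sketch of the second option (fall back to a default instead of aborting) might look like the following. This is not the library's actual code: `MimeTypeSanitizer`, `sanitize`, and `FALLBACK_MIME` are hypothetical names, and `FALLBACK_MIME` is only a stand-in for `DEFAULT_MIME`, whose real value I haven't checked.

```java
public final class MimeTypeSanitizer {
    // Hypothetical stand-in for DEFAULT_MIME; the real constant's value may differ.
    public static final String FALLBACK_MIME = "application/octet-stream";

    private MimeTypeSanitizer() {}

    // True if every character is printable ASCII (space through tilde).
    public static boolean isPrintableAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 || c > 0x7E) {
                return false;
            }
        }
        return true;
    }

    // Returns the trimmed mimetype if it is non-empty printable ASCII,
    // otherwise the fallback, so the caller can keep processing the record.
    public static String sanitize(String mimetype) {
        if (mimetype == null) {
            return FALLBACK_MIME;
        }
        String trimmed = mimetype.trim();
        if (trimmed.isEmpty() || !isPrintableAscii(trimmed)) {
            return FALLBACK_MIME;
        }
        return trimmed;
    }

    public static void main(String[] args) {
        System.out.println(sanitize("text/html"));
        // Korean-script type, written with Unicode escapes to avoid source-encoding issues.
        System.out.println(sanitize("\uD14D\uC2A4\uD2B8/html"));
    }
}
```

The parser would call something like `sanitize(...)` on the mimetype field rather than throwing, so one odd header no longer costs the whole record. Whether to log the original value before replacing it is a separate question.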