rluetzner opened 7 months ago
Regexes make my brain hurt. However, I've figured out at least a few things.
/^\s*(?:\d\.\s+)?File extension(?:\(s\)|s|)\s?:\s+(?:\*\.|\.|)([0-9a-z_-]+)\s*(?:\(|$)/im
, which was replaced in commit be9ca41d sometime in 2018.
I've played around a bit with a regex tester and was able to fix some of these things. Making the quotes optional in particular is quite easy. But I'm very uncertain as to how this will affect a full rebuild. There doesn't seem to be a clear scheme to the IANA MIME type declarations, so I don't think there's a way to handle all cases anyway.
For what it's worth, here's the regex I came up with that works with and without quoted file extensions:
/^\s*(?:\d\.\s+)?File extension(?:\(s\)|s|)\s?:[\s]*['"]?(?:\.?([0-9a-z_-]+))['"]?$/im
I used https://regex101.com/ to test things and copied the declaration for atom+xml and modified it manually.
This does not work properly with multiple comma-separated file extensions, but none of the other regexes do either, so I'd count it as an improvement.
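To make that concrete, here's a minimal sketch that runs the regex above against a few sample lines. The sample strings are made up for illustration (not copied from IANA), and `extractExtension` is a hypothetical helper name, not something from the build scripts.

```javascript
// The regex proposed above, unchanged.
const EXTENSION_REGEXP = /^\s*(?:\d\.\s+)?File extension(?:\(s\)|s|)\s?:[\s]*['"]?(?:\.?([0-9a-z_-]+))['"]?$/im

// Hypothetical helper: returns the captured extension, or null if nothing matched.
function extractExtension (text) {
  const match = EXTENSION_REGEXP.exec(text)
  return match ? match[1] : null
}

// Made-up sample lines, only meant to exercise the regex.
console.log(extractExtension('File extension(s): .atom')) // atom
console.log(extractExtension('File extension(s): "cbz"')) // cbz
console.log(extractExtension("3. File extension(s): '.cbz'")) // cbz
console.log(extractExtension('File extension(s): .cbz, .cbr')) // null
```

The last call shows the limitation: the trailing `$` leaves no room for `, .cbr`, so the whole line fails to match rather than returning only the first extension.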
The two MIME types have clearly defined file extensions.
https://www.iana.org/assignments/media-types/application/vnd.comicbook+zip
https://www.iana.org/assignments/media-types/application/vnd.comicbook-rar
However, I compared this with a few entries that do have extensions listed in src/iana-types.json and, unlike the ones I looked at, these two MIME type definitions have their file extensions in a numbered list, e.g. (excerpt from vnd.comicbook+zip). I guess the parsing logic needs to be adjusted to match these, but I'm not good enough with JS to do that myself.
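For anyone who wants to pick this up, here's a rough sketch of one possible adjustment: find the field line first (with or without a numbered-list prefix), then split the value on commas and clean up each piece. Everything in it is an assumption on my part: the sample text is invented and only loosely modeled on a numbered list, and `FIELD_REGEXP`, `VALUE_REGEXP` and `extractExtensions` are hypothetical names, not the existing parsing code.

```javascript
// Invented sample text; NOT the real IANA declaration.
const sample = [
  'Additional information:',
  '',
  '  1. Magic number(s): N/A',
  '  2. File extension(s): ".cbz", .cbr',
  '  3. Macintosh file type code: N/A'
].join('\n')

// Step 1: locate the field line, tolerating an optional "N. " list prefix,
// and capture everything after the colon.
const FIELD_REGEXP = /^[ \t]*(?:\d+\.[ \t]+)?File extension(?:\(s\)|s)?[ \t]*:[ \t]*(.*)$/im

// Step 2: clean up a single value: optional quotes, optional leading "*." or ".".
const VALUE_REGEXP = /^['"]?(?:\*\.|\.)?([0-9a-z_-]+)['"]?$/i

function extractExtensions (text) {
  const field = FIELD_REGEXP.exec(text)
  if (!field) return []
  return field[1]
    .split(',')
    .map(part => VALUE_REGEXP.exec(part.trim()))
    .filter(Boolean) // drop pieces that don't look like an extension, e.g. "N/A"
    .map(match => match[1].toLowerCase())
}

console.log(extractExtensions(sample)) // [ 'cbz', 'cbr' ]
```

Splitting the work into two steps keeps each regex small. I have no idea how this would behave on a full rebuild, though, and a plain word such as "none" in the value would still be picked up as an extension, so it would probably need some filtering.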