Closed workflowsguy closed 5 years ago
Thanks for this report, workflowsguy, an interesting bug! I'll look into it.
I've found the offending code: https://github.com/richardlehane/siegfried/blob/master/internal/namematcher/namematcher.go#L149
The issue is that some filenames appear inside URLs (because of WARC scanning). Where sf thinks the name is a URL, it strips the characters following a "?", because in a URL that is the query string. E.g. it is trying to extract the name from a string like "http://www.mysite.com/file.pdf?user=richard".
But in your case where the ? is legitimately part of a regular file name, this is breaking extension matching.
I'll have a think about how to re-jig this bit of the code to fix it.
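To make the failure mode concrete, here is a minimal sketch in Go. It is not the actual siegfried code: naiveName, schemeAwareName and extOf are hypothetical helpers written only for illustration, and the "://" check is just one possible heuristic for deciding whether a name is a URL.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// naiveName (hypothetical) drops everything from the first "?" onwards,
// treating it as a URL query string regardless of what the name is.
func naiveName(name string) string {
	if i := strings.Index(name, "?"); i >= 0 {
		return name[:i]
	}
	return name
}

// schemeAwareName (hypothetical) strips the query string only when the
// name contains "://", i.e. looks like a URL with a scheme.
// This heuristic is an assumption, not siegfried's actual fix.
func schemeAwareName(name string) string {
	if !strings.Contains(name, "://") {
		return name
	}
	return naiveName(name)
}

// extOf returns the lower-cased extension of the final path element,
// without the leading dot.
func extOf(name string) string {
	return strings.ToLower(strings.TrimPrefix(path.Ext(name), "."))
}

func main() {
	names := []string{
		"http://www.mysite.com/file.pdf?user=richard", // name inside a URL
		"report?.pdf", // regular file name containing "?"
	}
	for _, n := range names {
		fmt.Printf("%q: naive ext=%q, scheme-aware ext=%q\n",
			n, extOf(naiveName(n)), extOf(schemeAwareName(n)))
	}
}
```

With the naive version, "report?.pdf" is cut down to "report" and ends up with no extension, which is the kind of result that would trigger an extension mismatch. The URL example yields "pdf" with both versions.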
When files are processed with sf, those that contain a question mark at the end of the filename will be identified with the correct type, but an "extension mismatch" warning will still be output.
I am running on macOS, where ? is an allowed character for filenames.
Thanks!