Closed satyamtg closed 4 years ago
Why are we using magic for those files? Wouldn't it be faster to use the filenames? Also, magic is notoriously poor quality with text files (no magic number obviously!)
That's due to these lines: https://github.com/openzim/python_scraperlib/blob/master/src/zimscraperlib/zim/filesystem.py#L66-L70
I think we should use the filename-based guess whenever the magic MIME type is any `text/` type, not just `text/plain`.
Exactly, we should use `if self.mime_type.startswith("text/")` instead.
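Here is a minimal sketch of that check. It assumes the magic result is already available as a string (for example from python-magic's `magic.from_file(path, mime=True)`) and uses the standard library's `mimetypes` for the filename-based guess. `resolve_mimetype` is a hypothetical helper for illustration, not the actual `filesystem.py` code:

```python
import mimetypes


def resolve_mimetype(magic_mime: str, filename: str) -> str:
    """Hypothetical helper: prefer the extension-based guess for text/* files.

    Text files have no magic number, so libmagic can misreport them
    (e.g. text/troff for a CSS file). For any text/* result we trust
    the filename instead, falling back to magic if the extension is unknown.
    """
    if magic_mime.startswith("text/"):
        guessed, _encoding = mimetypes.guess_type(filename)
        if guessed:
            return guessed
    # Binary types (or unknown extensions) keep the magic result.
    return magic_mime
```

With this check, a file that magic reports as `text/troff` would be stored as `text/css`, provided its name ends in `.css`. Binary formats, where magic numbers are reliable, are left untouched.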
This file in the PHZH ZIM produced during this ZimFarm run has the wrong MIME type, and as a result we get the following error message in the console:
For it to work, the file should be detected as `text/css`, but it gets the MIME type `text/troff` instead. This is due to the magic output, which gives the following for this file: