trailofbits / polyfile

A pure Python cleanroom implementation of libmagic, with instrumented parsing from Kaitai struct and an interactive hex viewer
Apache License 2.0

Polyfile crashes with NotImplementedError #3374

Open huettenhain opened 2 years ago

huettenhain commented 2 years ago

I am running PolyFile version 0.4.2 on Python 3.9.7 on Windows. I get unexpected exceptions of type NotImplementedError. Consider the following test:

    def test_polyfile(self):
        # Standard-library modules needed to decode the embedded sample.
        import base64
        import zlib
        from polyfile.magic import MagicMatcher
        data = zlib.decompress(base64.b64decode(
            'eNptUsFO4zAQvVvyPwyHSnAgtpukpRJCKtBuJbqkanxZbRAy1C2BkqDYRbv79YydRGm7WLJlv3me9zzj'
            '3uJ2ei4CQYkADuXTKyWXl0AJMPn3QwO7UVZty40DFmqjDfSRtqTk6ooSXaz8BUr6R3fv8pWB3xA6Ljw4'
            '5KbcFRbEXuY63XGmsMt0SK2TFFYX1kBUmwA2HoMnAkuaDbApnGY4xKgfiMFF0I8DkWVWG5tl63yrz2rW'
            'LfrjSM6tN4hICuxHKcuJOzlT9YLiFWq27wa21KbcVc/ovVGeoqtOXLTb1rwLN0C6e7IecxHRgNfKaJ+C'
            'zfT2U9v8WfmIV++MHJYpOir4XBcb+wKC85pJibGVVu+UXEtKmBSPHIsv19hmdxUPEZIDzjkM4zAYDQcg'
            'kYwItLPCpp8mSbJIT+AXvhju5fwnzMbpDF6UgdedsTDX6k2vggDOQKKZifQeW+nO7p9KozSHGJduwCCO'
            'wxjWe6BAbR8q9sDhN6CIov/BKBx1ICW2Utjvqv1Ly7J0P7BpY5r/0xDV1TJWVbb2OBCI9XqTZPoFx5+0'
            'nw=='
        ))
        types = [next(iter(match.mimetypes)) for match in MagicMatcher.DEFAULT_INSTANCE.match(data)]
        self.assertIn('application/pdf', types)

In my setup, this crashes with the following exception:

Traceback (most recent call last):
  File "X:\test\test_polyfile.py", line 82, in test_polyfile
    types = [next(iter(match.mimetypes)) for match in MagicMatcher.DEFAULT_INSTANCE.match(data)]
  File "X:\test\test_polyfile.py", line 82, in <listcomp>
    types = [next(iter(match.mimetypes)) for match in MagicMatcher.DEFAULT_INSTANCE.match(data)]
  File "X:\venv\lib\site-packages\polyfile\magic.py", line 2184, in match
    if m and (not to_match.only_match_mime or any(t is not None for t in m.mimetypes)):
  File "X:\venv\lib\site-packages\polyfile\magic.py", line 2017, in __bool__
    return any(m for m in self.mimetypes) or any(e for e in self.extensions) or bool(self.message())
  File "X:\venv\lib\site-packages\polyfile\magic.py", line 2017, in <genexpr>
    return any(m for m in self.mimetypes) or any(e for e in self.extensions) or bool(self.message())
  File "X:\venv\lib\site-packages\polyfile\iterators.py", line 44, in __iter__
    yield self[i]
  File "X:\venv\lib\site-packages\polyfile\iterators.py", line 30, in __getitem__
    self._items.append(next(self._source_iter))
  File "X:\venv\lib\site-packages\polyfile\iterators.py", line 54, in unique
    for t in iterator:
  File "X:\venv\lib\site-packages\polyfile\magic.py", line 2005, in <genexpr>
    return LazyIterableSet((
  File "X:\venv\lib\site-packages\polyfile\magic.py", line 2047, in __iter__
    yield self[i]
  File "X:\venv\lib\site-packages\polyfile\magic.py", line 2031, in __getitem__
    result = next(self._result_iter)
  File "X:\venv\lib\site-packages\polyfile\magic.py", line 760, in _match
    m = self.test(context.data, absolute_offset, parent_match)
  File "X:\venv\lib\site-packages\polyfile\magic.py", line 1953, in test
    raise NotImplementedError(
NotImplementedError: TODO: Implement support for the DER test (e.g., using the Kaitai asn1_der.py parser)

From my limited understanding, I would not expect this exception to propagate to the caller; instead, a test that raises an exception should be silently discarded as having produced no match.
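To make that expectation concrete, here is a minimal sketch of a caller-side wrapper that treats NotImplementedError as "no further matches". The helper `matches_until_unsupported` and the stand-in `fake_matcher` are hypothetical names used for illustration only; neither is part of PolyFile, and the MIME types yielded are made-up sample values.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def matches_until_unsupported(matches: Iterable[T]) -> List[T]:
    """Collect matches, treating NotImplementedError as "no further matches".

    Hypothetical helper, not part of PolyFile. A generator that raises is
    finished, so matches that would have come after the unsupported test
    cannot be recovered this way.
    """
    collected: List[T] = []
    iterator = iter(matches)
    while True:
        try:
            collected.append(next(iterator))
        except StopIteration:
            break
        except NotImplementedError:
            # An unsupported test (such as DER) is treated as a non-match.
            break
    return collected


def fake_matcher() -> Iterator[str]:
    """Stand-in for a lazy match iterator that hits an unsupported test."""
    yield "text/plain"
    yield "application/pdf"
    raise NotImplementedError("TODO: Implement support for the DER test")


types = matches_until_unsupported(fake_matcher())
```

In the test above, the wrapper would be applied to the iterator returned by `MagicMatcher.DEFAULT_INSTANCE.match(data)`. Whether the PDF match is produced before the DER test is reached is not something I have verified, so this only avoids the crash; it does not guarantee a complete result.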

ESultanik commented 2 years ago

Do you get this error when you run PolyFile from the command line, or only when you try to invoke it programmatically?

huettenhain commented 2 years ago

I have only attempted to run it programmatically; this is also my use case.

ESultanik commented 2 years ago

This is caused by a libmagic feature (the DER test) that PolyFile has not yet implemented. PolyFile works around this internally by disabling the one test that depends on it, but if you access the MagicMatcher through the API, that test is not disabled. We have started implementing this feature and it will be included in the next release. In the meantime, you can use this workaround: https://github.com/trailofbits/polyfile/blob/1718dbb42d1ab87bca69ff7ea7a32feef619e462/tests/test_magic.py#L56
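The linked test is the authoritative version of the workaround. As a rough sketch of the general idea only, the snippet below filters a list of magic definition files so that the one containing the unsupported test is skipped before a matcher is built. It assumes the DER test lives in a definition file named `der` and that a matcher can be constructed from an explicit list of definition files; neither is confirmed in this thread, and the paths and the `supported_definitions` helper are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

# Assumption: the unsupported DER test is defined in a file named "der".
UNSUPPORTED_DEFS = frozenset({"der"})


def supported_definitions(paths: Iterable[Path]) -> List[Path]:
    """Return only the definition files whose tests are all supported."""
    return [p for p in paths if p.name not in UNSUPPORTED_DEFS]


# Hypothetical definition paths, used purely for illustration.
example_paths = [
    Path("magic_defs/pdf"),
    Path("magic_defs/der"),
    Path("magic_defs/archive"),
]
usable = supported_definitions(example_paths)
```

The filtered list would then be handed to whatever constructor builds a matcher from definition files, instead of relying on `MagicMatcher.DEFAULT_INSTANCE`.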