michelcrypt4d4mus / pdfalyzer

Analyze PDFs. With colors. And Yara.
GNU General Public License v3.0

Yaralyze Fails with "Internal Error: 46" #15

Open EightBitBoot opened 2 months ago

EightBitBoot commented 2 months ago

I'm trying to analyze a potentially malicious PDF file and consistently get the error Internal Error: 46. According to yara's error.h (printed alongside the error), error 46 corresponds to ERROR_TOO_MANY_RE_FIBERS. All other modules are working fine.

Pdfalyzer was installed with pipx (pipx install pdfalyzer). The result of pipx runpip pdfalyzer freeze is:

anytree==2.12.1
chardet==5.2.0
commonmark==0.9.1
Deprecated==1.2.14
pdfalyzer==1.14.10
Pygments==2.18.0
PyPDF2==2.12.1
python-dotenv==0.21.1
rich==12.6.0
rich_argparse_plus==0.3.1.4
six==1.16.0
wrapt==1.16.0
yara-python==4.5.1
yaralyzer==0.9.4

Please let me know if there's any other debugging info I can provide (aside from the PDF as I don't want to upload anything potentially malicious).

michelcrypt4d4mus commented 2 months ago

unfortunately that is an error I've seen before (see my note in CHANGELOG.md) and I don't know what can be done about it because it's a yara internal error. you can see where it is raised in the yara source code and you might be able to trace back something useful from there (the value that trips the error is here). you can also see that there was a change in yara 3.11 that was supposed to limit these errors, but IIRC that's around when i began seeing them.

there are two command line options that are passed through to the yara engine. when i was confronted with this issue i did a (very little) bit of fiddling in the hopes that might help but gave up quickly. they are --max-match-length and --yara-stack-size. unfortunately i have no suggestions as to what values might help other than "lower is probably better".
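a minimal sketch of how one could try a handful of values for those two options systematically. the numeric values are placeholders, not recommendations, and the sketch assumes that pdfalyze takes the PDF path as a positional argument and that -y limits the run to the YARA scan. it only builds and prints the command lines, smallest values first, so you can run them by hand.

```python
from itertools import product


def candidate_commands(pdf_path, stack_sizes, match_lengths):
    """Build one pdfalyze argv per (stack size, match length) pair, smallest values first."""
    commands = []
    for stack_size, match_length in product(sorted(stack_sizes), sorted(match_lengths)):
        commands.append([
            "pdfalyze", pdf_path, "-y",  # assumption: -y runs only the YARA scan
            "--yara-stack-size", str(stack_size),
            "--max-match-length", str(match_length),
        ])
    return commands


# Placeholder values chosen only for illustration.
for argv in candidate_commands("suspicious.pdf", [4096, 16384], [256, 1024]):
    print(" ".join(argv))
```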

some questions though:

  1. what command are you running exactly to trip this error?
  2. what operating system are you using?
  3. have you tried running with custom YARA rules instead of the ones that are packaged with the pdfalyzer?

re: 3 it might be possible to incrementally delete the yara rules that come packaged with pdfalyzer from your local installation in an attempt to isolate which rule is causing the error (or whether YARA just always fails on that file regardless of rule).
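as an alternative to deleting rules by hand, here is a rough sketch that scans the file once per rules file with yara-python directly and reports which files raise. the function names are made up for illustration. it assumes each rules file compiles on its own (no cross-file includes or external variables), which may not hold for every packaged file.

```python
from pathlib import Path


def scan_with_yara(rule_file, data):
    """Compile one rules file and scan the bytes; raises yara.Error on failure."""
    import yara  # imported lazily so the helper below can be used without yara-python

    yara.compile(filepath=str(rule_file)).match(data=data)


def find_failing_rule_files(rule_files, data, scan=scan_with_yara):
    """Return (rule_file, error message) pairs for every rules file whose scan raises."""
    failures = []
    for rule_file in rule_files:
        try:
            scan(rule_file, data)
        except Exception as error:  # yara.Error in practice; also catches compile errors
            failures.append((Path(rule_file), str(error)))
    return failures
```

usage would look something like find_failing_rule_files(sorted(rules_dir.glob("*.yara")), Path("suspicious.pdf").read_bytes()), where rules_dir and the .yara extension are assumptions about your local installation. any file reported with "internal error: 46" is the one to look at more closely.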

if you don't know where to find the packaged YARA rules in your local installation of the pdfalyzer try running which pdfalyze. that will probably show you a dir that ends in something like pdfalyze-[stuff]/bin/pdfalyze. the rules files will be in a pdfalyzer/yara_rules/ dir somewhere in the sub-hierarchy (the folder hierarchy will look exactly like this repo's pdfalyzer folder)
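another way to find that directory, sketched in Python under the assumption that the rules live in a yara_rules folder directly inside the installed pdfalyzer package (as described above). with a pipx install this has to be run with the Python interpreter of the pipx venv, otherwise the package will not be found and the function returns None.

```python
from importlib.util import find_spec
from pathlib import Path


def packaged_rules_dir(package="pdfalyzer", subdir="yara_rules"):
    """Return the rules directory inside an installed package, or None if not found."""
    spec = find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None  # package not installed in this interpreter, or not a package
    for location in spec.submodule_search_locations:
        candidate = Path(location) / subdir
        if candidate.is_dir():
            return candidate
    return None


print(packaged_rules_dir())
```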

edit: fixed link to CHANGELOG.md edit 2: added note about where to find YARA rules

michelcrypt4d4mus commented 2 months ago

re: 1 the best option is probably to run pdfalyze without the -y option. note that -y is implicitly enabled if you specify no options at all ("Choosing nothing is choosing everything except --streams."), so you need to pass the other options explicitly.

michelcrypt4d4mus commented 2 months ago

one other thing i would say is that if you do manage to isolate a yara rule + file combination that trips the error it might be worth filing a bug in the official yara repo

michelcrypt4d4mus commented 2 months ago

(and it's definitely worth telling me what the rule is so i can at least temporarily remove it from the pdfalyzer)

michelcrypt4d4mus commented 2 months ago

I just released version 1.15.1. It has a new command line option, --no-default-yara-rules. If you use it in tandem with one or more --yara-file options, the scan will be done with only your custom YARA rules file(s). Before this change, --yara-file just appended the specified custom files to the set of prepackaged YARA rules files, so there was no way to run a YARA scan without using the default rules.

Theoretically this should make it much easier to debug your issue because you can select a limited set of the preconfigured rules (in this directory in the repo), copy them out to your own custom file, and pass that file to the --yara-file argument to test with. No more which pdfalyze / manual editing of the files installed by pip kind of shenanigans required.
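A minimal sketch of that workflow, with hypothetical helper names: it concatenates a chosen subset of rules files into one custom file and builds the matching pdfalyze command line. It assumes the selected files can be combined without clashing rule names, and that -y selects the YARA scan; neither is guaranteed.

```python
from pathlib import Path


def build_custom_rules_file(rule_files, destination):
    """Concatenate the chosen rules files into one custom file and return its path."""
    destination = Path(destination)
    parts = [Path(rule_file).read_text(encoding="utf-8") for rule_file in rule_files]
    destination.write_text("\n\n".join(parts), encoding="utf-8")
    return destination


def scan_command(pdf_path, custom_rules_file):
    """Build a pdfalyze argv that scans with only the custom rules file."""
    return [
        "pdfalyze", str(pdf_path), "-y",  # assumption: -y runs the YARA scan
        "--no-default-yara-rules",
        "--yara-file", str(custom_rules_file),
    ]
```

Halving the subset after each run (keep the half that still trips the error) should narrow things down to a single rule fairly quickly.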

If we're lucky there's just some bad rule in the pre-configured set that is causing this issue on macOS.

lokman2k5 commented 1 month ago

I am getting this error and cannot use pdfalyzer at all

lokman2k5 commented 1 month ago

Traceback (most recent call last)

/home/lyzen/.local/share/pipx/venvs/pdfalyzer/lib64/python3.12/site-packages/pdfalyzer/output/pdfalyzer_presenter.py:132 in print_yara_results

  129         YaralyzerConfig.args.standalone_mode = True # TODO: using 'standalone mode' lik
  130
  131         try:
❱ 132             self.yaralyzer.yaralyze()
  133         except yara.Error as e:
  134             console.print_exception()
  135             print_fatal_error_panel("Internal YARA error! YARA's error codes can be chec

/home/lyzen/.local/share/pipx/venvs/pdfalyzer/lib64/python3.12/site-packages/yaralyzer/yaralyzer.py:149 in yaralyze

  146
  147     def yaralyze(self) -> None:
  148         """Use YARA to find matches and then force decode them"""
❱ 149         console.print(self)
  150
  151     def match_iterator(self) -> Iterator[Tuple[BytesMatch, BytesDecoder]]:
  152         """Iterator version of yaralyze. Yields match and decode data tuple back to call

/home/lyzen/.local/share/pipx/venvs/pdfalyzer/lib64/python3.12/site-packages/rich/console.py:1694 in print

  1691             render = self.render
  1692             if style is None:
  1693                 for renderable in renderables:
❱ 1694                     extend(render(renderable, render_options))
  1695             else:
  1696                 for renderable in renderables:
  1697                     extend(

/home/lyzen/.local/share/pipx/venvs/pdfalyzer/lib64/python3.12/site-packages/rich/console.py:1326 in render

  1323             )
  1324         _Segment = Segment
  1325         _options = _options.reset_height()
❱ 1326         for render_output in iter_render:
  1327             if isinstance(render_output, _Segment):
  1328                 yield render_output
  1329             else:

/home/lyzen/.local/share/pipx/venvs/pdfalyzer/lib64/python3.12/site-packages/yaralyzer/yaralyzer.py:209 in __rich_console__

  206         """Does the stuff. TODO: not the best place to put the core logic"""
  207         yield bytes_hashes_table(self.bytes, self.scannable_label)
  208
❱ 209         for _bytes_match, bytes_decoder in self.match_iterator():
  210             for attempt in bytes_decoder.__rich_console__(_console, options):
  211                 yield attempt
  212

/home/lyzen/.local/share/pipx/venvs/pdfalyzer/lib64/python3.12/site-packages/yaralyzer/yaralyzer.py:153 in match_iterator

  150
  151     def match_iterator(self) -> Iterator[Tuple[BytesMatch, BytesDecoder]]:
  152         """Iterator version of yaralyze. Yields match and decode data tuple back to call
❱ 153         self.rules.match(data=self.bytes, callback=self._yara_callback)
  154
  155         for yara_match in self.matches:
  156             console.print(yara_match)

Error: internal error: 46

lokman2k5 commented 1 month ago

this is the traceback

michelcrypt4d4mus commented 1 month ago

did you try using the --no-default-yara-rules approach described above to isolate the problem?