Yararule failing silently

MikeHofmann commented 4 years ago

Description of problem:

I'm trying to use plaso to check image files against a large set of yara rules (taken from malpedia). Plaso is convenient for this because it can read most image formats (E01, vmdk, dd, ...) and doesn't require root rights, unlike cross-mounting the image and selecting the right partition. For this purpose I have created one large file containing all available yara rules. However, there are two problems:

1. Even with --debug on the command line, there is no log information about why a certain rule fails. Even passing a non-existent rules file doesn't produce an error.
2. Some rules simply don't work (without giving a reason, see above), but do work as intended with the yara CLI from virustotal.com.

Command line and arguments:

log2timeline.py --workers 32 --debug --yara-rules /tmp/failing.yara test.plaso /malware/
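As long as log2timeline gives no feedback here, a pre-flight check of the rules file can be run before the command above. The following is only a sketch: the function name check_yara_rules is hypothetical, and the compile step runs only if yara-python is importable in the same environment. It is not how plaso itself validates rules.

```python
import os

try:
    import yara  # yara-python; optional for this sketch
except ImportError:
    yara = None


def check_yara_rules(path):
    """Return None if the rules file looks usable, otherwise an error string."""
    if not os.path.isfile(path):
        return "rules file not found: {0:s}".format(path)
    if yara is None:
        # Without yara-python only the existence check is possible.
        return None
    try:
        yara.compile(filepath=path)
    except yara.Error as exception:
        # yara.SyntaxError is a subclass of yara.Error.
        return "unable to compile {0:s}: {1!s}".format(path, exception)
    return None


if __name__ == "__main__":
    print(check_yara_rules("/tmp/failing.yara") or "rules file looks OK")
```

Note that a rule compiling here only proves something about the yara-python build used for the check, which may differ from the one inside the plaso container.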

Source data:

This is an example of a rule that fails in plaso but works with the yara CLI (without warnings, btw):

import "pe"

rule win_darktrack_rat_w0 {
       meta:
              author = "jeFF0Falltrades"
              hash = "1472dd3f96a7127a110918072ace40f7ea7c2d64b95971e447ba3dc0b58f2e6a"
              ref = "https://news.softpedia.com/news/free-darktrack-rat-has-the-potential-of-being-the-best-rat-on-the-market-508179.shtml"
              malpedia_reference = "https://malpedia.caad.fkie.fraunhofer.de/details/win.darktrack_rat"
              malpedia_version = "20190905"
              malpedia_license = "CC BY-NC-SA 4.0"
              malpedia_sharing = "TLP:WHITE"

       strings:
              $dt_pdb = "C:\\Users\\gurkanarkas\\Desktop\\Dtback\\AlienEdition\\Server\\SuperObject.pas" wide ascii
              $dt_pas = "SuperObject.pas" wide ascii
              $dt_user = "].encryptedUsername" wide ascii
              $dt_pass = "].encryptedPassword" wide ascii
              $dt_yandex = "\\Yandex\\YandexBrowser\\User Data\\Default\\Login Data" wide ascii
              $dt_alien_0 = "4.0 Alien" wide ascii
              $dt_alien_1 = "4.1 Alien" wide ascii
              $dt_victim = "Local Victim" wide ascii

       condition:
              (3 of ($dt*)) or pe.imphash() == "ee46edf42cfbc2785a30bfb17f6da9c2" or pe.imphash() == "2dbff3ce210d5c2b4ba36c7170d04dc2"
}

Plaso version:

plaso - log2timeline version 20201007

Operating system Plaso is running on:

Running using the docker image

Installation method:

using the docker image:

log2timeline/plaso:latest
DIGEST:sha256:c43400a195e71aaf19829276f660f1ccdcb3045d039fb458da4cfc0487a1dccd

Debug output/tracebacks:

I did a zcat *.gz | grep yara, which didn't match anything.

joachimmetz commented 4 years ago

@MikeHofmann thx for the report, @Onager can you take a look

Onager commented 4 years ago

Thanks for the thorough report. I suspect some of the issues here are because of the import in the Yara rule.

Onager commented 4 years ago

After looking at this a bit more, I've identified a couple of issues:

1. The Yara analyzer was raising the wrong exception, so error messages weren't being displayed.
2. The rule you provided uses the imphash() function of the PE yara module, and it looks like this function might not exist, depending on how the yara-python library was built: https://github.com/VirusTotal/yara-python/issues/97 https://github.com/VirusTotal/yara-python/issues/28 Specifically, it looks like when yara-python was built for GIFT, it didn't find libssl for some crypto support. @joachimmetz as you last built the yara-python package for GIFT, does that sound plausible?

I've sent a fix for the first issue in #3299 but for the second we'll need to rebuild the yara-python package and make a new container.
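To see which of these features a given yara-python build supports, one option is to compile a tiny probe rule per feature and record whether compilation succeeds. This is a rough sketch: the probe names and the function probe_yara_modules are made up for illustration, and a result of None only means yara-python could not be imported at all.

```python
try:
    import yara  # yara-python
except ImportError:
    yara = None

# One minimal rule per feature; the compared strings are placeholders.
PROBES = {
    "pe.imphash": 'import "pe"\nrule probe { condition: pe.imphash() == "" }',
    "hash": 'import "hash"\nrule probe { condition: hash.md5(0, filesize) == "" }',
    "magic": 'import "magic"\nrule probe { condition: magic.mime_type() == "" }',
}


def probe_yara_modules():
    """Map each probe name to True, False, or None if yara is unavailable."""
    if yara is None:
        return {name: None for name in PROBES}
    results = {}
    for name, source in PROBES.items():
        try:
            yara.compile(source=source)
            results[name] = True
        except yara.Error:
            # Raised when the module or function is missing from the build.
            results[name] = False
    return results


if __name__ == "__main__":
    for name, supported in sorted(probe_yara_modules().items()):
        print(name, supported)
```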

MikeHofmann commented 4 years ago

We are compiling a cli-version of yara for internal use with:

./configure --with-crypto \
                   --enable-magic \
                   --enable-cuckoo \
                   --enable-dotnet

So the above rule naturally works using this version of yara. Haven't thought about this (shame on me and :clap: for @Onager)

Just to be on the safe side, it would be cool to have these enabled in the yara-python package as well.

joachimmetz commented 4 years ago

This will depend on which additional dependencies this imposes. --with-crypto and --enable-magic are likely doable for a stock Ubuntu.

MikeHofmann commented 4 years ago

I'm building yara using alpine (inside docker), but according to the yara documentation under ubuntu the following should be sufficient:

apt-get install flex bison libssl-dev libjansson-dev libmagic-dev

joachimmetz commented 3 years ago

yara-python has no --with-crypto option https://github.com/VirusTotal/yara-python/blob/master/setup.py#L34

looks like it tries to detect libcrypto in setup.py https://github.com/VirusTotal/yara-python/blob/master/setup.py#L238
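A quick way to see whether the relevant shared libraries are visible on a build host is sketched below. This is not what yara-python's setup.py does, and finding the runtime library does not prove the development headers (libssl-dev, libmagic-dev) are installed; the helper name find_libraries is hypothetical.

```python
import ctypes.util


def find_libraries(names=("crypto", "magic")):
    """Map each library name to the file name the linker finds, or None."""
    return {name: ctypes.util.find_library(name) for name in names}


if __name__ == "__main__":
    for name, location in find_libraries().items():
        print("lib{0:s}: {1!s}".format(name, location))
```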

joachimmetz commented 3 years ago

https://github.com/log2timeline/l2tdevtools/pull/947

joachimmetz commented 3 years ago

New build pushed to GIFT PPA testing

joachimmetz commented 3 years ago

New build pushed to GIFT PPA dev

@Onager can you add a unit or end-to-end test to ensure crypto and magic based rules are supported

MikeHofmann commented 3 years ago

@Onager can you add a unit or end-to-end test to ensure crypto and magic based rules are supported

This should test libcrypto, magic and pe (using the test_pe.exe from /test_data):

import "hash"

rule libcrypto_hash
{
        condition:
                hash.md5(0, filesize) == "ab2e0a9184d2718995d3f41c70df7027"
}

import "magic"

rule magic_mimetype
{
        condition:
                magic.mime_type() == "application/x-dosexec"
}

import "pe"

rule pe_characteristics
{
        condition:
                pe.characteristics & pe.EXECUTABLE_IMAGE
}

MikeHofmann commented 3 years ago

An analyst asked me for an update on this issue. Is there anything I can do to get this fix released?

joachimmetz commented 3 years ago

Based on the comments on this thread so far, https://github.com/log2timeline/plaso/pull/3299 should notify about the rule failing, and libcrypto and magic support have been enabled in the yara-python build (https://github.com/log2timeline/l2tdevtools/pull/947). These should be part of the latest Plaso Docker release.

Adding an end-to-end test still needs to be done. Not sure what update your analyst is looking for?

MikeHofmann commented 3 years ago

I checked the image she is using. It seems we failed to mirror the current Docker image to our dark site. I'll test and report back.

joachimmetz commented 3 years ago

Checking again it looks like the new yara builds were not yet pushed to GIFT PPA stable, and therefore not included in the Docker build https://launchpad.net/~gift/+archive/ubuntu/stable/+packages?field.name_filter=yara&field.status_filter=published&field.series_filter=

I've promoted the builds from GIFT PPA dev to stable, so they should be in the next release. In the meantime you could try to update the yara version of the latest Docker container from GIFT PPA.

joachimmetz commented 3 years ago

Note to self add end-to-end test.

MikeHofmann commented 3 years ago

I've promoted the builds from GIFT PPA dev to stable, so they should be in the next release. In the meantime you could try to update the yara version of the latest Docker container from GIFT PPA.

I tested the latest image from hub.docker.com anyway. The rules are still not executed, but now log2timeline gives a proper error message when it can't parse a rule:

ERROR: Unable to parse Yara rules in: /rules/malpedia.yara with error: line 185208: invalid field name "imphash"

which is a huge step forward. I'll probably wait for the next release though. Building Docker images that depend on a working Python mirror inside a dark site can be a real pain in the a...
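With a rules file of this size, the line number in the error message is the quickest way to find the offending rule. A small sketch, assuming the message keeps the "line N: reason" shape shown above; the function name locate_rule_error is hypothetical.

```python
import re

# Matches the "line 185208: invalid field name ..." part of the message.
ERROR_PATTERN = re.compile(r"line (\d+): (.+)$")


def locate_rule_error(message, rules_lines, context=2):
    """Return (line number, reason, surrounding lines) or None if no match."""
    match = ERROR_PATTERN.search(message)
    if not match:
        return None
    line_number = int(match.group(1))  # 1-based, as reported in the message
    reason = match.group(2)
    start = max(line_number - 1 - context, 0)
    end = min(line_number + context, len(rules_lines))
    return line_number, reason, rules_lines[start:end]


if __name__ == "__main__":
    message = ('Unable to parse Yara rules in: /rules/malpedia.yara '
               'with error: line 3: invalid field name "imphash"')
    rules = ['import "pe"', "rule example {", "  condition: pe.imphash() == \"\"", "}"]
    print(locate_rule_error(message, rules, context=1))
```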

joachimmetz commented 3 years ago

ack, I'll have a look at adding the end-to-end test when time permits. I saw that there is a new yara-python release as well.

Could you provide some context on your workflow? It should be straightforward to copy a Docker image from an internet-connected system to one that is not. Also see: https://stackoverflow.com/questions/23935141/how-to-copy-docker-images-from-one-host-to-another-without-using-a-repository

MikeHofmann commented 3 years ago

Ah, no, getting a Docker image is not the problem. And we are way beyond a simple docker save / docker load in our workflow: we use skopeo in a DMZ to copy an image to a private Docker registry, then a CI/CD pipeline in the internal network pulls the image from the private registry, tests it for security flaws, adds additional features and pushes it to a private GitLab-hosted registry.

Problem is rebuilding anything from scratch which needs pip. Maintaining a Python mirror using bandersnatch has proven more trouble in our setup than it is worth.

But this goes beyond the scope of this issue. I will tell my analysts to be patient a little longer.

joachimmetz commented 3 years ago

Problem is rebuilding anything from scratch which needs pip.

Note that for the Plaso Docker container we use the GIFT PPA not pip. To update yara run apt-get update && apt-get upgrade python3-yara

joachimmetz commented 3 years ago

changes for additional tests merged

Also had to make some changes to yara-python to build with crypto support on Windows https://github.com/VirusTotal/yara-python/pull/167

joachimmetz commented 3 years ago

Some additional changes needed

ensure unit test fails on systems with yara with magic enabled - https://github.com/log2timeline/plaso/pull/3467
change fedora yara-python build to include crypto and magic support

MikeHofmann commented 3 years ago

To give some feedback:

Technically everything works as intended. We have a large collection of yara rules simply concatenated into one large file (180k lines). Using the --yara_rules option works perfectly with this. The only downside is the empty yara_match attribute in the psort output file, but this can easily be filtered out.
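Filtering out the rows without a match could look roughly like this. It is a sketch under assumptions: the psort output is a CSV with a header row and a column literally named yara_match, and an empty cell or "-" stands for "no match". The other column names in the usage example are made up.

```python
import csv
import io


def rows_with_yara_match(csv_file, column="yara_match"):
    """Yield only the rows whose yara_match column holds a rule name."""
    for row in csv.DictReader(csv_file):
        value = (row.get(column) or "").strip()
        # Treat an empty cell and a "-" placeholder as "no match" (assumed).
        if value and value != "-":
            yield row


if __name__ == "__main__":
    sample = io.StringIO(
        "datetime,message,yara_match\n"
        "2021-01-01,a,\n"
        "2021-01-02,b,win_darktrack_rat_w0\n"
    )
    for row in rows_with_yara_match(sample):
        print(row["datetime"], row["yara_match"])
```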

In practice, the idea of using plaso together with yara to search image files for malware infections doesn't work at all. So far, most yara rules only detect malware when run against memory dumps. Even simply packed malware is virtually undetectable this way. So if you're reading this with the same idea I had: don't waste your time, use an AV scanner.

However we will still use this with specialized rules. So the feature is not in vain.

joachimmetz commented 3 years ago

So in-memory yara signatures are not always applicable to on-disk executables. Could you explain your use case / approach a bit? Plaso is a timeline tool, not a malware analysis tool.

MikeHofmann commented 3 years ago

We receive a large number of images in various formats (Expert Evidence, .vmdk, .vhdx, .raw, etc.). Our task is to find a certain malware and whatever other malware it might have installed along the way. The initial malware has been removed by various tools (AV scanners, disinfection tools) at some point in time. Of course each malware has its own means of persistence (from HKCU\Software\Microsoft\Windows\CurrentVersion\Run, to Scheduled Tasks or simple Autorun Folders), some disable other services, etc. Basically everything log2timeline can extract from an image. The expected result is a nice timeline with every event showing how, when and where something malicious happened.

In the meantime we thought about cross-checking every hash with virustotal or even writing a plaso plugin for a multi-AV scanner like irma or metadefender. Both have nice and easy REST APIs. But both solutions would most likely fail because of license-induced (virustotal) or performance-related (multi-AV) rate limits. This would require something like celery in between to handle job queues.

We will probably have to resort to mounting the images and then using some generic AV scanner to identify the files. Then we could take the SHA256 of these files and search for any related events in the timeline.
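The hash-then-search step could be sketched as follows, assuming the timeline was exported as text lines that include a SHA256 value for each file (for example because hashing was enabled during extraction). Both function names are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as file_object:
        for chunk in iter(lambda: file_object.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def events_mentioning_hash(timeline_lines, sha256_hex):
    """Return the timeline lines containing the hash, ignoring case."""
    needle = sha256_hex.lower()
    return [line for line in timeline_lines if needle in line.lower()]
```

A plain substring search keeps the sketch independent of the export format; with a known column layout, matching on the hash column only would be more precise.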

joachimmetz commented 3 years ago

Still not clear to me what your use case is. If the AV has removed (quarantined) the malware (per: "The initial malware has been removed by various tools"), how do you expect to find it with another AV or SHA256 lookup?

MikeHofmann commented 3 years ago

Sorry, not a native speaker here. Yes, I meant quarantined when I wrote removed. So it just moves the malware to a different location on the drive. The expectation is to find the first malware in its quarantined location (no, it's not a predictable path) and any other malware still loose on the system.

joachimmetz commented 3 years ago

So it just moves the malware to a different location on the drive

I'm afraid your assumptions are wrong here. Most AV products quarantine malware in a proprietary container format. Just moving the executable would not remove the risk of the user executing the malware.

MikeHofmann commented 3 years ago

I checked and it's a user error on my side. Some rules did actually hit, but I missed them in the .csv.

To give some more detail, while trying to avoid the google indexer by using abbreviations:

It's about proving a theory about which malware comes first. The current working theory is:

1. Malware E v1.0 is installed on the victim.
2. E v1.0 connects to its C2 and receives a command to download and install Malware Q.
3. Step 2 might be repeated multiple times (and Q has its own independent C2, so it might branch off even more).
4. E v1.0 connects to its C2 and receives a command to perform a self-update to E v2.0.
5. E v2.0 installs itself, taking the location and filename of E v1.0, while writing a copy of E v1.0 in a proprietary format to another location under a random filename (quarantine).

So far I have a yara rule to detect E v2.0 and the proprietary quarantine format (not the content). I can also decrypt the proprietary format to check the content. But I have no rule to detect Q or anything else. By manual inspection I know that there is more to find. But when the images start piling up, manual processing is not a good idea. The sequence of events could provide some insight into the personal connections of the various C2 operators. That is one of the reasons why we are analyzing so many images from various victims.

joachimmetz commented 3 years ago

Sounds like many factors outside the scope of Plaso. As I said, Plaso is a timeline tool not a malware analysis tool.

Seeing that you mention the malware is overwritten, you have to consider the data might no longer be there or that you have to recover it.

By manual inspection I know that there is more to find. But when the images start piling up, manual processing is not a good idea.

Plaso is open source (FOSS) so you can customize it if needed. Automation can help with analysis, especially at scale, but most automation is not a replacement for proper analysis work (manual or automated).

log2timeline / plaso

Yararule failing silently #3289