richardlehane / siegfried

signature-based file format identification
http://www.itforarchivists.com/siegfried
Apache License 2.0

Question about YARA #248

Open robin-francois opened 1 month ago

robin-francois commented 1 month ago

Hello,

Following a discussion on Mastodon, @ross-spencer suggested that we open an issue here to discuss this with you, @richardlehane. @steffenfritz from https://github.com/steffenfritz/FileTrove was also part of the discussion.

The initial question was:

I just saw that YARA-X is out, improving on the YARA malware signature language, and I was wondering if YARA was ever considered for file identification à la DROID/Siegfried? From afar, it looks like both the digipres and IT sec worlds are trying to do similar things. YARA is dead, long live YARA-X

richardlehane commented 1 month ago

Hi @robin-francois, thanks for sharing this. I definitely agree that YARA and other malware/AV tools (e.g. ClamAV) are close cousins of droid/siegfried. They are all essentially just scanning files against databases of patterns. One point of difference might be how they are optimized: droid and siegfried focus on scanning the start and end of files, whereas I guess malware authors are more cunning, so tools like YARA and ClamAV may be more optimized towards full file scans.
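To make the optimization difference concrete, here is a minimal sketch, not how siegfried or YARA are actually implemented. The function names `edge_scan` and `full_scan` and the 64-byte window are assumptions chosen for illustration. It contrasts matching only at the start and end of a file with searching the whole file.

```python
def edge_scan(data: bytes, bof: bytes, eof: bytes, window: int = 64) -> bool:
    """Match `bof` at offset 0 and `eof` somewhere in the last `window` bytes.

    Hypothetical helper: only the two ends of the buffer are inspected,
    so the cost does not grow with the size of the file.
    """
    return data.startswith(bof) and eof in data[-window:]


def full_scan(data: bytes, pattern: bytes) -> bool:
    """Search the whole buffer for `pattern` (cost grows with file size)."""
    return pattern in data


# A ZIP-like buffer: local file header magic at the start, and the
# end-of-central-directory magic within the last few bytes.
zip_like = b"PK\x03\x04" + b"\x00" * 1000 + b"PK\x05\x06" + b"\x00" * 18

# A buffer with a marker buried in the middle, far from both ends.
hidden = b"\x00" * 1000 + b"EVIL" + b"\x00" * 1000

print(edge_scan(zip_like, b"PK\x03\x04", b"PK\x05\x06"))  # ends match
print(edge_scan(hidden, b"EVIL", b"EVIL"))                # ends do not match
print(full_scan(hidden, b"EVIL"))                         # found mid-file
```

The edge scan is enough for formats that announce themselves at the start or end of the file, while a pattern placed mid-file is only caught by the full scan.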

I saw you were doing a bit of benchmarking to see if YARA running droid signatures might be faster. To get equivalent numbers for a benchmark you could build an sf signature file limited to the same patterns you are scanning in YARA (e.g. do roy build -limit @zip if you are only searching for zip). Also make sure you are running siegfried with a -multi setting if not already (I should really make that a default). E.g. sf -multi 32 DIR-TO-SCAN. You could also do something very similar for droid by using Martin Hoppenheit's tool droidsfmin to trim out all the non-zip signatures.

As well as looking at transpiling droid signatures to run in YARA you could also consider transpiling YARA signatures to run in DROID or siegfried. It may also be possible to write a custom identifier for siegfried that can parse YARA rules. This might make it possible e.g. to do some basic malware scanning at the same time as doing a PRONOM identification.
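As a rough illustration of the droid-to-YARA direction, the sketch below renders a single fixed byte sequance anchored at an offset from the start of the file as YARA rule text. The function `hex_to_yara_rule` and the rule name `zip_local_header` are hypothetical, and the example bytes are just the ZIP local file header magic, not a real PRONOM signature. Wildcards, variable offsets and end-of-file anchors, which real droid signatures use, are not handled.

```python
def hex_to_yara_rule(name: str, hex_seq: str, offset: int = 0) -> str:
    """Render a fixed byte sequence at `offset` from BOF as YARA rule text.

    Hypothetical helper: it only covers plain hex sequences and raises
    ValueError if `hex_seq` is not valid hexadecimal.
    """
    raw = bytes.fromhex(hex_seq)  # validates the input
    spaced = " ".join(f"{b:02X}" for b in raw)
    return (
        f"rule {name}\n"
        "{\n"
        "    strings:\n"
        f"        $sig = {{ {spaced} }}\n"
        "    condition:\n"
        f"        $sig at {offset}\n"
        "}\n"
    )


# Example: ZIP local file header magic at offset 0.
rule_text = hex_to_yara_rule("zip_local_header", "504B0304", offset=0)
print(rule_text)
```

Going the other way (YARA rules into siegfried) would need a parser for YARA's rule syntax and conditions, which is a much larger job than this one-way text rendering.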