lebr0nli opened 4 months ago
Thank you for reporting the issue, Alan! (and nice writeup - we're stoked)
Yes, the current versions of Magika (v1 and v2) analyze only a portion of the file, so bypass attacks are possible if the attacker's file format allows them to put the right bytes in those portions. This choice lets us keep execution time nearly constant regardless of file size, which is a nice property to have for high-throughput deployments. However, it does have drawbacks, as we mention in the limitations section of the README.md and in the soon-to-be-released paper.
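To make the constant-time property concrete, here is a minimal sketch of fixed-size window extraction. The classifier input is the same size whether the file is 2 KB or 5 MB, and bytes outside the windows never reach the model. The 512-byte head and tail windows, the zero padding, and the `extract_windows` helper are all hypothetical; Magika's real window sizes, padding, and preprocessing differ across model versions and are not reproduced here.

```python
def extract_windows(data: bytes, block_size: int = 512, pad_byte: int = 0) -> tuple[bytes, bytes]:
    """Hypothetical helper: return fixed-size (head, tail) windows of `data`.

    Short inputs are padded with `pad_byte` so both windows are always
    exactly `block_size` bytes, no matter how large the file is.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    pad = bytes([pad_byte])
    head = data[:block_size].ljust(block_size, pad)
    # data[-block_size:] returns the whole input when it is shorter than block_size.
    tail = data[-block_size:].rjust(block_size, pad)
    return head, tail


if __name__ == "__main__":
    small = b"\x7fELF" + b"\x00" * 2_000
    large = b"\x7fELF" + b"\x00" * 5_000_000
    for blob in (small, large):
        head, tail = extract_windows(blob)
        # The model input size is identical regardless of file size.
        print(len(blob), len(head) + len(tail))

    # Changing a byte in the middle of the large file does not change the input.
    mutated = bytearray(large)
    mutated[2_500_000] = 0xFF
    print(extract_windows(bytes(mutated)) == extract_windows(large))  # True
```

The same property is what makes bypasses possible: whatever sits in the unseen middle is invisible to the classifier, while the bytes inside the windows (for example, a trailing image footer) fully determine what it sees.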
Fortunately, Magika v2 correctly detects the file as ELF with a score of 0.999+. This is using the draft_standard_v2 model; check out the Rust implementation, which already uses it. That said, we'll be looking into improving Magika's resilience against adversarial attacks, though making Magika completely adversarial-attack proof is likely an impossible task.
Since v2 correctly detects this sample, we'll close the bug. Before that happens, though, @reyammer, maybe we can add this to our test suite, just to keep this approach around. For that to happen, Alan has to sign the Google CLA (Contributor License Agreement).
I noticed that Magika's model may be too sensitive to the TGA file footer, which makes it easy to create adversarial examples.
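For reference, a TGA 2.0 file can end with a fixed 26-byte footer: two little-endian 32-bit offsets (extension area and developer directory, where 0 means the area is absent), the 16-byte signature TRUEVISION-XFILE, a "." and a NUL byte. The sketch below builds and recognizes that footer and shows it can be appended to arbitrary bytes without touching the start of the file. We don't know exactly how poc.s lays out its bytes; the payload here is a hypothetical stand-in rather than a working ELF, and `make_tga_footer` / `parse_tga_footer` are illustrative helpers, not part of Magika.

```python
import struct
from typing import Optional, Tuple

TGA_SIGNATURE = b"TRUEVISION-XFILE"  # 16-byte TGA 2.0 signature
FOOTER_FMT = "<II16sBB"  # ext offset, dev-dir offset, signature, '.', NUL
FOOTER_LEN = struct.calcsize(FOOTER_FMT)  # 26 bytes


def make_tga_footer(ext_offset: int = 0, dev_offset: int = 0) -> bytes:
    """Build a TGA 2.0 footer; an offset of 0 means that area is absent."""
    return struct.pack(FOOTER_FMT, ext_offset, dev_offset, TGA_SIGNATURE, ord("."), 0)


def parse_tga_footer(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (ext_offset, dev_offset) if data ends with a TGA 2.0 footer, else None."""
    if len(data) < FOOTER_LEN:
        return None
    ext_offset, dev_offset, sig, dot, nul = struct.unpack(FOOTER_FMT, data[-FOOTER_LEN:])
    if sig != TGA_SIGNATURE or dot != ord(".") or nul != 0:
        return None
    return ext_offset, dev_offset


if __name__ == "__main__":
    payload = b"\x7fELF" + b"\x00" * 100  # hypothetical stand-in, not a working ELF
    tagged = payload + make_tga_footer()
    print(parse_tga_footer(payload))  # None
    print(parse_tga_footer(tagged))   # (0, 0)
    print(tagged.startswith(b"\x7fELF"))  # True: the header is untouched
```

Because the footer is a fixed trailer at the very end of the file, it always lands inside any tail window a size-bounded classifier looks at, which would be consistent with the sensitivity reported here.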
Here's an adversarial example I created to make an ELF file be mistakenly identified as a TGA file:
poc.so
The adversarial example above can be built from this poc.s with:
nasm -f bin -o poc.so poc.s
If you run
LD_PRELOAD=./poc.so /bin/cat
on x86-64 Linux, you should see /bin/id being executed, which means this is definitely a valid ELF file, not a TGA file. However, Magika identifies it as a TGA file with a score of 1.0.
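The LD_PRELOAD run works because the dynamic loader parses the file from its ELF header (and the program headers it points to) and generally ignores unreferenced trailing bytes. A minimal sketch of that header check next to a footer check, run on the same bytes, shows how the two ends of a file can disagree. The 64-byte header below is hypothetical: it fills in only e_ident, e_type, and e_machine, so unlike poc.so it is not loadable. These rule-based helpers are illustrative only and are not how Magika classifies files.

```python
import struct
from typing import Optional, Tuple

ELF_MAGIC = b"\x7fELF"
TGA_FOOTER_SIGNATURE = b"TRUEVISION-XFILE.\x00"  # last 18 bytes of a TGA 2.0 file

ELFCLASS64, ELFDATA2LSB, ELFDATA2MSB = 2, 1, 2
ET_DYN, EM_X86_64 = 3, 62


def elf_header_info(data: bytes) -> Optional[Tuple[int, int, int]]:
    """Return (ei_class, e_type, e_machine) if data starts with an ELF header."""
    if len(data) < 20 or not data.startswith(ELF_MAGIC):
        return None
    ei_class, ei_data = data[4], data[5]
    if ei_data not in (ELFDATA2LSB, ELFDATA2MSB):
        return None
    fmt = "<HH" if ei_data == ELFDATA2LSB else ">HH"
    # e_type and e_machine are consecutive 16-bit fields at offset 16.
    e_type, e_machine = struct.unpack_from(fmt, data, 16)
    return ei_class, e_type, e_machine


def is_x86_64_shared_object(data: bytes) -> bool:
    return elf_header_info(data) == (ELFCLASS64, ET_DYN, EM_X86_64)


def ends_with_tga_footer(data: bytes) -> bool:
    return data.endswith(TGA_FOOTER_SIGNATURE)


if __name__ == "__main__":
    header = bytearray(64)  # hypothetical, non-loadable ELF64 header
    header[0:4] = ELF_MAGIC
    header[4] = ELFCLASS64
    header[5] = ELFDATA2LSB
    header[6] = 1  # EI_VERSION = EV_CURRENT
    struct.pack_into("<HH", header, 16, ET_DYN, EM_X86_64)
    # 8 zero bytes = both TGA footer offsets set to 0, then the signature.
    blob = bytes(header) + b"\x00" * 8 + TGA_FOOTER_SIGNATURE
    print(is_x86_64_shared_object(blob), ends_with_tga_footer(blob))  # True True
```

The loader only cares about the first answer, while a classifier that leans on the tail of the file may latch onto the second; that gap is what this kind of polyglot exploits.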