kaitai-io / kaitai_struct

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
https://kaitai.io
3.98k stars 194 forks source link

Document security expectations #170

Open koczkatamas opened 7 years ago

koczkatamas commented 7 years ago

Currently I am not sure the Kaitai-generated code won't cause any security issues, so as a first step we should create a warning about this.

I have some ideas:

Currently I did not evaluate whether we are susceptible problems listed above or not. That's why we should warn our users that we are not recommending executing code generated from untrusted ksys and untrusted inputs as a determined attacker may can cause harm.

Of course later we should evaluate whether these issues can arise or not to our best knowledge and solve them if we can.

KOLANICH commented 7 years ago

an attacker can create a specifically crafted .ksy file which can execute more privileged commands than the user would expect more concrete example: if we'll support direct WebIDE links to user created ksys, that ksy may call an opaque class called Function which actually calls Javascript's Function and executes the argument (this probably won't work as it would expect string), or use some built-in property eg. constructor to access security-critical property (eg. "".constructor.constructor("alert(1)")).

to mitigate this we need to make sure that ksc-generated code cannot overwrite anything out of its scope and cannot call arbitrary functions. functions to process must be registered ahead of time in a predefined place, classes should be put into a namespace.

  • an attacker can cause buffer-overflow or Denial of Service attacks by supplying specifically crafted payload to our generated parser

Not sure if it is possible without vuln in runtime for now. KSC uses stream-like API, so we need to make sure KS runtimes check bounds.

  • this may happen by causing integer overflow in expression and then abusing some parsing logic

IMHO Integer overflow may be possible, but not sure if it can be exploitable if unsigned types are used for offsets and counters. For example, if we have a variable-length array with terminator, and if a runtime stores its length somehow, it is possible to cause overflow the integer 2 ways: directly make array larger or make it large enough so when used in arythmetics cause an overflow or underflow. But ksy uses stream functions which are assumed to check bounds. So it is not exploitable when parsing. The problem here is that there can be external code assuming that no over/underflow occured and using the values without further checks. So we should detect overflows and throw exceptions. Some compilers have this feature, for another ones we'll need some asm/intrinsics.

Another problem is that the data can contain some indexes, which can be out of bounds. KSC uses stream API, but the app will use memory. So, KS runtime should provide

Checked indexes should be autoconstructed from unchecked (with check) if a preprocessor directive present which enables this behavior. It should be disableable because it has performance drawbacks, but at early stages we don't want to worry about this, but at some point we'll want to eliminate these checks.

And I think we need some fuzzing to check for possible problems.

GreyCat commented 7 years ago

an attacker can create a specifically crafted .ksy file which can execute more privileged commands than the user would expect

This is totally a separate issue, let's not mix them up. Regular end-users are not concerned about this: when one can craft a .ksy and compile it and run resulting code, basically everything can happen, and I don't think it's a good idea to invent some roadblocks here. ksc generates text, not some ready-made syntactic tree or anything, so it's not protected against any injections or anything, but that's totally ok with me. I strongly think that it's a bad idea to introduce any "security" features here, like deliberately disabling calls to external code, etc, etc., as this is very hard to maintain and prove to provide proper level of isolation. If someone needs doesn't trust ksy (and thus generated code), it needs to be sandboxed by environment (i.e. OS sandboxing, web worker sandboxing, whatever), period.

Regular users who generate parsers are concerned by this as much as "can gcc be used to compile arbitrary code?" Yes, it can, but it's not user's security problem.

The only people who might be concerned are tools developers, i.e. us. And here we need to provide a normal level of security expected from web applications, i.e. XSS, CSRF, what else's possible to abuse in a serverless web app?

Currently I did not evaluate whether we are susceptible problems listed above or not. That's why we should warn our users that we are not recommending executing code generated from untrusted ksys and untrusted inputs as a determined attacker may can cause harm.

C'mon, this is code, so it's common sense to not execute code (even generated one) that you don't trust in your valuable environment? Do we really need to go to these basics?

koczkatamas commented 7 years ago

This is indeed a different issue, maybe it was not a good idea to create the same Github issue for the both.

And yes, you are right as long as we have a distinct code generation step in place and a manual "run this code" step.

Currently this is not true for the WebIDE as it runs the ksy directly. I presume the situation is the same with ksv. I think the user will blindly trust the ksy if he/she use it with the WebIDE or ksv, so I think the warning is justified. (It's like you don't expect from a Word document that the sender will able to run code on your machine.)

GreyCat commented 7 years ago

It's like you don't expect from a Word document that the sender will able to run code on your machine.

True, that's a valid argument. Then, I guess, we could add it in a format reply to question like "how do I ensure that random ksy I've got off the net is not harmful?"

arekbulski commented 6 years ago

As someone who has a certificate in cryptography from Stanford, I think you missed the point entirely. If you need that much security, or run that untrustworthy code, then you should have a sandbox around entire KS compiler. Thats probably what @GreyCat meant in general.

koczkatamas commented 6 years ago

If your reply was sent to me, then I don't understand a few points:

arekbulski commented 6 years ago

Sorry Tamas, but by saying that I have a certificate I didnt mean to belittle you, just pointed out that I do have some background in security. Now you are the one belittling my formal training. Stanford is one of few best universities in the world, including their online courses.

Admittedly I do have less understanding of Kaitai than you. I only stated my impression.

koczkatamas commented 6 years ago

Oh okay, actually I misunderstood you, I thought you were talking about me, because I've probably finished the same course. Are we talking about this, right?

So yeah, sorry, I've probably overreacted the first point and sorry for that.

arekbulski commented 6 years ago

Yup and yup. That is the course, and my favorite too. I got 20+ certificates, and not in English writing either.

KOLANICH commented 6 years ago

I don't understand how crypto is related to this issue.

arekbulski commented 6 years ago

It doesnt strictly relate, but it is part of information security domain, which I also studied btw. Its a credentials.

KOLANICH commented 6 years ago

It's completely different part. Crypto is mostly about cryptographical guarantees, not exploit mitigations and writing memory-safe software, even though there are cryptographical ways to do that. In the context of this issue your crypto experience is irrelevant.

arekbulski commented 6 years ago

You are wrong. Exploiting memory leaks and timing attacks are officially part of acedemic field of cryptography. My professon, Dan Boneh, his most famous paper is called "timing attacks are practical". But thats beside the point, you are making a fuss over what was just stating credentials for the record.

KOLANICH commented 6 years ago

The fact that Dan Boneh is a famous cryptographer doesn't mean that everything he touches is crypto. For example he is in the list of co-authors of gyrophone and powerspy paper, it doesn't mean that snooping words using gyroscopes and accelerometers or position via a battery is crypto. And stealing secrets using timing attacks or reading memory or any other attacks on real-world implementations is not crypto itself.

We have run out of topic though.

GreyCat commented 6 years ago

Gentlemen, please do calm down. This is not a chat and not a forum. If you really want to discuss formal education or famous educators, please continue by other means, like e-mail.

julie-bsx commented 6 years ago

Since there was a bit of credentials wagging, and this doesn't seem to have reached a reasonable resolution, I was the NSA certified Vendor Security Analyst on multiple secure operating system evaluations. Security-related code I wrote decades ago runs on billions of devices. Take that ;)

The set of potential problems with Kaitai generated code has to include using Kaitai in a trusted application in the first place. The people who know what that means also understand that it means "you better know what the Hell you're doing before you do that."

The original issue related to a small set of hypothetical vulnerabilities that anyone who cares about OpSec or InfoSec should already care about. As such, this issue isn't news.

generalmimon commented 1 year ago

There are possibly many ways to inject arbitrary code into a .ksy file, but this seems to be by far the easiest one:

meta:
  id: doc_arbitrary_code
doc: |
  */ console.log('hello'); /*

Generated JavaScript code:

    root.DocArbitraryCode = factory(root.KaitaiStream);
  }
}(typeof self !== 'undefined' ? self : this, function (KaitaiStream) {
/**
 * */ console.log('hello'); /*
 */

var DocArbitraryCode = (function() {
  function DocArbitraryCode(_io, _parent, _root) {

As said before, we're not concerned about the security aspect because a .ksy file to be compiled into a parser that will be run should never come from an untrusted source. But it's still a bit painful to see, and the user is also not able to get */ into the generated docblock literally.