nim-lang / csources

The pre-generated C sources of the Nim compiler which aid in bootstrapping. This repository is archived because it is frozen; HEAD of csources can build Nim version 1 and any later version.

Build setup is susceptible to backdooring #12

Closed infinity0 closed 7 years ago

infinity0 commented 9 years ago

In general, we want to be sure that a binary corresponds to the source that it claims to be built from. This is a hard problem (Thompson's "Trusting Trust" paper), but (e.g.) Debian's reproducible builds effort and "Diverse Double Compilation" can help improve our confidence in this security property.

However, the current build system is such that even if we assume outside tools (kernel, C compiler) are clean, we can still end up with a backdoored Nim compiler. This can be done similarly to "Trusting Trust" - it is possible to poison this csources repo (call this B0, which is effectively an opaque unverifiable binary), such that when a clean C compiler builds it into executable code (call this B1), B1 will run on the Nim compiler sources (Araq/Nim) in such a way that the resulting binary Nim compiler (call this B2) contains a backdoor. Further, B2 then can regenerate this csources repo exactly as B0, containing the exact same backdoor, which is still undetectable (opaque autogenerated C code), yet it "looks legit" since you have a converging bootstrapping process.
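The chain described above can be illustrated with a toy model. This is only a sketch: every name in it (`NimCompiler`, `emit_c`, `clean_c_compiler`, `bootstrap`, `BACKDOOR_MARK`) is hypothetical, and a "binary" is reduced to a single flag. It shows that a poisoned B0, built by an honest C compiler, yields a backdoored B2 that regenerates B0 byte for byte, so a converging bootstrap proves nothing on its own.

```python
from dataclasses import dataclass

BACKDOOR_MARK = "/* hidden payload */"  # hypothetical stand-in for a backdoor


def emit_c(nim_source: str, backdoored: bool) -> str:
    """Toy code generator: 'translate' Nim source into C text."""
    c_text = f"/* generated from: {nim_source} */"
    return c_text + BACKDOOR_MARK if backdoored else c_text


@dataclass(frozen=True)
class NimCompiler:
    """Toy compiler binary, reduced to one bit: does it carry the backdoor?"""
    backdoored: bool

    def compile_nim(self, nim_source: str) -> "NimCompiler":
        # A backdoored compiler re-inserts the backdoor into its output,
        # even though nim_source itself is clean.
        return NimCompiler(backdoored=self.backdoored)

    def generate_csources(self, nim_source: str) -> str:
        # The generated C inherits the backdoor from the generating binary.
        return emit_c(nim_source, self.backdoored)


def clean_c_compiler(csources: str) -> NimCompiler:
    """An honest C compiler: faithfully builds whatever the C text says."""
    return NimCompiler(backdoored=BACKDOOR_MARK in csources)


def bootstrap(csources: str, nim_source: str):
    """B0 (csources) -> B1 -> B2, then regenerate csources from B2."""
    b1 = clean_c_compiler(csources)
    b2 = b1.compile_nim(nim_source)
    return b2, b2.generate_csources(nim_source)


NIM_SOURCE = "clean, reviewed Nim compiler source"
POISONED_CSOURCES = emit_c(NIM_SOURCE, backdoored=True)

if __name__ == "__main__":
    b2, regenerated = bootstrap(POISONED_CSOURCES, NIM_SOURCE)
    # The bootstrap converges and the backdoor survives.
    print(b2.backdoored, regenerated == POISONED_CSOURCES)
```

In this model the only place the backdoor can be seen is the C text itself, which is why the readability of csources matters.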

infinity0 commented 9 years ago

To follow up on this, the way to fix this issue would be to make this repo (csources i.e. B0) be human-readable enough to review and verify that it has no backdoors. This would give us the assurance that "assume clean external tools => good clean Nim binary compiler".

The Diverse Double Compilation technique mentioned previously can then allow us to handle the situation where we don't know if our external tools are clean. But currently, even if we know the external tools are clean, we do not know that the resulting Nim binary compiler is clean.

Araq commented 9 years ago

You're free to set up the Debian package in the way that seems best to you.

> To follow up on this, the way to fix this issue would be to make this repo (csources i.e. B0) be human-readable enough to review and verify that it has no backdoors.

PRs are welcome.

infinity0 commented 9 years ago

This is not a Debian issue; it is a key part of upstream that the bootstrap process starts from opaque code that cannot be reviewed. This is against the idea of FOSS being verifiable by humans for security. Please re-open this issue, even if you don't intend to fix it yourself.

Araq commented 7 years ago

Nim's "opaque autogenerated C code" is still easier to review than GCC's messy source code.

infinity0 commented 7 years ago

As a funny coincidence, I'm patching GCC myself right now, and it is definitely easier to review than Nim's opaque autogenerated C code.

You are deluded if you think people want to review 23 copies of basically the same code containing variable names like equalmem_7495_1689653243 and without any comments. Get your head out of your own ass.

Araq commented 7 years ago

You are deluded about the whole security problem. Name one backdoored compiler in the history of computing that was actually used. Just one. The described problem is an entirely academic exercise with no connection to the real world.

You also fail to understand the problem of bootstrapping a compiler. The equivalent would be GCC's code as generated assembler code on GitHub. Same problems. Instead you use an existing GCC binary to compile GCC. Surprise, you can do the same with Nim.

infinity0 commented 7 years ago

Oh, so now you go and change your argument.

Have you heard of the term "0-day"? It's used to refer to an exploit that hasn't been publicly known before. There are many examples of "0-days" that happen all the time. Furthermore, as the person distributing the binaries, you have an inherent interest in trivialising the problem. This is not about making the situation better for you, but your audience.

Where does GCC put generated assembler code on GitHub and make this an intrinsic part of their build process? No, you can start with any C compiler. That's not the same with Nim.

The security property we want here is "if the starting compilers are good, then the resulting compilers are good". This reduces the number of assumptions in the whole system, which is a valuable security goal.

We (almost) have this property with GCC and other compilers; we don't have this with Nim because you make this opaque code an intrinsic part of the build process. The simple fix would be to fix your bootstrapping process to allow bootstrapping directly using another Nim compiler (e.g. the previous version), to compile reviewable Nim sources and not unreviewable C/assembly code. This is what Rust does, for example.

Please take some logic lessons on the difference between "if P then Q" vs "Q" before repeating the same incorrect points.

(edit: clarify compilation using reviewable sources. Bundling autogenerated code into the previous version doesn't count.)

Varriount commented 7 years ago

Look, what do you propose we do? A full audit of the Nim compiler? Making it more readable might help, but ultimately someone would have to go through it all.

Even then, what guarantees do you have that GCC or the library loader isn't backdoored? Or some part of the distro you use? These are all much bigger targets than Nim.

To put it simply: the likelihood of Nim being backdoored is quite small, especially when there are numerous other targets that would yield greater opportunities to hackers. That, combined with the fact that we don't have the resources to do such an audit (and that no one has volunteered), means that this is really a low priority. It's all about cost vs likely benefit.

And while I don't approve of @araq's tone, and would rather not close the issue, I can't exactly fault his general view.

You might look at the NLVM project, which aims to build a Nim transpiler for LLVM. Even if Nim is backdoored, I doubt that any possible code injection would be so complex as to inject itself into both LLVM and C code. Also, you could try the C++ and JavaScript backends, too.

dom96 commented 7 years ago

I'm certain that this issue will fix itself in the future. Once Nim is stable enough, it will be buildable using a previous version of itself; after that, the C sources will no longer be necessary.

But this makes me wonder, how will we know that somebody hasn't edited the binaries? Surely they are even more difficult to verify than the generated C source code!

infinity0 commented 7 years ago

Using just a single binary, we can't know that nobody has edited it. But what we can do is get multiple independently-written binaries, and test that they result in an identical Nim compiler (after stage2). This is "Diverse Double Compilation" and you can look that up to read more about it.
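The comparison step can be sketched as below. This is a minimal illustration, not part of any Nim tooling: `sha256_of` and `stage2_outputs_agree` are hypothetical helper names, and the sketch assumes the build is deterministic, so that honest bootstrap compilers produce bit-identical stage2 binaries.

```python
import hashlib


def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage2_outputs_agree(stage2_paths) -> bool:
    """True if every stage2 compiler binary is bit-for-bit identical.

    Each path is assumed to be a stage2 Nim compiler produced from the
    same Nim sources, but bootstrapped from a different, independently
    written starting compiler.
    """
    paths = list(stage2_paths)
    if len(paths) < 2:
        raise ValueError("need at least two independently bootstrapped binaries")
    return len({sha256_of(p) for p in paths}) == 1
```

A mismatch does not say which starting compiler is at fault, only that at least one of them, or a non-deterministic build step, changed the output.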

However, currently the csources is a fixed part of the Nim build process, so what I just described won't work. For it to work, we need to be able to compile Nim by using "any X compiler" to compile reviewable source code written in language X, into the Nim compiler. X doesn't have to be nim-lang, it could be C even.

Yes, reviewing source code is hard[1]. Yes, getting multiple independent Nim compilers is hard. That doesn't mean the other stuff is pointless. But sure, you guys spend as much time as you want. I just want to correct some misunderstandings about what is being asked.

I'm not even proposing to get rid of the csources. They can remain as an optional step, if you don't have an earlier Nim compiler to start with, for example.

[1] it's actually not that hard; we're only verifying the non-existence of self-replicating backdoors. Other backdoors are detected separately outside of this process.