zenhack / crypt

GNU Lesser General Public License v3.0

0 stars 0 forks source link

The ideas issue #1

Open frozencemetery opened 7 years ago

frozencemetery commented 7 years ago

TODOS

[ ] any existing work we should investigate
[ ] mock-ups of existing algorithms
[ ] how do loops work?
[ ] do we need conditionals?

GOALS

Code generation reflects written code
- all optimization guaranteed harmless
Easily FFI from other languages
- generate native code
Turing non-completeness
No timing attacks
Always initialize and clear variables
Base types informed by layout-dsl
- Arbitrary width
- Arbitrary endian
Typing that bench presses cars

Special considerations

Will probably need to roll our own backend
Don't forget about memory access patterns

References

Formal semantics for LLVM

frozencemetery commented 7 years ago

If we need conditionals, we should be able to solve most timing issues by just running both branches and then picking the correct result to return, but it's not clear that we do.

For loops, I was considering having both a modified for-each type loop and a fixed-iteration loop, but I need to actually play with this on an existing system to see how it pans out.

zenhack commented 7 years ago

Was some interesting discussion on ##crypto; someone suggested adding a secret qualifier to LLVM (and maybe plumbing it through to Clang) that banned the use of the data in non-constant-time operations and zeroed it out appropriately. Said it would make a good master's thesis. Someone also mentioned having talked to the rust developers and come to the conclusion that as-is there is nothing that LLVM gives you that provides any constant-time guarantees.

zenhack commented 7 years ago

Question: What do we know about what hardware operations are known to be constant time? I wouldn't want to assume e.g. mul is constant time on every (or even any) arch.

zenhack commented 7 years ago

Something I'm puzzling over: How do we deal with variable length inputs? E.g. a common API for hash functions allows pushing arbitrary length data into the hash at a time, via a write() method or somesuch. The fips hashes chunk this into fixed-size blocks, but doing this would mean somewhere needing to do things that depend on the size of the input, besides just looping over them in predictable patterns. How to deal with the control flow for padding the last block in particular is not obvious to me.

zenhack commented 7 years ago

@frozencemetery, I pushed a possible partial implementation of sha256, would like your opinions. Notably, it doesn't tackle the padding issue. Also of note is that I'm using a loop that isn't exactly what we talked about -- it gives us a loop index that can be used to access specific elements of an array. It's apparent to me that a simple foreach (with possible step) won't do quite what we need, but I think the construct in the example is OK provided that we impose a type discipline that makes loop indexes distinct from other kinds of integers, disallowing indexing by things that may be influenced by messages/keys/soforth.

Another thing that comes to mind: bounds checking. can we realistically have a construct that prevents these errors statically without having to deal with the complexity of dependent types? I suppose we could just run the loop in some mock interpreter during compilation to see if it violates bounds. If memory accesses are deterministic and the language always halts, then that might actually work. This might get tricker if we want to operate on variable length arrays. If we can't do everything we want statically, we need to consider the implications of dynamic checks re: timing invariants.

zenhack commented 7 years ago

Another thing: if we're planning on allowing uint<N> for any N, we need to think about issues like carries, and what those will do to timing. BigNums have caused a few real timing sidechannels. At some point in our translation process we'll want to have an IR that doesn't have this feature, but still has the timing guarantee -- that should prevent any actual vulnerabilities because of this feature. But we still need to figure out how we're going to implement the abstraction.

zenhack commented 7 years ago

Re: timing of hardware instructions, I'm actually fairly certian that mul in particular is not constant time on all x86 CPUs; I vaguely remeber an intro systems exercise where I did some measurements...

zenhack commented 7 years ago

The other night I floated the idea of having qualifiers "public" and "secret", and e.g. only allowing indexing by public types, attapl gives a presentation of substructural (e.g. linear) types that might provide some inspiration for how to formalize.

frozencemetery commented 7 years ago

I see what you mean about the for syntax being insufficient. I think in general this approach isn't going to work without excluding some kinds of mixing, so it probably does need to fall back to bounds guarantees of some sort.

I feel like we can restrict array accesses such that the largest and smallest possible such values must be within the length of the array without running afoul of the halting problem. We could combine this with a dynamically-checked annotation for minimum array length as well. Would need to play with this more, but I think it could work as a parameter annotation during function declaration.

zenhack commented 7 years ago

What you're talking about sounds a lot like dml-style dependent types. Some variants of that can accommodate the compiler doing the proof work, an we may be able to get by without something more expressive than. We'll likely run into trouble if we try to support constraints involving multiplication, which could come up for multidimensional arrays. But I'm hopeful that we won't need that.