tomlokhorst / language-cil

Manipulating Common Intermediate Language AST in Haskell
Other
21 stars 8 forks source link

Maxstack calculation #3

Open dmcclean opened 13 years ago

dmcclean commented 13 years ago

Partition III, Section 1.7.4 requires correct CIL instruction sequences include a limit on the height the stack may reach during execution of that sequence.

Relatedly, Section 1.7.5 requires that the shape of the stack is the same at each location in a valid instruction sequence no matter what series of branches resulted in entry to that location.

Doing an analysis of compliance with 1.7.5 gets you the answer to 1.7.4 as a side effect, and abstract interpretation to do the 1.7.5 analysis is only a little bit more complicated (track stack shapes including types, instead of just stack height), so it might be worth doing both. Doing the analysis for the .maxstack directive is actually required for correctness; unfortunately ilasm doesn't perform this analysis on its own, it relies on the input IL. (ilasm does do some other nice things though, including rewriting "long" branches to "short" branches where possible if you use the /optimize switch)

This is a medium-sized project, between writing the abstract interpretation and combing through the documents to record the behavior of each instruction.

tomlokhorst commented 13 years ago

I just read those two sections, and I agree it's a good idea to do those simultaneously. The abstract interpreter might be a bit more complicated than the maxstack analysis, but I think a large chunk, if not most, of the work will be looking up the stack behaviour for all of the instructions.

The documentation for each instruction already contains a description of stack behaviour in textual form. If there's a separate data structure describing stack behaviour, the documentation could potentially even be generated from that.

I guess (off the top of my head) we need something like:

data StackEffect = StackEffect
  { pop  :: [TypeShape]
  , push :: [TypeShape]
  }

stackEffect :: OpCode -> StackEffect
stackEffect Add = StackEffect [undefined, undefined] [undefined]
stackEffect (CallVirt rt _ _ _ ps) =
   StackEffect (concatMap primitiveTypeToTypeShape ps)
               (primitiveTypeToTypeShape rt)

primitiveTypeToTypeShape :: PrimitiveType -> [TypeShape]
primitiveTypeToTypeShape Void = []
primitiveTypeToTypeShape _    = [undefined]
dmcclean commented 13 years ago

The machine-readable version (as a C header file) is in Partition VI, Annex C, Section C.2. It describes both the stack behavior and whether control goes straight through or potentially/unconditionally branches, etc.

This analysis probably does get more complicated in the face of exception handling code, that is a whole other ball of wax.

dmcclean commented 13 years ago

The file is opcodes.def, look in %ProgramFiles%\Microsoft SDKs\Windows\v7.0A\include

Your mileage may vary depending on which OS you are on, on 64-bit it goes in the x86 program files directory, and that version number can change too.

dmcclean commented 13 years ago

I translated the opcodes.def file into Haskell. I'm working on the abstract analysis part for stack height only (not stack shape) because it is simpler to start with.

The one quirky thing is that the pop behavior of the ret opcode depends on the declared return type of the method where the opcode appears. All of the others either have a fixed behavior or have one that depends on their MethodRef (or for calli, which is also slightly squirrelly, its CallSiteDescr) parameter.

I'm making the analysis run in the error monad because the spec disallows a fair number of strange control flow patterns (Partition III, Section 1.7.2, 1.7.5, 1.7.6) specifically for the purpose of simplifying this analysis. I'm going to return an error if the maxstack calculation happens to discover a violation of those rules.