tomlokhorst / language-cil

Manipulating Common Intermediate Language AST in Haskell
Other

A monad for writing method bodies #6

Open dmcclean opened 13 years ago

dmcclean commented 13 years ago

The goal of making code that uses the library read as much like the resulting IL as possible, together with some issues around alpha-conversion, led me to this.

Instead of writing a method body as a [MethodDecl], I propose changing from:

ioAge :: MethodDef
ioAge = Method [MaStatic, MaPublic] Void "ioAge" []
  [ maxStack 11
  , localsInit
      [ Local Int32 "x"
      , Local (ValueType "mscorlib" "System.DateTime") "d1"
      , Local (ValueType "mscorlib" "System.DateTime") "d2"
      ]
  , ldstr "What year were you born?"
  , call [] Void "mscorlib" "System.Console" "WriteLine" [String]
  , call [] String "mscorlib" "System.Console" "ReadLine" []
  , call [] Int32 "" "int32" "Parse" [String]
  , stloc 0
  , call [] (ValueType "mscorlib" "System.DateTime") "mscorlib" "System.DateTime" "get_Now" []
  , stloc 1
  , ldloca 1
  , ldloc 0
  , neg
  , call [CcInstance] (ValueType "mscorlib" "System.DateTime") "mscorlib" "System.DateTime" "AddYears" [Int32]
  , stloc 2
  , ldstr "This year, you turn {0}."
  , ldloca 2
  , call [CcInstance] Int32 "mscorlib" "System.DateTime" "get_Year" []
  , box Int32
  , call [] Void "mscorlib" "System.Console" "WriteLine" [String, Object]
  , ret
  ]

to:

dateTime = ValueType "mscorlib" "System.DateTime" -- just for abbreviation

ioAge :: MethodDef
ioAge = Method [MaStatic, MaPublic] Void "ioAge" []
  $ do
      maxStack 11 -- this stays for now until the analysis to do it is done
      -- you could keep the same localsInit expression if you wanted to, this is just to demonstrate how it would be done in the case where the code generator is not in a position to assign all the names at once or isn't concerned with names

      x <- freshLocal Int32
      d1 <- freshLocal (dateTime)
      d2 <- freshLocal (dateTime)

      ldstr "What year were you born?"
      call [] Void "mscorlib" "System.Console" "WriteLine" [String]
      call [] String "mscorlib" "System.Console" "ReadLine" []
      call [] Int32 "" "int32" "Parse" [String]
      stloc x
      call [] dateTime "mscorlib" "System.DateTime" "get_Now" []
      stloc d1
      ldloca d1
      ldloc x
      neg
      call [CcInstance] dateTime "mscorlib" "System.DateTime" "AddYears" [Int32]
      stloc d2
      ldstr "This year, you turn {0}."
      ldloca d2
      call [CcInstance] Int32 "mscorlib" "System.DateTime" "get_Year" []
      box Int32
      call [] Void "mscorlib" "System.Console" "WriteLine" [String, Object]
      ret

With appropriate supporting code, sketched below:

import Control.Monad.State

data MethodBuilderState = MState
  { instructions :: [MethodDecl]
  , nextLocal :: Int
  -- , ...  I have a few ideas about the ..., omitted for simplicity
  }

type MethodBuilder a = State MethodBuilderState a

initialState :: MethodBuilderState
initialState = MState { instructions = [], nextLocal = 0 } -- plus initial values for the omitted fields

buildMethod :: MethodBuilder a -> [MethodDecl]
buildMethod m = instructions (execState m initialState)
-- Not quite: it should also run the maxstack analysis and add the result,
-- add the .localsinit for whatever freshLocals may have been made, etc.,
-- but you get the idea.

freshLocal :: MethodBuilder Offset -- or could be a name with a possible unique-ifying suffix, or could be one variant for each
freshLocal = do
  result <- gets nextLocal
  modify (\s -> s { nextLocal = result + 1 })
  return result

append :: MethodDecl -> MethodBuilder ()
append instr = modify (\s -> s { instructions = instructions s ++ [instr] })
-- It would be better for performance to store the instructions backwards,
-- since we only snoc and never cons; they are kept forwards here to get the idea across.
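
For reference, here is a self-contained, compilable version of that supporting code under a few assumptions: `Instr` is a hypothetical stand-in for language-cil's `MethodDecl` (only a handful of opcodes), local offsets are plain `Int`s, and the instructions are stored newest-first and reversed once in `buildMethod`, as the comment on `append` suggests. It is a sketch, not the library's API.

```haskell
import Control.Monad.State

-- Hypothetical stand-in for language-cil's MethodDecl (a few opcodes only).
data Instr
  = Ldstr String
  | Ldloc Int
  | Stloc Int
  | Neg
  | Ret
  deriving (Show, Eq)

-- Instructions are stored newest-first so that emitting one is a cons;
-- buildMethod reverses them once at the end.
data MState = MState
  { revInstrs :: [Instr]
  , nextLocal :: Int
  }

type MethodBuilder = State MState

emit :: Instr -> MethodBuilder ()
emit i = modify (\s -> s { revInstrs = i : revInstrs s })

-- Allocates the next unused local slot and returns its offset.
freshLocal :: MethodBuilder Int
freshLocal = do
  n <- gets nextLocal
  modify (\s -> s { nextLocal = n + 1 })
  return n

-- Returns the instructions in program order plus the number of locals
-- allocated (a real version would turn that into a .locals init directive).
buildMethod :: MethodBuilder a -> ([Instr], Int)
buildMethod m =
  let s = execState m (MState [] 0)
  in (reverse (revInstrs s), nextLocal s)

-- Lowercase builders mirroring the opcode constructors.
ldstr :: String -> MethodBuilder ()
ldstr = emit . Ldstr

ldloc, stloc :: Int -> MethodBuilder ()
ldloc = emit . Ldloc
stloc = emit . Stloc

neg, ret :: MethodBuilder ()
neg = emit Neg
ret = emit Ret

example :: ([Instr], Int)
example = buildMethod $ do
  x <- freshLocal
  y <- freshLocal
  ldloc x
  neg
  stloc y
  ret
```

Here `example` evaluates to `([Ldloc 0, Neg, Stloc 1, Ret], 2)`; the `2` is the local count that a fuller `buildMethod` would use to emit the `.locals init` directive.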

and with appropriate changes to the building methods, e.g. from:

bgt :: Label -> MethodDecl
bgt = mdecl . Bgt

to:

bgt :: Label -> MethodBuilder ()
bgt = append . mdecl . Bgt
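
Since each monadic builder is just the list builder composed with `append`, the monadic versions could plausibly be derived mechanically from the existing ones. A minimal sketch, with hypothetical stand-ins for `Label`, `OpCode` and `MethodDecl`; the `M` suffix exists only so both styles fit in one file (in a separate module the names could stay identical).

```haskell
import Control.Monad.State

-- Hypothetical stand-ins for language-cil's Label, OpCode and MethodDecl.
type Label = String

data OpCode = Bgt Label | Br Label | Ret
  deriving (Show, Eq)

newtype MethodDecl = Instr OpCode
  deriving (Show, Eq)

mdecl :: OpCode -> MethodDecl
mdecl = Instr

-- Existing list-style builders.
bgt, br :: Label -> MethodDecl
bgt = mdecl . Bgt
br = mdecl . Br

ret :: MethodDecl
ret = mdecl Ret

-- The builder monad, keeping the list forwards for clarity.
type MethodBuilder = State [MethodDecl]

append :: MethodDecl -> MethodBuilder ()
append d = modify (++ [d])

-- Monadic builders: each one is the list builder followed by append.
bgtM, brM :: Label -> MethodBuilder ()
bgtM = append . bgt
brM = append . br

retM :: MethodBuilder ()
retM = append ret

buildMethod :: MethodBuilder a -> [MethodDecl]
buildMethod m = execState m []

listStyle :: [MethodDecl]
listStyle = [bgt "done", br "loop", ret]

monadStyle :: [MethodDecl]
monadStyle = buildMethod $ do
  bgtM "done"
  brM "loop"
  retM
```

`listStyle` and `monadStyle` produce the same list, so the two styles can coexist without either one changing the other.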

(Potentially for customizability you might want to define MethodBuilder as a class instead of a type, make append a function in that class, change the builder type of the example from Label -> MethodBuilder () to (MethodBuilder m) => Label -> m (), etc. This would allow people to use a more feature-rich method building monad if their backend had some reason to do so. Ignore that complexity for now.)
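
A sketch of that class-based variant, under stated assumptions: the class name `MethodBuilder`, its single method `append`, and both instances are hypothetical, and `MethodDecl` is a small stand-in. Builders are written once against the class, and a backend chooses the monad: a plain `Writer`, or a `State` that also does extra bookkeeping (here, counting branch instructions).

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Control.Monad.State
import Control.Monad.Writer

-- Hypothetical stand-ins, not language-cil's actual types.
type Label = String

data MethodDecl = Bgt Label | Br Label | Ret
  deriving (Show, Eq)

-- A hypothetical class version of MethodBuilder: any monad that can accept
-- emitted declarations.
class Monad m => MethodBuilder m where
  append :: MethodDecl -> m ()

-- Builders are written once against the class.
bgt, br :: MethodBuilder m => Label -> m ()
bgt = append . Bgt
br = append . Br

ret :: MethodBuilder m => m ()
ret = append Ret

-- Minimal backend: just collect the declarations.
instance MethodBuilder (Writer [MethodDecl]) where
  append d = tell [d]

-- Richer backend: also counts branch instructions, standing in for any
-- extra bookkeeping a backend might need.
instance MethodBuilder (State ([MethodDecl], Int)) where
  append d = modify step
    where
      step (ds, n) = (ds ++ [d], if isBranch d then n + 1 else n)
      isBranch (Bgt _) = True
      isBranch (Br _) = True
      isBranch _ = False

-- A method body that runs unchanged on either backend.
body :: MethodBuilder m => m ()
body = do
  bgt "exit"
  br "top"
  ret

simple :: [MethodDecl]
simple = execWriter body

counted :: ([MethodDecl], Int)
counted = execState body ([], 0)
```

`simple` is `[Bgt "exit", Br "top", Ret]`, and `counted` is the same list paired with `2`.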

tl;dr The monadic version both looks a lot more like the IL than the [MethodDecl] version and provides a place to hang a few other things that practical code-generators are going to need.

tomlokhorst commented 13 years ago

Hmmm...

I've a bit of an aversion to monads, so I guess I'm biased. I've done a tiny bit of work on the GHC internals (a 20+ year old codebase) and worked extensively on UHC (5+ years old). Both use monad stacks internally, which I found very hard to work with. I guess it's easier for people experienced with the code base, but to someone unfamiliar with the code it looks like a big, unnavigable mess.

I really like the simplicity of the idea that a Method is a name, parameters and a list of instructions. Easy to understand, even for newcomers.

One of my main concerns is keeping the library simple for newcomers; I don't want to scare people off (as a couple of complicated-looking Hackage libraries have scared me off).

However, you're probably right that extras like fresh name generation and automatic maxstack calculation are useful for more experienced users of the library, so adding an alternative way of building a [MethodDecl] might be worthwhile.

My first suggestion would be to add a module Language.Cil.Build.MethodBuilder (probably needs a better name). That could be useful for people needing more complex build functionality and more automation. This could expose one or more functions that generate a [MethodDecl]. Stuff like maxstack calculation could also be here.

Any thoughts?
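
As an illustration of the kind of analysis such a module could host, here is a sketch of maxstack computation for straight-line code only; a real version would need a dataflow pass over branch targets. The `Instr` type and its stack effects are a hypothetical subset: `Call n res` stands for a call that pops `n` items (including `this` for instance calls) and pushes a result if `res` is true, and `Ret` is modeled for a `void` method.

```haskell
-- Hypothetical instruction subset; not language-cil's actual types.
data Instr
  = Ldstr String
  | Ldloc Int
  | Ldloca Int
  | Stloc Int
  | Neg
  | Box String
  | Call Int Bool -- items popped (incl. `this`), whether a result is pushed
  | Ret -- modeled for a void method: pops nothing
  deriving (Show, Eq)

-- (items popped, items pushed) for each instruction.
stackEffect :: Instr -> (Int, Int)
stackEffect instr = case instr of
  Ldstr _ -> (0, 1)
  Ldloc _ -> (0, 1)
  Ldloca _ -> (0, 1)
  Stloc _ -> (1, 0)
  Neg -> (1, 1)
  Box _ -> (1, 1)
  Call n res -> (n, if res then 1 else 0)
  Ret -> (0, 0)

-- Maximum evaluation-stack depth of straight-line code, starting empty.
maxStackStraightLine :: [Instr] -> Int
maxStackStraightLine = go 0 0
  where
    go _ best [] = best
    go depth best (i : is)
      | pops > depth = error ("stack underflow at " ++ show i)
      | otherwise = go depth' (max best depth') is
      where
        (pops, pushes) = stackEffect i
        depth' = depth - pops + pushes

-- The ioAge body from the proposal, in this simplified model.
ioAgeBody :: [Instr]
ioAgeBody =
  [ Ldstr "What year were you born?"
  , Call 1 False -- Console.WriteLine(string)
  , Call 0 True -- Console.ReadLine()
  , Call 1 True -- Int32.Parse(string)
  , Stloc 0
  , Call 0 True -- DateTime.get_Now()
  , Stloc 1
  , Ldloca 1
  , Ldloc 0
  , Neg
  , Call 2 True -- instance DateTime.AddYears(int32)
  , Stloc 2
  , Ldstr "This year, you turn {0}."
  , Ldloca 2
  , Call 1 True -- instance DateTime.get_Year()
  , Box "int32"
  , Call 2 False -- Console.WriteLine(string, object)
  , Ret
  ]
```

Under this model the `ioAge` body reaches a maximum depth of 2, so the hand-written `maxStack 11` is a safe overestimate.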

tomlokhorst commented 13 years ago

Also, in the current design, it is probably a good idea to add stlocNm :: LocalName -> OpCode.

Then I'd implement the first example as:

ioAge :: MethodDef
ioAge = Method [MaStatic, MaPublic] Void "ioAge" []
  [ localsInit
      [ Local Int32 x
      , Local (ValueType "mscorlib" "System.DateTime") d1
      ]
  , call [] String "mscorlib" "System.Console" "ReadLine" []
  , call [] Int32 "" "int32" "Parse" [String]
  , stlocNm x
  , call [] (ValueType "mscorlib" "System.DateTime") "mscorlib" "System.DateTime" "get_Now" []
  , stlocNm d1
  , ret
  ]
  where
    x :: LocalName
    x = "x"
    d1 :: LocalName
    d1 = "d1"

This is not as useful for a code generator where you want the convenience of automatic unique names, but it is very useful when you want to generate nice-looking IL (which I do, because I read it a lot).
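
One possible way to get readable names and automatic uniqueness at the same time, along the lines of the "name with a possible unique-ifying suffix" option mentioned for `freshLocal`: return the requested name when it is free and append a numeric suffix otherwise. `freshNamed`, `firstFree` and the `Set`-based state are hypothetical, not part of language-cil.

```haskell
import Control.Monad.State
import qualified Data.Set as Set

type LocalName = String

-- Returns `base` if it is unused, otherwise base1, base2, ... (the first free
-- one), and records the chosen name as used.
freshNamed :: LocalName -> State (Set.Set LocalName) LocalName
freshNamed base = do
  used <- get
  let name = firstFree used base
  put (Set.insert name used)
  return name

firstFree :: Set.Set LocalName -> LocalName -> LocalName
firstFree used base = go 0
  where
    go :: Int -> LocalName
    go i =
      let candidate = if i == 0 then base else base ++ show i
      in if candidate `Set.notMember` used then candidate else go (i + 1)

names :: [LocalName]
names = evalState (mapM freshNamed ["x", "d", "x", "x1", "x"]) Set.empty
```

`names` evaluates to `["x", "d", "x1", "x11", "x2"]`: a generated name (`x1`) is not handed out again when it is later requested explicitly.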

dmcclean commented 13 years ago

Is that really simpler? I think the local declarations are a fair bit easier to read in the monadic version, and you don't have to manage the commas, but I may be mistaken.

We could do both as you suggest, but we would either need two copies of all the opcode-named building functions or we would need to introduce

{-# LANGUAGE FlexibleInstances #-} -- needed for the MethodBuilder () instance below

class HasOpCodes m where
    injectOpCode :: OpCode -> m

instance HasOpCodes OpCode where
    injectOpCode = id

instance HasOpCodes MethodDecl where
    injectOpCode = mdecl

instance HasOpCodes (MethodBuilder ()) where
    injectOpCode = append . mdecl
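
A runnable sketch of the `HasOpCodes` idea, with hypothetical stand-ins for the language-cil types. One adjustment is an assumption worth flagging: with the instance head `MethodBuilder ()`, a statement such as `bgt "end"` that is not the last one in a `do` block has an ambiguous result type, since only the final statement's type is fixed by context. A common workaround, used below, is to match any `State [MethodDecl] a` and then require `a ~ ()`.

```haskell
{-# LANGUAGE FlexibleInstances, TypeFamilies #-}
import Control.Monad.State

-- Hypothetical stand-ins for the language-cil types.
type Label = String

data OpCode = Bgt Label | Ret
  deriving (Show, Eq)

newtype MethodDecl = Instr OpCode
  deriving (Show, Eq)

mdecl :: OpCode -> MethodDecl
mdecl = Instr

append :: MethodDecl -> State [MethodDecl] ()
append d = modify (++ [d])

class HasOpCodes m where
  injectOpCode :: OpCode -> m

instance HasOpCodes OpCode where
  injectOpCode = id

instance HasOpCodes MethodDecl where
  injectOpCode = mdecl

-- Matches State [MethodDecl] a for any a, then requires a to be (), so a
-- non-final statement such as `bgt "end"` in a do block is not ambiguous.
instance (a ~ ()) => HasOpCodes (State [MethodDecl] a) where
  injectOpCode = append . mdecl

-- Each builder is written once and works in all three contexts.
bgt :: HasOpCodes m => Label -> m
bgt = injectOpCode . Bgt

ret :: HasOpCodes m => m
ret = injectOpCode Ret

asOpCodes :: [OpCode]
asOpCodes = [bgt "end", ret]

asList :: [MethodDecl]
asList = [bgt "end", ret]

asMonad :: [MethodDecl]
asMonad = execState (bgt "end" >> ret) []
```

`asList` and `asMonad` are equal, and the same `bgt` also yields a bare `OpCode` in `asOpCodes`, so one set of builder functions would serve both styles.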

tomlokhorst commented 13 years ago

I have absolutely no problem with the commas. This is no Lisp, but I'm sure every Haskell programmer knows how lists work. I think the conceptual overhead of the monad, with return, bind and (>>) (what's that operator called?), is higher than the visual overhead of a couple of commas.

For now, I'm in favour of duplicating the opcode builder functions in the MethodBuilder module. I'd like to try out both versions in a practical setting. If it turns out they're both useful we can think of abstracting away the commonalities. If on the other hand the [MethodDecl] version turns out to be unnecessary, those builders can be completely removed in favour of the MethodBuilder monad.

Ultimately it's of course best if there is no duplication, but for now I think the MethodBuilder module should be an addition: one that can be used or ignored without affecting the rest of the code. I don't want to fall into the trap of premature abstraction.

dmcclean commented 13 years ago

Sounds like a good plan. I have a side-by-side implementation that works. Later this evening I will do some renaming and pass it along.