tomlokhorst / language-cil

Manipulating Common Intermediate Language AST in Haskell

Invented opcodes for named access to locals/arguments #10

Open dmcclean opened 13 years ago

dmcclean commented 13 years ago

There are a few invented opcodes in the OpCode type. That might not be the best thing to have if we want to parse CIL, because the parser would have to case-analyze what it is parsing to decide which alternative to build in the AST.

Examples:

data OpCode
  = ...
  | LdargN DottedName
  | LdlocN DottedName
  | LdlocaN DottedName
  | StlocN DottedName
  -- CIL also defines a family of starg opcodes, which we don't have yet.
  -- (There's a bit of a philosophical issue with "storing" to your own
  -- arguments, but it does exist.) For symmetry we would presumably add
  -- a StargN DottedName alternative.

I propose:

data Location
  = Offset Offset
  | Name DottedName -- or is it LocalName? I'm confused about the necessity of DottedName

data OpCode
  = ...
  | Ldarg Location
  | Ldarga Location -- not implemented yet, but a trivial extension to implement
  | Ldloc Location
  | Ldloca Location
  | Starg Location
  | Stloc Location

The ldargN, ldlocN, ldlocaN, stlocN helpers in Build can stick around, or we could play typeclass tricks to let you write:

[ ldc_i4 42
, dup
, stloc 2
, stloc "answer"
]

by defining something like:

class IsLocation t where -- kinda unfortunate you can't have both a type named location and a class named location, this may not be the best name
    toLocation :: t -> Location

instance IsLocation Offset where
    toLocation = Offset

instance IsLocation DottedName where  -- (or is it LocalName, or both?)
    toLocation = Name

ldloc :: (IsLocation loc) => loc -> MethodDecl
ldloc loc = mdecl $ Ldloc $ toLocation loc

-- and so forth...

I'm not 100% sure the type class hackery is really any better than keeping the ldlocN, ldargN, ... family of helpers. It improves the readability of the code, and it may make things easier for some beginners: there is less to remember, and both guesses work (ldloc "x" for those who guess a name, ldloc 0 for those who guess an offset). But if a beginner makes a mistake, the type error message will be more confusing.
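To make the trade-off concrete, here is a minimal, self-contained sketch of the type class idea. It is not the library's actual code: Offset and DottedName are assumed to be plain Integer and String, OpCode is cut down to a single constructor, and ldloc returns an OpCode rather than a MethodDecl so the example stands alone.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- Assumptions for this sketch only: the real library's Offset and
-- DottedName may be defined differently.
type Offset     = Integer
type DottedName = String

data Location
  = Offset Offset
  | Name DottedName
  deriving (Eq, Show)

-- Cut-down stand-in for the real OpCode type.
newtype OpCode = Ldloc Location
  deriving (Eq, Show)

class IsLocation t where
  toLocation :: t -> Location

instance IsLocation Integer where
  toLocation = Offset

-- String is a synonym for [Char], so this instance needs FlexibleInstances.
instance IsLocation String where
  toLocation = Name

-- Simplified builder: returns an OpCode instead of a MethodDecl.
ldloc :: IsLocation loc => loc -> OpCode
ldloc = Ldloc . toLocation

-- A string literal has type String (without OverloadedStrings), so this
-- resolves to the String instance.
byName :: OpCode
byName = ldloc "answer"

-- A numeric literal has type (Num a => a), so it needs an annotation here.
byOffset :: OpCode
byOffset = ldloc (2 :: Integer)

-- In a compiled module with the standard defaulting rules, the
-- unannotated form is rejected as ambiguous, because IsLocation is not
-- one of the standard classes that defaulting is allowed to see:
--
--   bad = ldloc 2
```

The unannotated ldloc 2 is exactly the case a beginner is likely to write, and the resulting ambiguity error says nothing about locals or offsets.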

tomlokhorst commented 13 years ago

I'm not sure I understand the problem.

What do you mean by "parse CIL"? Parsing a binary, or parsing the textual representation of CIL?

In the case of parsing a binary, the parser can always choose the Ldloc version. A parser for textual CIL can make the choice based on what's there in the text. The named LdlocN may not be an instruction in the binary serialization of CIL, but it is a feature of ILAsm, and part of the AST for textual CIL.
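As a rough illustration of "making the choice based on what's there in the text", the sketch below picks between the two existing alternatives by looking at the operand. The helper name ldlocFromOperand is hypothetical, Offset and DottedName are assumed to be Integer and String, and only decimal offsets are recognized.

```haskell
import Data.Char (isDigit)

-- Assumptions for this sketch only.
type Offset     = Integer
type DottedName = String

-- Cut-down stand-in for the current OpCode type, with separate
-- alternatives for offset-based and name-based access.
data OpCode
  = Ldloc Offset
  | LdlocN DottedName
  deriving (Eq, Show)

-- Hypothetical helper: given the operand text that followed "ldloc",
-- build the offset form if it is all decimal digits, else the named form.
ldlocFromOperand :: String -> OpCode
ldlocFromOperand operand
  | not (null operand) && all isDigit operand = Ldloc (read operand)
  | otherwise                                 = LdlocN operand
```

Under these assumptions, ldlocFromOperand "2" gives Ldloc 2 and ldlocFromOperand "answer" gives LdlocN "answer".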

dmcclean commented 13 years ago

Yeah, I suppose it isn't that bad. Still, it seems more in line with the rest of things to have one alternative in OpCode (avoids duplicating analysis code) and two builder methods.

tomlokhorst commented 13 years ago

Ah, that's a good point about duplicating analysis code. I agree with adding the Location data type, and keeping the two builder functions.
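A minimal sketch of that outcome, assuming Offset and DottedName are Integer and String and using a cut-down OpCode; the builders return OpCode rather than MethodDecl to keep it self-contained, and storedLocations is a hypothetical analysis added only to show the single pattern match.

```haskell
-- Assumptions for this sketch only.
type Offset     = Integer
type DottedName = String

data Location
  = Offset Offset
  | Name DottedName
  deriving (Eq, Show)

-- Cut-down stand-in for OpCode: one alternative per instruction.
data OpCode
  = Ldloc Location
  | Stloc Location
  | Dup
  deriving (Eq, Show)

-- Two plain builder functions per instruction, no type class involved.
ldloc :: Offset -> OpCode
ldloc = Ldloc . Offset

ldlocN :: DottedName -> OpCode
ldlocN = Ldloc . Name

stloc :: Offset -> OpCode
stloc = Stloc . Offset

stlocN :: DottedName -> OpCode
stlocN = Stloc . Name

-- Hypothetical analysis: every location written by a stloc instruction.
-- One pattern covers both the offset and the named form.
storedLocations :: [OpCode] -> [Location]
storedLocations ops = [loc | Stloc loc <- ops]
```

With these definitions, storedLocations [Dup, stloc 2, stlocN "answer", ldloc 0] evaluates to [Offset 2, Name "answer"]. With separate Stloc and StlocN alternatives, the same analysis would need two cases.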

I'm not a fan of type classes for this sort of thing. This isn't some very general concept (like Eq), and the type errors are indeed confusing.

dmcclean commented 13 years ago

Agreed. I prototyped the type class approach to this problem and it doesn't work out well at all. It works sort of OK with GHC's overloaded string literals extension (OverloadedStrings), but at the cost of even more ugliness. I'm all for scrapping that idea.