A better AssemblyName / TypeName

tomlokhorst commented 13 years ago

Copied from issue 2:

The only other one I need in the near term is ldtoken, but it's a bit thorny because the current way of representing type (and hence method, and even field) names is incomplete. Pairing an AssemblyName with a TypeName, as a bunch of places do right now, covers the common cases, but there are several more: array types (single and multi-dimensional), pointer types, and nested types. Compare the grammar in Syntax.PrimitiveType with the grammar in Partition II, section 7.1.

It seems like many of the places that presently have an AssemblyName together with a TypeName (and possibly a method or field after that) need to instead be a slightly-beefed-up PrimitiveType (which isn't really so primitive, since it includes all the cases and not just the simple predefined ones). The ldtoken support I need isn't useful to me without that feature. But since it is a pretty disruptive change, I was wondering where you were planning to take it, so I can decide whether to just go do it myself or contribute it back to you.

(I'm fine with ignoring the pinned, managed pointer, typedref, modopt, and modreq productions in II, 7.1. They cover obscure things for which I have no need, and especially no short-term need. I also don't actually need the field flavor of ldtoken instructions, but after I've done the other two it's very little extra work.)

Examples for concreteness:

valuetype [mscorlib]System.Environment/SpecialFolder
class [mscorlib]System.Predicate[]
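To make the missing cases concrete, here is a minimal sketch of a type representation that covers nested types, single- and multi-dimensional arrays, and pointers, and renders them in ILAsm syntax. The names `TypeKind`, `TypeRef`, `Named`, `ArrayOf`, `PointerTo` and `render` are hypothetical and are not the language-cil API. Generic instantiation and the productions set aside above (pinned, managed pointers, typedref, modopt, modreq) are deliberately left out.

```haskell
import Data.List (intercalate)

-- Whether a named type is written with the `class` or `valuetype` keyword.
data TypeKind = Class | ValueType
  deriving (Eq, Show)

-- Hypothetical, richer replacement for an (AssemblyName, TypeName) pair.
data TypeRef
  = Int32                                       -- one predefined type, for brevity
  | Named TypeKind (Maybe String) String [String]
      -- keyword, optional assembly, outermost dotted name, nested type names
  | ArrayOf TypeRef Int                         -- element type and rank (>= 1)
  | PointerTo TypeRef
  deriving (Eq, Show)

-- Render a TypeRef using the ILAsm type syntax.
render :: TypeRef -> String
render Int32 = "int32"
render (Named k asm outer nested) =
  keyword k ++ " " ++ scope asm ++ intercalate "/" (outer : nested)
  where
    keyword Class     = "class"
    keyword ValueType = "valuetype"
    scope = maybe "" (\a -> "[" ++ a ++ "]")
render (ArrayOf t rank) =
  -- rank 1 renders as "[]", rank 2 as "[,]", and so on
  render t ++ "[" ++ replicate (rank - 1) ',' ++ "]"
render (PointerTo t) = render t ++ "*"
```

Applied to the two examples: `render (Named ValueType (Just "mscorlib") "System.Environment" ["SpecialFolder"])` yields `valuetype [mscorlib]System.Environment/SpecialFolder`, and `render (ArrayOf (Named Class (Just "mscorlib") "System.Predicate" []) 1)` yields `class [mscorlib]System.Predicate[]`. Because nesting, arrays and pointers are separate constructors, the rules for where `/`, `[,]` and `*` go live in one function instead of being repeated at every call site.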

tomlokhorst commented 13 years ago

I too noticed that AssemblyName and TypeName appear together a lot.

However, I want to keep the external API as simple as possible. For example, calling a function with two strings is a lot simpler (and nicer to read) than calling some constructor to build a "NameType".

I'd rather have some redundancy internally in the library, if that means the external API is nicer.

Having said that, maybe all the "niceties" can be done in the build functions. If you have a suggestion on how to implement this, I'd love to read it.
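One way to read this suggestion is to keep a structured type internal to the library and have the build functions expose thin two-string wrappers over it. The sketch below is illustrative only: the `ValueType`/`ReferenceType` constructors and the `Instr` type are assumptions, not the library's definitions. It also exposes the catch raised later in the thread: the string pair alone does not say whether the type is a class or a valuetype, so each wrapper has to commit to one.

```haskell
type AssemblyName = String
type TypeName = String

-- Hypothetical internal representation; constructor names are assumptions.
data PrimitiveType
  = ValueType AssemblyName TypeName
  | ReferenceType AssemblyName TypeName
  deriving (Eq, Show)

-- Hypothetical instruction type with a single case, for brevity.
data Instr = Box PrimitiveType
  deriving (Eq, Show)

-- External API: two plain strings in, structured type built internally.
-- The pair carries no class/valuetype information, so this wrapper
-- simply assumes a value type.
box :: AssemblyName -> TypeName -> Instr
box an tn = Box (ValueType an tn)
```

A call such as `box "mscorlib" "System.Int32"` stays as short as today, at the cost of a hard-coded guess inside the wrapper.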

Also, I haven't read section 7.1 yet; I will.

dmcclean commented 13 years ago

I do have a best-of-both-worlds proposal. I'll do a draft when I get back to a full-sized keyboard.

dmcclean commented 13 years ago

OK, the reason I don't want to adopt the strings-only approach is that the backend I am writing wants to be able to combine a few types into a bigger one, and it's better to centralize the logic for how that affects construction of the serialized name.

I propose:

-- FlexibleInstances is needed for the String and (AssemblyName, TypeName) instances.
{-# LANGUAGE FlexibleInstances #-}

class TypeDescriptor t where
    toType :: t -> PrimitiveType -- setting aside whether PrimitiveType is the right name

instance TypeDescriptor String where
    -- Write a Parsec parser that does this. It's useful anyway, especially if you are
    -- planning to have a parser for IL in general (I don't need one, so I'm not
    -- especially concerned about that). The overall IL grammar is actually pretty
    -- complicated, but the subset concerned with type names is really not bad.
    toType = parseTypeName

instance TypeDescriptor (AssemblyName, TypeName) where
    -- It's actually not clear to me that this is even valid: for example, how do you
    -- know when it is a class and when it is a valuetype? But if you have some
    -- strategy in mind, you can keep it and just describe it here.
    toType (an, tn) = ...

instance TypeDescriptor PrimitiveType where
    toType = id
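The String instance is the interesting one. The proposal suggests Parsec; as a dependency-free sketch, the same idea can be written with `Text.ParserCombinators.ReadP` from base. It handles only a small subset of the type-name grammar: an optional `class`/`valuetype` keyword, an optional `[assembly]` scope, a dotted name with `/` for nesting, and `[]`, `[,]` and `*` suffixes. `PrimitiveType` here is a hypothetical stand-in, not the library's type, and this `parseTypeName` returns `Either` so the error message is explicit before anything reaches ilasm.

```haskell
{-# LANGUAGE FlexibleInstances #-}

import Data.Char (isAlphaNum)
import Text.ParserCombinators.ReadP
  (ReadP, between, char, eof, many, munch, munch1, option, readP_to_S, sepBy1, string, (+++))

data TypeKind = Class | ValueType
  deriving (Eq, Show)

-- Hypothetical stand-in for a "beefed-up" PrimitiveType.
data PrimitiveType
  = Int32                                          -- the one predefined type handled here
  | Named (Maybe TypeKind) (Maybe String) [String] -- keyword, [assembly], names split on '/'
  | ArrayOf PrimitiveType Int                      -- element type and rank
  | PointerTo PrimitiveType
  deriving (Eq, Show)

isNameChar :: Char -> Bool
isNameChar c = isAlphaNum c || c `elem` "._`"

-- "class " or "valuetype " prefix.
kind :: ReadP TypeKind
kind = ((Class <$ string "class") +++ (ValueType <$ string "valuetype")) <* munch1 (== ' ')

-- Keyword, assembly scope and (possibly nested) name, without suffixes.
simple :: ReadP PrimitiveType
simple = do
  k <- option Nothing (Just <$> kind)
  asm <- option Nothing (Just <$> between (char '[') (char ']') (munch1 isNameChar))
  parts <- sepBy1 (munch1 isNameChar) (char '/')
  return $ case (k, asm, parts) of
    (Nothing, Nothing, ["int32"]) -> Int32
    _                             -> Named k asm parts

-- "[]" is rank 1, "[,]" is rank 2, and so on; "*" makes a pointer.
suffix :: ReadP (PrimitiveType -> PrimitiveType)
suffix = arraySuffix +++ (PointerTo <$ char '*')
  where
    arraySuffix = do
      commas <- between (char '[') (char ']') (munch (== ','))
      return (\t -> ArrayOf t (length commas + 1))

typeName :: ReadP PrimitiveType
typeName = do
  t <- simple
  fs <- many suffix
  return (foldl (flip ($)) t fs) -- apply suffixes left to right

-- Accept exactly one complete parse; otherwise report a readable error.
parseTypeName :: String -> Either String PrimitiveType
parseTypeName s =
  case [t | (t, "") <- readP_to_S (typeName <* eof) s] of
    [t] -> Right t
    []  -> Left ("cannot parse type name: " ++ show s)
    ts  -> Left ("ambiguous type name: " ++ show s ++ " (" ++ show (length ts) ++ " parses)")

class TypeDescriptor t where
  toType :: t -> PrimitiveType

instance TypeDescriptor String where
  -- Fail early with a readable message rather than emitting bad IL.
  toType s = either error id (parseTypeName s)

instance TypeDescriptor PrimitiveType where
  toType = id
```

For instance, `parseTypeName "int32[,]*"` gives `Right (PointerTo (ArrayOf Int32 2))`, while a malformed input such as `"[mscorlib"` gives a `Left` with the offending text quoted.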

Then all the Build functions would change. To pick one example:

box :: PrimitiveType -> MethodDecl
box = mdecl . Box

becomes

box :: (TypeDescriptor t) => t -> MethodDecl
box = mdecl . Box . toType
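A cut-down, self-contained sketch of the changed build function follows. `Instr`, `Unparsed` and the lookup-style String instance are hypothetical placeholders (a real version would run a type-name parser), and the `mdecl` wrapper from the snippet above is dropped so the example has no library dependency. It shows that a single polymorphic `box` accepts both a string and an already-structured type.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- Hypothetical, cut-down stand-ins for the library's types.
data PrimitiveType = Int32 | Unparsed String
  deriving (Eq, Show)

data Instr = LdcI4 Integer | Box PrimitiveType
  deriving (Eq, Show)

class TypeDescriptor t where
  toType :: t -> PrimitiveType

instance TypeDescriptor PrimitiveType where
  toType = id

instance TypeDescriptor String where
  -- Placeholder for a real type-name parser.
  toType "int32" = Int32
  toType s       = Unparsed s

ldc_i4 :: Integer -> Instr
ldc_i4 = LdcI4

-- One polymorphic build function covers both spellings of the operand.
box :: TypeDescriptor t => t -> Instr
box = Box . toType

example :: [Instr]
example = [ldc_i4 42, box "int32", ldc_i4 42, box Int32]
```

One design caveat: this relies on string literals having type `String`. In a module that enables `OverloadedStrings`, `box "int32"` would likely become ambiguous and need a type annotation.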

I would argue that this is actually better for simplicity, both in the easy cases and for new users, because now you can write:

ldc_i4 42
box "int32"

or

ldc_i4 42
box "[mscorlib]System.Int32"

or however else you want to say it, and it reads just like the resulting IL will (except with quotes around it).

Good? Bad? How can it be even better?

[Premature optimization jungle:] The performance penalty you will pay for washing your pre-stringified type names through the parser and the pretty printer is the price of an understandable error message when something is screwy, instead of a probably-more-confusing one from ilasm later. If that performance is the ultimate concern, you could:

data PrimitiveType = ... | OpaqueTypeName String

instance TypeDescriptor String where
    toType = OpaqueTypeName

-- obvious trivial extension to pretty printer
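The fast-path variant can be sketched as below. `prettyType` is a hypothetical stand-in for the library's pretty printer, and the `ValueType` constructor is only there for contrast. The trade-off shows in the last case: whatever string the user supplies is emitted verbatim, so a typo surfaces only when ilasm rejects the output.

```haskell
{-# LANGUAGE FlexibleInstances #-}

data PrimitiveType
  = Int32
  | ValueType String String  -- assembly and type name (hypothetical shape)
  | OpaqueTypeName String    -- pre-rendered text, passed through unchecked
  deriving (Eq, Show)

class TypeDescriptor t where
  toType :: t -> PrimitiveType

instance TypeDescriptor String where
  toType = OpaqueTypeName    -- no parsing and no validation

-- The "obvious trivial extension": print opaque names verbatim.
prettyType :: PrimitiveType -> String
prettyType Int32              = "int32"
prettyType (ValueType a t)    = "valuetype [" ++ a ++ "]" ++ t
prettyType (OpaqueTypeName s) = s
```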

tomlokhorst/language-cil, issue #5