microvm / microvm-meta

We have moved: https://gitlab.anu.edu.au/mu/general-issue-tracker

Sizeof? #47

Open eliotmoss opened 9 years ago

eliotmoss commented 9 years ago

We have encountered an interesting issue in developing the C client, namely how to deal with union types. Our thought was to define a separate struct type for each union variant, and then to cast to the appropriate struct type when accessing a particular variant. (Note that this requires structs to be heap or alloca allocated, which I think is ok -- C does not treat them as single values that can go into a register, etc., as I recall.)

The problem we have is that because Mu defines the detailed layout of a struct on a given target, we cannot determine the sizes of the structs, and thus we cannot determine the maximum size, something we need in order to allocate an instance of a union type.
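For comparison, here is a minimal C sketch of the computation a front end can do when the target is already known. The struct names `VariantA` and `VariantB` and the helper functions are hypothetical, chosen only for illustration. The sketch sizes the union storage as the maximum of the variant sizes, which is the value we currently cannot express in target-independent Mu IR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical variants of one C union, each modelled as its own struct. */
struct VariantA { void *ptr; void (*fn)(void); };
struct VariantB { char tag; double value; long long count; };

/* Larger of two sizes: the "max" that the union allocation needs. */
size_t max_size(size_t a, size_t b) {
    return a > b ? a : b;
}

/* Bytes required to hold any variant on the current target. */
size_t union_storage_size(void) {
    return max_size(sizeof(struct VariantA), sizeof(struct VariantB));
}

/* Allocate zeroed storage for the union; callers cast the result to the
 * struct type of the variant they want to access. Returns NULL on failure. */
void *alloc_union(void) {
    return calloc(1, union_storage_size());
}
```

A caller would write `struct VariantB *b = alloc_union();` and access `b->value`, mirroring the cast-per-variant approach described above.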

We observe that Mu gives no way to ask for the size of a type (or to get the offset of a field in a struct or of an element in an array). While such information may not be needed for typical accesses, we now see that it has at least one important use case. Given that C programs are typically compiled way ahead of time, we do not consider it appropriate to generate Mu for C code only at the last minute.

We suggest that Mu provide means to determine sizes and perhaps to do simple load-time (if that is the right word) computations over these constants. Here is some possible syntax (admitting that I have not thought about it long or deeply yet):

.sizeof name type

Define name to be the constant that is the number of bytes needed for type.

.sizeof name op t1 t2 ... tn

Define name to be the sizes of t1 through tn combined with operator op, where op can be at least max and sum.

Alternatively, we could define names for the sizes of each type, and a more general constant-computing form:

.define name op e1 ... en

This would define name to be op applied to the ei. We could provide a suitable range of operators.
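To make the intended meaning of op concrete, here is a rough sketch in C of how a loader might fold the operands once their values are known on the target. The enum `ConstOp` and the function `fold_consts` are hypothetical names, and only max and sum are covered; this is an illustration, not a proposed implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Operators assumed for the hypothetical `.define name op e1 ... en` form. */
enum ConstOp { OP_MAX, OP_SUM };

/* Fold n already-resolved constants with op.
 * An empty operand list yields 0 for both operators. */
size_t fold_consts(enum ConstOp op, const size_t *vals, size_t n) {
    assert(vals != NULL || n == 0);
    size_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        if (op == OP_SUM) {
            acc += vals[i];          /* running total */
        } else if (vals[i] > acc) {
            acc = vals[i];           /* running maximum */
        }
    }
    return acc;
}
```

The key point is that the client emits only the symbolic form; the numbers passed in `vals` exist only once the target is fixed.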

For offsets we could have:

.offset name struct or array type idx

This would define name to be the constant giving the offset of the idx'th field/element of the given struct or array type.
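The C analogue of this is `offsetof` for structs and index times element size for arrays. The sketch below shows both; `struct Pair` and the two helper functions are hypothetical and serve only to illustrate what `.offset` would compute on a given target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct used only to illustrate field offsets. */
struct Pair { char tag; double value; };

/* Offset of field number idx (0 = tag, 1 = value) on the current target. */
size_t pair_field_offset(size_t idx) {
    assert(idx < 2);
    return idx == 0 ? offsetof(struct Pair, tag)
                    : offsetof(struct Pair, value);
}

/* Offset of element idx in an array whose elements occupy elem_size bytes. */
size_t array_elem_offset(size_t elem_size, size_t idx) {
    return elem_size * idx;
}
```

The offset of `value` depends on the padding the target inserts after `tag`, which is why the client cannot hard-code it.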

The point is to allow target-dependent computations over constants to be written in a target independent (symbolic) way. I believe this would meet the needs of C.

wks commented 9 years ago

Typical off-the-shelf C compilers (gcc, clang, ...) determine the platform at compile time (-march, -mcpu, -m32, -m64, ...). Once these options are given, all sizeof, alignof and offsetof expressions become compile-time constants. The question then becomes whether the C client for Mu is supposed to behave like a traditional C compiler, or like a novel C compiler that generates cross-platform Mu IR code.

For C clients as traditional C compilers, although Mu does not provide IR or API to determine the sizes/alignments/offsets, the knowledge about the platform (and the ABI) can be used to determine those values at compile time.

I think the really interesting part is letting the C client generate cross-platform Mu IR code. These new directives will be helpful. They need not become part of the Mu IR itself, because they are basically constants, but a client-level library can rewrite these directives into constant definitions (.const). In practice, the same "Mu IR interceptor library" can also perform the preSSA-to-SSA conversions mentioned in https://github.com/microvm/microvm-meta/issues/44. With the library helping at run time, the code generated by the C client will still be portable.

Example:

.sizeof @SIZE_OF_DOUBLE @double will become .const @SIZE_OF_DOUBLE <@i64> = 8.

A slightly more complex case:

.typedef @SmallStruct = struct<@PtrT @FuncPtrT>
.typedef @BigStruct = struct<@i8 @i16 @i32 @i64 @float @double>
.sizeof @SIZE_OF_SMALL_STRUCT @SmallStruct
.sizeof @SIZE_OF_BIG_STRUCT @BigStruct
.sizeof @SIZE_OF_MAX_STRUCT max @SIZE_OF_SMALL_STRUCT @SIZE_OF_BIG_STRUCT
.typedef @Union = array<@i8 @SIZE_OF_MAX_STRUCT>

will be rewritten into

.typedef @SmallStruct = struct<@PtrT @FuncPtrT>
.typedef @BigStruct = struct<@i8 @i16 @i32 @i64 @float @double>
.const @SIZE_OF_SMALL_STRUCT <@i64> = 16 // assume 8-byte pointers
.const @SIZE_OF_BIG_STRUCT <@i64> = 32 // assume all fields are aligned
.const @SIZE_OF_MAX_STRUCT <@i64> = 32
.typedef @Union = array<@i8 32>

So the program can allocate the union by %obj = NEW <@Union> and it will have the appropriate size.
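As a rough illustration of what the rewriting library would have to compute, here is a small C sketch of a struct-size calculation. It assumes a simplified layout rule in which every field is aligned to its own size and the struct is padded to its largest field alignment; real ABIs may differ. The function names `align_up` and `struct_size` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align (align must be non-zero). */
size_t align_up(size_t n, size_t align) {
    assert(align != 0);
    return (n + align - 1) / align * align;
}

/* Size of a struct whose n fields have the given byte sizes.
 * Assumed rule: each field is aligned to its own size, and the total
 * is padded to the largest field alignment. */
size_t struct_size(const size_t *field_sizes, size_t n) {
    size_t offset = 0;
    size_t max_align = 1;
    for (size_t i = 0; i < n; i++) {
        /* Place the field at the next suitably aligned offset. */
        offset = align_up(offset, field_sizes[i]) + field_sizes[i];
        if (field_sizes[i] > max_align) {
            max_align = field_sizes[i];
        }
    }
    return align_up(offset, max_align);
}
```

With field sizes {8, 8} for @SmallStruct and {1, 2, 4, 8, 4, 8} for @BigStruct, this rule gives 16 and 32, matching the constants in the rewritten code above.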

I will think more about it. A list of platform-dependent constant values or rules would be very helpful.