swp-uebersetzerbau-ss13 / common

Shared files between teams.
1 stars 0 forks source link

Three Address Code Specification #2

Closed fub-frank closed 11 years ago

fub-frank commented 11 years ago

Comments and Discussion for Three Address Code Specification See Wiki page: Three Address Code Specification

ghost commented 11 years ago

This is a proposal as to how derived types (such as arrays and structs) could be represented in the TAC. Comments, criticism and improvements are welcome:

Operators

DECLARE

This declares a variable of a primitive or derived type with the type as the first argument and the result being an identifier this variable will be bound to.

The identifier is mandatory for primitive types (with one exception, see further down at arrays) and structs, but optional for arrays as they may be chained together in the TAC to create multidimensional arrays.

If an array's DECLARE operation does not have an identifier is must be followed by a non-empty chain of array DECLAREs, which must all have no identifier, except for the last one, which must have an identifier. Furthermore it is necessary that the first array DECLARE in any non-empty chain of array DECLAREs must be preceded by a DECLARE operation for a primitive type that has no identifier. This preceding DECLARE operation determines the type of the array's elements; if such a DECLARE operation has an initialiser all elements of the array will be initialised to that value (including multidimensional arrays).

The meaning for the second argument depends on whether a primitive or a derived type is being declared:

SGET

This will get the value of one of a struct's members:

AGET

This will get the value of one of an array's elements:

SREF

This will create a reference to a struct's member:

AREF

This will create a reference to an array's element:

SSET

This will set the value of a struct's member:

ASET

This will set the value of an array's element

Examples

These examples' source language is a C-similar-style for now, can be replaced by code in the project's source language if needed.

struct A { int foo; float bar; }
Operator Argument 1 Argument 2 Result
DECLARE float "bar"
DECLARE int "foo"
DECLARE struct 2 "A"
float[4][2] B;
Operator Argument 1 Argument 2 Result
DECLARE float
DECLARE array 4
DECLARE array 2 "B"
int a = A.foo;
Operator Argument 1 Argument 2 Result
DECLARE int "a"
SGET "A" "foo" "a"
float b1 = B[1][3];
Operator Argument 1 Argument 2 Result
DECLARE float "b1"
AREF "B" 1 "t1"
AGET "t1" 3 "b1"
float[4] b2 = B[1];
Operator Argument 1 Argument 2 Result
DECLARE float
DECLARE array 4 "b2"
AGET "B" 1 "b2"
int x = 5; B[1][3] = x;
Operator Argument 1 Argument 2 Result
DECLARE int 5 "x"
AREF "B" 1 "t1"
AREF "t1" 3 "t2"
ASET "t2" "x"
float[4] x = [3,2,1,0]; B[1] = x;
Operator Argument 1 Argument 2 Result
DECLARE float
DECLARE array 4 "x"
AREF "x" 0 "t1"
ASET "t1" 3
AREF "x" 1 "t2"
ASET "t2" 2
AREF "x" 2 "t3"
ASET "t3" 1
AREF "x" 3 "t4"
ASET "t4" 0
AREF "B" 1 "t5"
ASET "t5" x
float x = 0.0; A.bar = x;
Operator Argument 1 Argument 2 Result
DECLARE float 0.0 "x"
SREF "A" "bar" "t1"
SSET "t1" "x"
fub-frank commented 11 years ago

This is a good idea. With this syntax we can create typed IR code. This eliminates the need to pass the symbol table to the Code Generator. TAC is therefore complete by itself and separates frontend and backend.

Also typed IR is nice!

jvf commented 11 years ago

Passing the symbol table to the backend would violate the modularization requirements in my opinion. I'm therefore in favor of the second (typed) specification. This would make generating the LLVM IR (as target language) much easier as well.

namor-swp13 commented 11 years ago

On Tue, Apr 16, 2013 at 11:41:10AM -0700, jvf wrote:

Passing the symbol table to the backend would violate the modularization requirements in my opinion. I'm therefore in favor of the second (typed) specification. This would make generating the LLVM IR (as target language) much easier as well.

I agree. Also, note that LLVM uses indices to address struct elements. It would require bookkeeping by the backend if TAC still contains names.

AFAIR we only have to specify the TAC so that the features of milestone#1 are set. I suggest we focus on those for now. Arrays and structs are not part of milestone #1.

ghost commented 11 years ago

Also, note that LLVM uses indices to address struct elements. It would require bookkeeping by the backend if TAC still contains names.

If we don't want to do bookkeeping in the backend making SGET and SREF use indices rather than names is not a big change. The primitives' DECLAREs could probably be stripped of the names as well (if uses for a struct), but I'd rather keep them there to make the TAC be more expressive. If we do want to do bookkeeping it's not much of a problem, we'd need one hashmap with key: struct identifier -> value: member identifier list (in exact reverse order of occurrence above the struct's DECLARE operation). Personally I'd prefer the bookkeeping variant as that makes the TAC more humanly readable.

AFAIR we only have to specify the TAC so that the features of milestone#1 are set. I suggest we focus on those for now. Arrays and structs are not part of milestone #1.

You're right, of course. I just didn't want use choosing one method to declare primitive types and later run into a problem with that choice when we have to add derived types, which is why I'd like to at least define the parts about the primitives as I did in the proposal - if we do that we can worry about the derives types later, hopefully without running into nasty surprises.

slomo commented 11 years ago

I question wether a three adress code representation needs something highlevel as structs.

Aside this general question the defintion of sget seems to problematic to me:

SGET

This will get the value of one of a struct's members:

Argument 1: The struct's identifier Argument 2: The member's identifier Result: Where to store that member's value

What is the value if the referenced member is a struct by it self? I would rather work on addresses if such an operator is introduced.

Two other random thoughts: What is the advantage in contrast to tripels, and wouldn't it be nicer to pull the types closer to the operations (rather than the names). Than you would have to do less typchecking, conversion stuff in the backend.

To end with something positiv: I like the activity here and the format looks promising to me.

fub-frank commented 11 years ago

I have added Three Address Code Specification for milestone 1 (Arithmetic) on the wiki page: Milestone 1 Three Address Code Specification for Arithmetic

ghost commented 11 years ago

What is the value if the referenced member is a struct by it self?

A copy of that struct the same way as in the "float[4] b2 = B[1];" example b2 gets a copy of the array at index 1 in B. I'm not yet seeing a problem there, as we don't have pointers, so a normal copy would automatically be a deep copy of the entire struct's allocated memory.

I would rather work on addresses if such an operator is introduced.

In that case we'd need to recalculate the addresses in the TAC back to element/struct indices for generation of target code, essentially calculating information that the frontend already knew about but we chose to "forget". Also afaik the TAC is going to be used in lectures and it would be better to make it more human-readable imho. For working with pure addresses there's already assembler, I'd prefer not to recreate that in an environment where we don't have to.

What is the advantage in contrast to triples

According to the dragon book they're essentially equivalent notations. The main difference being that triples don't have a result field and you have to work more with temporary variables, which imho makes it harder to read, which is why I prefer the quadruple notation for its better (imo) readability.

and wouldn't it be nicer to pull the types closer to the operations (rather than the names)

I totally agree with you there, we could do ageti, agetf, ageta, agets, sgeti, sgetf, sgets, sgeta, etc. for that (each for one type). I'll update the proposal tomorrow after thinking about possible problems with that.

ghost commented 11 years ago

Updated proposal:

Operators

DECLARE

This declares a variable of a primitive, derived or reference type with the type as the first argument and the result being an identifier this variable will be bound to.

Identifier

The identifier is mandatory for primitive types and records (with one exception, see further down at arrays), but optional for arrays as they may be chained together in the TAC to create multidimensional arrays.

Multidimensional arrays

If an array's DECLARE operation does not have an identifier is must be followed by a non-empty chain of array DECLAREs, which must all have no identifier, except for the last one, which must have an identifier. Furthermore it is necessary that the first array DECLARE in any non-empty chain of array DECLAREs must be preceded by a DECLARE operation for a primitive type without an identifier or a record without an identifier. This preceding DECLARE operation determines the type of the array's elements; if such a DECLARE operation is of a primitive type it may have an initialiser that initialises all elements of the array (including multidimensional arrays)

Nested derived types and the reference type

This is a special type that exists only in the TAC for one reason: To get the values of array elements and record members in the case of them being nested in other derived types. A variable with the reference type may only be used for {ARRAY/RECORD}GET{TYPE} operations.

Example

Problem: To get the value for an element of an array inside a record while being type-safe one needs four addresses - the record's, the array's inside that record, the element's inside that array and where to store the result - so it cannot be accomplished in TAC with a single operation (In this example one could make the result be stored in an accumulator by default and thus only use three addresses for the get operation, but with one more level of nesting that would also become impossible, as well).

Solution: First one gets a reference to the array with the RECORD_GET_REFERENCE operation and stores it in a reference type variable. Now the ARRAYGET{TYPE} operation can be used to get to get the element's value with the array specified as the variable holding the reference.

The second argument

The meaning for the second argument depends on the type:

RECORDGET{TYPE}

This will get the value of one of a record's members.

The type of this operation's result is determined by {TYPE}, which must either match the type of the member that is accessed by this operation or be "reference":

ARRAYGET{TYPE}

This will get the value of one of an array's elements:

The type of this operation's result is determined by {TYPE}, which must either match the type of the element that is accessed by this operation or be "reference":

RECORDSET{TYPE}

This will set the value of a record's member:

The types of the member and the value to set it to must match {TYPE}.

ARRAYSET{TYPE}

This will set the value of an array's element

The types of the element and the value to set it to must match {TYPE}.

Examples

record { num foo; real bar; } A;
Operator Argument 1 Argument 2 Result
DECLARE real "bar"
DECLARE num "foo"
DECLARE record 2 "A"
real[4][2] B;
Operator Argument 1 Argument 2 Result
DECLARE real
DECLARE array 4
DECLARE array 2 "B"
num a; a = A.foo;
Operator Argument 1 Argument 2 Result
DECLARE num "a"
RECORD_GET_NUM "A" "foo" "a"
real b1; b1 = B[1][3];
Operator Argument 1 Argument 2 Result
DECLARE real "b1"
DECLARE reference "t1"
ARRAY_GET_REFERENCE "B" 1 "t1"
ARRAY_GET_REAL "t1" 3 "b1"
real[4] b2; b2 = B[1];
Operator Argument 1 Argument 2 Result
DECLARE real
DECLARE array 4 "b2"
ARRAY_GET_REAL "B" 1 "b2"
real x; x = 5.0; B[1][3] = x;
Operator Argument 1 Argument 2 Result
DECLARE real 5 "x"
DECLARE reference "t1"
ARRAY_GET_REFERENCE "B" 1 "t1"
ARRAY_SET_REAL "t1" 3 "x"
real[4] x; x[0] = 3.0; x[1] = 2.0; x[2] = 1.0; x[3] = 0.0; B[1] = x;
Operator Argument 1 Argument 2 Result
DECLARE real
DECLARE array 4 "x"
ARRAY_SET_REAL "x" 0 3.0
ARRAY_SET_REAL "x" 1 2.0
ARRAY_SET_REAL "x" 2 1.0
ARRAY_SET_REAL "x" 3 0.0
ARRAY_SET_ARRAY "B" 1 "x"
real x = 0.0; A.bar = x;
Operator Argument 1 Argument 2 Result
DECLARE real 0.0 "x"
RECORD_SET_REAL "A" "bar" "x"
record { num foo; record { real x; real y; real z; }[3][10] triangles; real bar; } C;
Operator Argument 1 Argument 2 Result
DECLARE real "bar"
DECLARE real "z"
DECLARE real "y"
DECLARE real "x"
DECLARE record 3
DECLARE array 3
DECLARE array 10 "triangles"
DECLARE num "foo"
DECLARE record 3 "C"
record { real x; real y; real z; } point; point.x = 0.0; point.y = 0.0; point.z = 0.0; C.triangles[4][1] = point;
Operator Argument 1 Argument 2 Result
DECLARE real "z"
DECLARE real "y"
DECLARE real "x"
DECLARE record 3 "point"
RECORD_SET_REAL "point" "x" 0.0
RECORD_SET_REAL "point" "y" 0.0
RECORD_SET_REAL "point" "z" 0.0
DECLARE reference "t1"
RECORD_GET_REFERENCE "C" "triangles" "t1"
DECLARE reference "t2"
ARRAY_GET_REFERENCE "t1" 4 "t2"
ARRAY_SET_RECORD "t2" 1 "point"
marcowallischprinz commented 11 years ago

Hi guys, actually I'm in favor of the typed version as well, since it would facilitate codegen indeed. Nevertheless, as Moritz mentioned, the compiler will be used in lecture next time and imho TAC normally doesn't consist of operations like "declare", as additional information like operands' types is provided by the symbol-tables and doesn't have to be stated in the TAC again. Even though it might be easier for us, I think we should stick to the way being described in the dragonbook and thus hopefully improve the students' understanding during next lecture. Imho this doesn't contradict the modularization requirements. I think, it makes no odds, whether we provide the backend with an unusual TAC or an usual TAC along with a symbol-table. However, if the majority is in favor of doing it that away, I'll be pleased as well, as it probably will make my job easier. ;) BR

marcowallischprinz commented 11 years ago

If you are going to implement type-checking methods, imho some type conversion operators will be interesting, especially in the typed version. BR

ghost commented 11 years ago

If you are going to implement type-checking methods, imho some type conversion operators will be interesting, especially in the typed version.

I'm not sure what you're referring to when you mention type-checks in relation to the TAC, as the TAC the frontend generates should not contain any type errors, but type conversion operators between primitive types (I can't think of any sensible ones between the derived types the source specifies) will be necessary regardless of whether we represent derived types in the TAC or not, I agree.

fub-frank commented 11 years ago

I am still in favour of typed TAC. Advantages were given in this discussion already. The updated proposal looks promising as it is easier to use than proposal 1 and needs 2 operators less.

If you are going to implement type-checking methods, imho some type conversion operators will be interesting, especially in the typed version.

TAC is correct by definition. Tac should not contain any errors. Please see issue #1 on this.

but type conversion operators between primitive type will be necessary

Currently type conversion can occur between type num and type real (regarding TAC for Milestone 1 only). For a discussion on these types, please see issue #8.

marcowallischprinz commented 11 years ago

I'm not sure what you're referring to when you mention type-checks in relation to the TAC

I thought, that type-checking is part of the intermediate code generator and of course it should be done before generating the TAC, but as TAC is correct by definition, the question is answered implicitly... Sorry, maybe was a dumb question...

jvf commented 11 years ago

(this comment refers to the now unified proposal by the fuc backend group, which can be found in the wiki. I myself am part of the fuc backend group)

Two remarks: 1) Concerning the java interface Quadruple(): I think it is bade style to return null as a (intended) value of a function, therefore i would suggest we use a special case object instead, which here could be a String "NOTUSED" for instance. 2) I added the default values in the remarks of the `DECLARE{TYPE}` statements. I know it is redundant, but redundancy can be good for communication ;-)

ghost commented 11 years ago

1) Concerning the java interface Quadruple(): I think it is bade style to return null as a (intended) value of a function, therefore i would suggest we use a special case object instead, which here could be a String "NOT_USED" for instance.

I don't agree with it being bad style as no functions actually returns null, only the Quadruples may contain null as one of their fiels, but I hae no problem with there being an extra string value instead of null. Hack away^^

2) I added the default values in the remarks of the DECLARE_{TYPE} statements. I know it is redundant, but redundancy can be good for communication ;-)

Great, thanks. If you find more there that can be clearer please polish it up^^

jvf commented 11 years ago

I don't agree with it being bad style as no functions actually returns null, only the Quadruples may contain null as one of their fiels, but I hae no problem with there being an extra string value instead of null. Hack away^^

I would say it should be either `undefined' (if we are not going to check it) or a special case value, if we decide to check for well-formedness of quadruples for instance in tests.

ghost commented 11 years ago

I would say it should be either `undefined' (if we are not going to check it) or a special case value, if we decide to check for well-formedness of quadruples for instance in tests.

The value is now specified in the backend interface as Quadruple.EmptyArgument, which is a constant evaluating to "!".

ghost commented 11 years ago

I've updated the wiki with proposed additions for the features that are a requirement of milestone 2: https://github.com/swp-uebersetzerbau-ss13/common/wiki/Three-Address-Code-Specification

PS: A proposal for arrays/structs will follow.

UPDATE: Arrays have been added, struct-proposal postponed until Milestone 3. If there are no problems with this, the interface files will be updated this sunday (at which point the "proposed" mark will be also be removed from the wiki entries).

ghost commented 11 years ago

The interface files have now been updated and the wiki "proposed" marks have been removed.

ghost commented 11 years ago

The specification has been updated for milestone 3; comments, etc. should go here.

Changelog:

ghost commented 11 years ago

As no one seems to have a problem with, or suggestions for the milestone 3 TAC specification, it has now been merged (from tac_ms3 branch) into master.