Three Address Code Specification

fub-frank commented 11 years ago

Comments and Discussion for Three Address Code Specification See Wiki page: Three Address Code Specification

ghost commented 11 years ago

This is a proposal as to how derived types (such as arrays and structs) could be represented in the TAC. Comments, criticism and improvements are welcome:

Operators

DECLARE

This declares a variable of a primitive or derived type with the type as the first argument and the result being an identifier this variable will be bound to.

The identifier is mandatory for primitive types (with one exception, see further down at arrays) and structs, but optional for arrays as they may be chained together in the TAC to create multidimensional arrays.

If an array's DECLARE operation does not have an identifier is must be followed by a non-empty chain of array DECLAREs, which must all have no identifier, except for the last one, which must have an identifier. Furthermore it is necessary that the first array DECLARE in any non-empty chain of array DECLAREs must be preceded by a DECLARE operation for a primitive type that has no identifier. This preceding DECLARE operation determines the type of the array's elements; if such a DECLARE operation has an initialiser all elements of the array will be initialised to that value (including multidimensional arrays).

The meaning for the second argument depends on whether a primitive or a derived type is being declared:

primitive - the second argument is an optional initialiser for the variable
structs: the second argument defines how many of the variables bound to identifiers in a direct, uninterrupted chain of DECLARE operations above the struct's own DECLARE become part of the struct (i.e. are moved from the global scope into the struct's scope)
arrays: the second argument defines the number of elements the array will have

SGET

This will get the value of one of a struct's members:

Argument 1: The struct's identifier
Argument 2: The member's identifier
Result: Where to store that member's value

AGET

This will get the value of one of an array's elements:

Argument 1: The array's identifier
Argument 2: The element's index
Result: Where to store that element's value

SREF

This will create a reference to a struct's member:

Argument 1: The struct's identifier
Argument 2: The member's identifier
Result: Where to store the reference

AREF

This will create a reference to an array's element:

Argument 1: The array's identifier
Argument 2: The element's index
Result: Where to store the reference

SSET

This will set the value of a struct's member:

Argument 1: The struct's member's reference (created with SREF)
Argument 2: The value to set the member to
Result: This operation has no result

ASET

This will set the value of an array's element

Argument 1: The array's element's reference (created with AREF)
Argument 2: The value to set the element to
Result: This operation has no result

Examples

These examples' source language is a C-similar-style for now, can be replaced by code in the project's source language if needed.

struct A { int foo; float bar; }

Operator	Argument 1	Argument 2	Result
DECLARE	float		"bar"
DECLARE	int		"foo"
DECLARE	struct	2	"A"

float[4][2] B;

Operator	Argument 1	Argument 2	Result
DECLARE	float
DECLARE	array	4
DECLARE	array	2	"B"

int a = A.foo;

Operator	Argument 1	Argument 2	Result
DECLARE	int		"a"
SGET	"A"	"foo"	"a"

float b1 = B[1][3];

Operator	Argument 1	Argument 2	Result
DECLARE	float		"b1"
AREF	"B"	1	"t1"
AGET	"t1"	3	"b1"

float[4] b2 = B[1];

Operator	Argument 1	Argument 2	Result
DECLARE	float
DECLARE	array	4	"b2"
AGET	"B"	1	"b2"

int x = 5; B[1][3] = x;

Operator	Argument 1	Argument 2	Result
DECLARE	int	5	"x"
AREF	"B"	1	"t1"
AREF	"t1"	3	"t2"
ASET	"t2"	"x"

float[4] x = [3,2,1,0]; B[1] = x;

Operator	Argument 1	Argument 2	Result
DECLARE	float
DECLARE	array	4	"x"
AREF	"x"	0	"t1"
ASET	"t1"	3
AREF	"x"	1	"t2"
ASET	"t2"	2
AREF	"x"	2	"t3"
ASET	"t3"	1
AREF	"x"	3	"t4"
ASET	"t4"	0
AREF	"B"	1	"t5"
ASET	"t5"	x

float x = 0.0; A.bar = x;

Operator	Argument 1	Argument 2	Result
DECLARE	float	0.0	"x"
SREF	"A"	"bar"	"t1"
SSET	"t1"	"x"

fub-frank commented 11 years ago

This is a good idea. With this syntax we can create typed IR code. This eliminates the need to pass the symbol table to the Code Generator. TAC is therefore complete by itself and separates frontend and backend.

Also typed IR is nice!

jvf commented 11 years ago

Passing the symbol table to the backend would violate the modularization requirements in my opinion. I'm therefore in favor of the second (typed) specification. This would make generating the LLVM IR (as target language) much easier as well.

namor-swp13 commented 11 years ago

On Tue, Apr 16, 2013 at 11:41:10AM -0700, jvf wrote:

Passing the symbol table to the backend would violate the modularization requirements in my opinion. I'm therefore in favor of the second (typed) specification. This would make generating the LLVM IR (as target language) much easier as well.

I agree. Also, note that LLVM uses indices to address struct elements. It would require bookkeeping by the backend if TAC still contains names.

AFAIR we only have to specify the TAC so that the features of milestone#1 are set. I suggest we focus on those for now. Arrays and structs are not part of milestone #1.

Roman

ghost commented 11 years ago

Also, note that LLVM uses indices to address struct elements. It would require bookkeeping by the backend if TAC still contains names.

If we don't want to do bookkeeping in the backend making SGET and SREF use indices rather than names is not a big change. The primitives' DECLAREs could probably be stripped of the names as well (if uses for a struct), but I'd rather keep them there to make the TAC be more expressive. If we do want to do bookkeeping it's not much of a problem, we'd need one hashmap with key: struct identifier -> value: member identifier list (in exact reverse order of occurrence above the struct's DECLARE operation). Personally I'd prefer the bookkeeping variant as that makes the TAC more humanly readable.

AFAIR we only have to specify the TAC so that the features of milestone#1 are set. I suggest we focus on those for now. Arrays and structs are not part of milestone #1.

You're right, of course. I just didn't want use choosing one method to declare primitive types and later run into a problem with that choice when we have to add derived types, which is why I'd like to at least define the parts about the primitives as I did in the proposal - if we do that we can worry about the derives types later, hopefully without running into nasty surprises.

slomo commented 11 years ago

I question wether a three adress code representation needs something highlevel as structs.

Aside this general question the defintion of sget seems to problematic to me:

SGET

This will get the value of one of a struct's members:

Argument 1: The struct's identifier Argument 2: The member's identifier Result: Where to store that member's value

What is the value if the referenced member is a struct by it self? I would rather work on addresses if such an operator is introduced.

Two other random thoughts: What is the advantage in contrast to tripels, and wouldn't it be nicer to pull the types closer to the operations (rather than the names). Than you would have to do less typchecking, conversion stuff in the backend.

To end with something positiv: I like the activity here and the format looks promising to me.

fub-frank commented 11 years ago

I have added Three Address Code Specification for milestone 1 (Arithmetic) on the wiki page: Milestone 1 Three Address Code Specification for Arithmetic

ghost commented 11 years ago

What is the value if the referenced member is a struct by it self?

A copy of that struct the same way as in the "float[4] b2 = B[1];" example b2 gets a copy of the array at index 1 in B. I'm not yet seeing a problem there, as we don't have pointers, so a normal copy would automatically be a deep copy of the entire struct's allocated memory.

I would rather work on addresses if such an operator is introduced.

In that case we'd need to recalculate the addresses in the TAC back to element/struct indices for generation of target code, essentially calculating information that the frontend already knew about but we chose to "forget". Also afaik the TAC is going to be used in lectures and it would be better to make it more human-readable imho. For working with pure addresses there's already assembler, I'd prefer not to recreate that in an environment where we don't have to.

What is the advantage in contrast to triples

According to the dragon book they're essentially equivalent notations. The main difference being that triples don't have a result field and you have to work more with temporary variables, which imho makes it harder to read, which is why I prefer the quadruple notation for its better (imo) readability.

and wouldn't it be nicer to pull the types closer to the operations (rather than the names)

I totally agree with you there, we could do ageti, agetf, ageta, agets, sgeti, sgetf, sgets, sgeta, etc. for that (each for one type). I'll update the proposal tomorrow after thinking about possible problems with that.

ghost commented 11 years ago

Updated proposal:

Operators

DECLARE

This declares a variable of a primitive, derived or reference type with the type as the first argument and the result being an identifier this variable will be bound to.

Identifier

The identifier is mandatory for primitive types and records (with one exception, see further down at arrays), but optional for arrays as they may be chained together in the TAC to create multidimensional arrays.

Multidimensional arrays

If an array's DECLARE operation does not have an identifier is must be followed by a non-empty chain of array DECLAREs, which must all have no identifier, except for the last one, which must have an identifier. Furthermore it is necessary that the first array DECLARE in any non-empty chain of array DECLAREs must be preceded by a DECLARE operation for a primitive type without an identifier or a record without an identifier. This preceding DECLARE operation determines the type of the array's elements; if such a DECLARE operation is of a primitive type it may have an initialiser that initialises all elements of the array (including multidimensional arrays)

Nested derived types and the reference type

This is a special type that exists only in the TAC for one reason: To get the values of array elements and record members in the case of them being nested in other derived types. A variable with the reference type may only be used for {ARRAY/RECORD}GET{TYPE} operations.

Example

Problem: To get the value for an element of an array inside a record while being type-safe one needs four addresses - the record's, the array's inside that record, the element's inside that array and where to store the result - so it cannot be accomplished in TAC with a single operation (In this example one could make the result be stored in an accumulator by default and thus only use three addresses for the get operation, but with one more level of nesting that would also become impossible, as well).

Solution: First one gets a reference to the array with the RECORD_GET_REFERENCE operation and stores it in a reference type variable. Now the ARRAYGET{TYPE} operation can be used to get to get the element's value with the array specified as the variable holding the reference.

The second argument

The meaning for the second argument depends on the type:

primitive: the second argument is an optional initialiser for the variable
records: the second argument defines how many of the variables bound to identifiers in a direct, uninterrupted chain of DECLARE operations above the record's own DECLARE become part of the record (i.e. are moved from the global scope into the record's scope)
arrays: the second argument defines the number of elements the array will have
reference: the second argument must not be present

RECORDGET{TYPE}

This will get the value of one of a record's members.

Argument 1: The record's identifier (variable of record or reference type)
Argument 2: The member's identifier
Result: Where to store that member's value or the reference to the member

The type of this operation's result is determined by {TYPE}, which must either match the type of the member that is accessed by this operation or be "reference":

Primitive type: Normal value
Record/Array type: The member's entire allocated memory will be copied and stored into Result
Reference: Result will be set to a reference to the member

ARRAYGET{TYPE}

This will get the value of one of an array's elements:

Argument 1: The array's identifier (variable of array or reference type)
Argument 2: The element's index
Result: Where to store that element's value or the reference to the element

The type of this operation's result is determined by {TYPE}, which must either match the type of the element that is accessed by this operation or be "reference":

Primitive type: Normal value
Record/Array type: The element's entire allocated memory will be copied and stored into Result
Reference: Result will be set to a reference to the element

RECORDSET{TYPE}

This will set the value of a record's member:

Argument 1: The record's identifier (variable of record or reference type)
Argument 2: The member's identifier
Result: Not a result, used to store the value to set the member to (identifier / primitive value)

The types of the member and the value to set it to must match {TYPE}.

ARRAYSET{TYPE}

This will set the value of an array's element

Argument 1: The array's identifier (variable of array or reference type)
Argument 2: The element's index
Result: Not a result, used to store the value to set the element to (identifier / primitive value)

The types of the element and the value to set it to must match {TYPE}.

Examples

record { num foo; real bar; } A;

Operator	Argument 1	Argument 2	Result
DECLARE	real		"bar"
DECLARE	num		"foo"
DECLARE	record	2	"A"

real[4][2] B;

Operator	Argument 1	Argument 2	Result
DECLARE	real
DECLARE	array	4
DECLARE	array	2	"B"

num a; a = A.foo;

Operator	Argument 1	Argument 2	Result
DECLARE	num		"a"
RECORD_GET_NUM	"A"	"foo"	"a"

real b1; b1 = B[1][3];

Operator	Argument 1	Argument 2	Result
DECLARE	real		"b1"
DECLARE	reference		"t1"
ARRAY_GET_REFERENCE	"B"	1	"t1"
ARRAY_GET_REAL	"t1"	3	"b1"

real[4] b2; b2 = B[1];

Operator	Argument 1	Argument 2	Result
DECLARE	real
DECLARE	array	4	"b2"
ARRAY_GET_REAL	"B"	1	"b2"

real x; x = 5.0; B[1][3] = x;

Operator	Argument 1	Argument 2	Result
DECLARE	real	5	"x"
DECLARE	reference		"t1"
ARRAY_GET_REFERENCE	"B"	1	"t1"
ARRAY_SET_REAL	"t1"	3	"x"

real[4] x; x[0] = 3.0; x[1] = 2.0; x[2] = 1.0; x[3] = 0.0; B[1] = x;

Operator	Argument 1	Argument 2	Result
DECLARE	real
DECLARE	array	4	"x"
ARRAY_SET_REAL	"x"	0	3.0
ARRAY_SET_REAL	"x"	1	2.0
ARRAY_SET_REAL	"x"	2	1.0
ARRAY_SET_REAL	"x"	3	0.0
ARRAY_SET_ARRAY	"B"	1	"x"

real x = 0.0; A.bar = x;

Operator	Argument 1	Argument 2	Result
DECLARE	real	0.0	"x"
RECORD_SET_REAL	"A"	"bar"	"x"

record { num foo; record { real x; real y; real z; }[3][10] triangles; real bar; } C;

Operator	Argument 1	Argument 2	Result
DECLARE	real		"bar"
DECLARE	real		"z"
DECLARE	real		"y"
DECLARE	real		"x"
DECLARE	record	3
DECLARE	array	3
DECLARE	array	10	"triangles"
DECLARE	num		"foo"
DECLARE	record	3	"C"

record { real x; real y; real z; } point; point.x = 0.0; point.y = 0.0; point.z = 0.0; C.triangles[4][1] = point;

Operator	Argument 1	Argument 2	Result
DECLARE	real		"z"
DECLARE	real		"y"
DECLARE	real		"x"
DECLARE	record	3	"point"
RECORD_SET_REAL	"point"	"x"	0.0
RECORD_SET_REAL	"point"	"y"	0.0
RECORD_SET_REAL	"point"	"z"	0.0
DECLARE	reference		"t1"
RECORD_GET_REFERENCE	"C"	"triangles"	"t1"
DECLARE	reference		"t2"
ARRAY_GET_REFERENCE	"t1"	4	"t2"
ARRAY_SET_RECORD	"t2"	1	"point"

marcowallischprinz commented 11 years ago

Hi guys, actually I'm in favor of the typed version as well, since it would facilitate codegen indeed. Nevertheless, as Moritz mentioned, the compiler will be used in lecture next time and imho TAC normally doesn't consist of operations like "declare", as additional information like operands' types is provided by the symbol-tables and doesn't have to be stated in the TAC again. Even though it might be easier for us, I think we should stick to the way being described in the dragonbook and thus hopefully improve the students' understanding during next lecture. Imho this doesn't contradict the modularization requirements. I think, it makes no odds, whether we provide the backend with an unusual TAC or an usual TAC along with a symbol-table. However, if the majority is in favor of doing it that away, I'll be pleased as well, as it probably will make my job easier. ;) BR

marcowallischprinz commented 11 years ago

If you are going to implement type-checking methods, imho some type conversion operators will be interesting, especially in the typed version. BR

ghost commented 11 years ago

If you are going to implement type-checking methods, imho some type conversion operators will be interesting, especially in the typed version.

I'm not sure what you're referring to when you mention type-checks in relation to the TAC, as the TAC the frontend generates should not contain any type errors, but type conversion operators between primitive types (I can't think of any sensible ones between the derived types the source specifies) will be necessary regardless of whether we represent derived types in the TAC or not, I agree.

fub-frank commented 11 years ago

I am still in favour of typed TAC. Advantages were given in this discussion already. The updated proposal looks promising as it is easier to use than proposal 1 and needs 2 operators less.

If you are going to implement type-checking methods, imho some type conversion operators will be interesting, especially in the typed version.

TAC is correct by definition. Tac should not contain any errors. Please see issue #1 on this.

but type conversion operators between primitive type will be necessary

Currently type conversion can occur between type num and type real (regarding TAC for Milestone 1 only). For a discussion on these types, please see issue #8.

marcowallischprinz commented 11 years ago

I'm not sure what you're referring to when you mention type-checks in relation to the TAC

I thought, that type-checking is part of the intermediate code generator and of course it should be done before generating the TAC, but as TAC is correct by definition, the question is answered implicitly... Sorry, maybe was a dumb question...

jvf commented 11 years ago

(this comment refers to the now unified proposal by the fuc backend group, which can be found in the wiki. I myself am part of the fuc backend group)

Two remarks: 1) Concerning the java interface Quadruple(): I think it is bade style to return null as a (intended) value of a function, therefore i would suggest we use a special case object instead, which here could be a String "NOTUSED" for instance. 2) I added the default values in the remarks of the `DECLARE{TYPE}` statements. I know it is redundant, but redundancy can be good for communication ;-)

ghost commented 11 years ago

1) Concerning the java interface Quadruple(): I think it is bade style to return null as a (intended) value of a function, therefore i would suggest we use a special case object instead, which here could be a String "NOT_USED" for instance.

I don't agree with it being bad style as no functions actually returns null, only the Quadruples may contain null as one of their fiels, but I hae no problem with there being an extra string value instead of null. Hack away^^

2) I added the default values in the remarks of the DECLARE_{TYPE} statements. I know it is redundant, but redundancy can be good for communication ;-)

Great, thanks. If you find more there that can be clearer please polish it up^^

jvf commented 11 years ago

I don't agree with it being bad style as no functions actually returns null, only the Quadruples may contain null as one of their fiels, but I hae no problem with there being an extra string value instead of null. Hack away^^

I would say it should be either `undefined' (if we are not going to check it) or a special case value, if we decide to check for well-formedness of quadruples for instance in tests.

ghost commented 11 years ago

I would say it should be either `undefined' (if we are not going to check it) or a special case value, if we decide to check for well-formedness of quadruples for instance in tests.

The value is now specified in the backend interface as Quadruple.EmptyArgument, which is a constant evaluating to "!".

ghost commented 11 years ago

I've updated the wiki with proposed additions for the features that are a requirement of milestone 2: https://github.com/swp-uebersetzerbau-ss13/common/wiki/Three-Address-Code-Specification

PS: A proposal for arrays/structs will follow.

UPDATE: Arrays have been added, struct-proposal postponed until Milestone 3. If there are no problems with this, the interface files will be updated this sunday (at which point the "proposed" mark will be also be removed from the wiki entries).

ghost commented 11 years ago

The interface files have now been updated and the wiki "proposed" marks have been removed.

ghost commented 11 years ago

The specification has been updated for milestone 3; comments, etc. should go here.

Changelog:

+ declaration and getter/setter operators for the struct type
+ string concatenation via the STRING_CONCAT operator
+ conversion operators from boolean/long/double to string ({BOOLEAN/LONG/DOUBLE}_TO_STRING)
+ some complex examples for arrays and structs
- ARRAY_GET_ARRAY operator
- ARRAY_SET_ARRAY operator
- all print operators other than PRINT_STRING, use conversion to string operators

ghost commented 11 years ago

As no one seems to have a problem with, or suggestions for the milestone 3 TAC specification, it has now been merged (from tac_ms3 branch) into master.

swp-uebersetzerbau-ss13 / common