plum-umd / the-838e-compiler

Compiler for CMSC 838E
2 stars 0 forks source link

Locally working draft of adding lambda to the language #62

Closed dandorat closed 3 years ago

dandorat commented 3 years ago

With the help of the class notes and the codes from the Fall 2019 class, implemented higher order functions. There are some design preferences that we will need to discuss and change if needed. In addition, will need to add implementation of lambda for apply. Will discuss more tomorrow. Thank you!

dandorat commented 3 years ago

The initial program in the form of (begin defines e) is parsed to struct (Prog ds e) and then this is transformed to a struct (Letrec xs es e), where xs is a list of function ids and es is a list of structs (Lam xs e) or (Lam* xs xs* e). Then, lambdas in the program are labelled and the transformed program is compiled. The list of all the lambdas in the program is also formed and lambda definitions for these are compiled. Compilation of letrec and its right-hand-side λs is done with compile-letrec, compile-letrec-λs, and compile-letrec-init, and compilation of other lambdas is done with compile-λ. Compilation of function calls (e es) where e is an expression that evaluates to the address of a lambda on the heap, is done with new functions compile-call, compile-tail-call, and compile-nontail-call.

I did not change the current compilation of the definitions of the library functions by adding something like (define (stdlibs p) (Letrec fs ls p) and then (compile (stdlibs p)), where fs is the list of the ids of all the functions in the libraries (stdlib-ids) and ls are the corresponding lambdas. The library function definitions are still compiled with compile-defines, and functions calls (e es) where e is a member of the stdlib-ids list are still compiled with compile-app, compile-tail-app, and compile-nontail-app.

dandorat commented 3 years ago

I did two additional commits for higher order functions. In commit 4, I added compile-tail-applyL and compile-nontail-applyL to compile (apply e0 e1), where e0 evaluates to the pointer to a λ closure, in tail and nontail positions. Also changed the code such that the parser does the transformation of the program and labelling of the λ's. Also added more tests and fixed a bug in compile-λ-definition for variable arity lambdas to compile (let ((z 7)) ((λ x z))) or ((λ x x)) correctly. Also, updated the interpreter. In this commit, the library functions are still compiled via compile-define and the compiled libraries are linked with the other objs to make runtime.o.

In commit 5, for each of the library functions, the definition is desugared to an id and its corresponding λ, and the list of the ids for all of the library functions (fs) and the list of the corresponding λ's (ls) are compiled with a letrec with the program p as the body of the letrec, that is, (Letrec fs ls p) is compiled. Updated compile.rkt, parse.rkt, externs-stdlib.rkt, externs.rkt, interp.rkt accordingly. Also added finding λ's in and desugaring match clauses. Also, added some more tests. Other than some small changes, such as adding the registers used in the functions to the notes at the top of compile.rkt, removing the commented out codes, and changing proc-mask or using ptr-mask instead, I think this is ready for merging,

dandorat commented 3 years ago

The lambda2 branch I was working on had diverged from the main branch including the merged changes from the modules branch. Since the functions in the modules were compiled via compile-define, this caused problems with the commit 5 of lambda2 branch where all the functions, including the library functions, are higher-order. I changed the files so that the functions in modules are compiled as higher-order functions also. The tests work well and I think this is ready for merging after review.

dandorat commented 3 years ago

Changed the files so that the a86 code for libraries' functions' compile-letrec-λs and compile-letrec-init, and the x86 code for their compile-λ-definitions are compiled just once and are not recompiled for each program. The overall time for the tests (which compared with the main branch include additional tests for higher-order functions and letrec) improved from 9m 38s to 4m 41s. Made the following changes:

  1. In a86/ast.rkt: the transparent structs for the a86 assembly instructions changed to serializable transparent structs.

  2. Added the file compile-lib-letrec-lmdefs.rkt to generate and serialize the compiled letrec-λs and letrec-init a86 code for libraries' functions and save it to a file called libraries-letrec, and to emit the x86 code for the compiled a86 code for libraries' functions' λ-definitions and save it to a file called libraries-lmdefs.s .

  3. In compile.rkt: The initial letrec-λs and letrec-init code for libraries' functions' is read and included from the saved file libraries-letrec. The function compile-library-letrec added for generating the a86 code for this, and also saving other needed info in the files lib-fs, lib-ls-ids, and lib-externs. The function compile-library-λdefs added for compiling the a86 code for libraries' functions' λ-definitions, which is saved as x86 code in libraries-lmdef.s by compile-lib-letrec-lmdefs.rkt in #2.

  4. In Makefile, added linking of libraries-lmdef.o to $(objs) in the creation of runtime.o. Added the dependencies for this, which will result in the execution of compile-lib-letrec-lmdefs.rkt to make libraries-lmdef.s . This will also result in the creation of the other files mentioned above.

  5. In parse.rkt, added the function desugar-def-lib to make the library functions' lambda labels specific by using the symbol 'lib-lambda- for these labels when generating them by gensym. Made associated changes in externs.rkt, externs-stdlib.rkt, and compile.rkt.

I also worked on an additional solution for more optimization by storing the x86 assembly code of the compiled letrec-λs and letrec-init of library functions in a file and piecing together this with the x86 assembly code of the program by using port->string in compile-file.rkt. This way no repeat of translation of a86 to x86 assembly for this piece of code is needed and also no serialization is needed. This code works for compiling programs with compile-file.rkt. But it causes problems with asm-interp, which necessitates major changes in the asm-interp and asm-interp-io functions in a86/interp.rkt.

I think this is ready for merging after review.