plum-umd / the-838e-compiler

Compiler for CMSC 838E

Modules branch #59

Closed: dandorat closed this 3 years ago

dandorat commented 3 years ago

Comments on the previous pull request are also included below.

The problem with the tests when all five exmod*.rkt files were put in the test folder was that two of those files were modules with an expression at the end, while the other three were modules with provide, require, and definitions but no expression. For example, exmod2.rkt has no expression at the end:

#lang racket
(begin
  (provide h1)
  (require "exmod3.rkt" "exmod4.rkt")
  (define (h1 x) 2))

Running make exmod2.s worked, but building the executable exmod2.run and running it did not, because the module returned no value. Now such a module returns void, so the tests can also be run on these modules.

The message "cat: modulefiles: No such file or directory" was printed for every file because of the order in which Make does its work. When racket -t compile-file.rkt -m %.rkt > %.s runs, modules.rkt runs as part of it and creates the file modulefiles, which lists the .o files of the required modules. The Makefile then reads this list with $(shell cat modulefiles). By the time the recipe needs the list, the file exists, so compilation is not affected. However, Make first builds a dependency graph of targets and prerequisites, and because $(shell cat modulefiles) appeared in the prerequisites of a target, it was expanded before modulefiles existed. So the message was printed even though there was no real error.

The Makefile and formdps.c were modified so that $(shell cat modulefiles) is no longer among the make prerequisites, and the Makefile no longer prints the message "cat: modulefiles: No such file or directory".

In addition, I found that when a module is required in a file in Racket, Racket runs the module, and if the module has an expression at the end, Racket evaluates the expression and prints the result. Our module system does not support this yet, so I commented out the expression at the end of exmod1.rkt, one of the modules required by exmod0.rkt:

exmod0.rkt:

#lang racket
(begin
  (require "exmod1.rkt" "exmod2.rkt" "exmod3.rkt"
           "exmod4.rkt" "exmod5.rkt" "exmod6.rkt")
  (define (f x) x)
  (+ (+ (+ (+ (h4 5) (g2 3)) (h5 5)) (h6 5))
     (+ (g1 5) (+ (h1 5) (+ (h2 5) (h3 5))))))

exmod1.rkt:

#lang racket
(begin
  (provide g1)
  (require "exmod2.rkt" "exmod3.rkt")
  (define (g1 x) 1))
 ; (h1 9))

I also updated parse.rkt and modules.rkt so that modules with (provide (all-defined-out)), and modules with a provide but no require, can also be compiled. I added exmod5.rkt and exmod6.rkt to test these features:

exmod5.rkt:

#lang racket
(begin
  (provide g2)
  (define (g2 x) x))

exmod6.rkt:

#lang racket
(begin
  (provide (all-defined-out))
  (define (h4 x) 3)
  (define (h5 x) 3)
  (define (h6 x) 3))

Previous comments:

Modules

Implemented modules by adding the following:

  1. a modules.rkt file
  2. a C file (formdps.c)
  3. changes in the Makefile to call formdps.c, which then invokes make again via a system call (so that modules.rkt can first calculate the module dependencies, produce the list of .o files of the needed modules, and compile the .s files of those modules before the rest of the Makefile recipe runs)
  4. changes in ast.rkt, parse.rkt, compile-file.rkt, externs.rkt, and compile.rkt.

modules.rkt calculates and stores the directed graph of module dependencies. If there is a cycle in this graph, modules.rkt detects this and produces an error.
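A minimal sketch of such a cycle check, not the code in modules.rkt: it assumes the graph is a hash from a module's file name to the file names it requires, and `find-cycle` is a hypothetical name. It uses a standard depth-first search that marks each module as visiting (on the current path) or done; reaching a visiting module again means there is a cycle.

```racket
#lang racket
;; Hypothetical sketch (not the actual modules.rkt code): detect a cycle
;; in a module dependency graph with a depth-first search.
;; graph : hash mapping a module file name to the list of files it requires.
(provide find-cycle)

;; Returns a list of file names forming a cycle (first = last), or #f.
(define (find-cycle graph)
  ;; 'visiting = on the current DFS path, 'done = fully explored.
  (define state (make-hash))
  ;; path holds the modules on the current DFS path, most recent first.
  (define (visit node path)
    (case (hash-ref state node #f)
      [(done) #f]
      [(visiting)
       ;; node is already on the path, so the part of the path from
       ;; node onward, closed with node, is a cycle.
       (append (member node (reverse path)) (list node))]
      [else
       (hash-set! state node 'visiting)
       (let ([result (for/or ([dep (in-list (hash-ref graph node '()))])
                       (visit dep (cons node path)))])
         (hash-set! state node 'done)
         result)]))
  (for/or ([node (in-list (hash-keys graph))])
    (visit node '())))

;; Example: a graph shaped like the exmod modules has no cycle.
;; (find-cycle (hash "exmod1.rkt" '("exmod2.rkt" "exmod3.rkt")
;;                   "exmod2.rkt" '("exmod3.rkt" "exmod4.rkt")))  ; => #f
```

Modules that never appear as keys (such as exmod3.rkt above) are treated as having no requires.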

Currently, the following formats for the modules are supported:

  1. (begin (provide id ...) (require "filename.rkt" ...) defines)
  2. (begin (require "filename.rkt" ...) defines)
  3. (begin (provide id ...) (require "filename.rkt" ...) defines e)
  4. (begin (require "filename.rkt" ...) defines e)
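To make the four formats concrete, here is a hedged sketch of a classifier using Racket's `match`; it is not taken from parse.rkt, and `module-format` and `ends-with-expr?` are hypothetical names. It assumes the defines are `(define ...)` forms and that an optional expression `e` comes last. It covers only the four formats above, not the later provide-only variant.

```racket
#lang racket
;; Hypothetical sketch: classify a module s-expression into one of the
;; four supported formats (1-4), or #f if it matches none of them.
(provide module-format)

(define (define-form? x)
  (and (pair? x) (eq? (car x) 'define)))

;; True when every form but the last is a define and the last is not.
(define (ends-with-expr? body)
  (and (pair? body)
       (andmap define-form? (drop-right body 1))
       (not (define-form? (last body)))))

(define (module-format sexp)
  (match sexp
    ;; Formats 1 and 3: provide, then require, then defines [e].
    [`(begin (provide ,_ ...) (require ,(? string?) ...) ,body ...)
     (if (andmap define-form? body) 1 (and (ends-with-expr? body) 3))]
    ;; Formats 2 and 4: require, then defines [e].
    [`(begin (require ,(? string?) ...) ,body ...)
     (if (andmap define-form? body) 2 (and (ends-with-expr? body) 4))]
    [_ #f]))
```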

I added the example modules exmod0.rkt, exmod1.rkt, exmod2.rkt, exmod3.rkt, and exmod4.rkt.

exmod0.rkt has the format (begin (require "filename.rkt" ...) defines e). In this example it is the root file: it is the file being compiled, and its expression e is the one evaluated.

If a module is the root file, its expression e is included in the compilation. If a module is not the root file and has format 3 or 4, its import and export information is included in the compilation and its defines are compiled, but its expression e is not compiled.
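The root/non-root rule can be sketched as follows. This is a hypothetical illustration, not the code in compile-file.rkt or modules.rkt: `split-body` and `forms-to-compile` are made-up names, and `body` is assumed to be the list of forms after the provide and require clauses. A root module with no final expression gets `(void)`, matching the change described at the top of this thread.

```racket
#lang racket
;; Hypothetical sketch: decide which forms of a module body to compile.
(provide split-body forms-to-compile)

;; Split body into its leading defines and the optional final expression
;; (#f when the body ends with a define).
(define (split-body body)
  (define-values (defs rest)
    (splitf-at body (lambda (x) (and (pair? x) (eq? (car x) 'define)))))
  (values defs (if (null? rest) #f (car rest))))

;; The root module keeps its expression, or (void) if it has none;
;; every other module contributes only its defines.
(define (forms-to-compile body root?)
  (define-values (defs expr) (split-body body))
  (if root?
      (append defs (list (or expr '(void))))
      defs))
```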

Example Run

Executing make exmod0.run first runs the command racket -t compile-file.rkt -m exmod0.rkt > exmod0.s. During this step, compile-file.rkt calls modules.rkt, which calculates the module dependencies, compiles the .s files for the other modules (so that the Makefile does not call compile-file.rkt for them later), and writes the names of the .o files that need to be created for those modules to a file called modulefiles.

Then, the following are done:

  1. nasm -f $(format) -o exmod0.o exmod0.s produces exmod0.o
  2. ./formdps make exmod0.run makes the system call make exmod0.run2
  3. the recipe for %.run2 in the Makefile creates the .o files for the rest of the modules (based on the list in modulefiles), runtime.o, and the executable exmod0.run2
  4. ./formdps mv exmod0.run2 renames exmod0.run2 to exmod0.run.

Then, running the executable ./exmod0.run will produce the correct result 10.
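The list of .o files that modules.rkt writes to modulefiles in the first step can be sketched as a reachability walk. This is an assumption-laden illustration, not the actual modules.rkt code: the graph is taken to be a hash from a module's file name to the files it requires, `module-object-files` is a hypothetical name, and the exact layout of modulefiles is not shown here.

```racket
#lang racket
;; Hypothetical sketch: the .o files for every module reachable from the
;; root, excluding the root itself (its .o comes from the usual %.o rule).
(provide module-object-files)

(define (module-object-files graph root)
  (define seen (mutable-set))
  ;; Depth-first walk; each module is visited once, even with shared requires.
  (define (walk file)
    (unless (set-member? seen file)
      (set-add! seen file)
      (for-each walk (hash-ref graph file '()))))
  (walk root)
  (for/list ([file (in-list (sort (set->list seen) string<?))]
             #:unless (equal? file root))
    (regexp-replace #rx"\\.rkt$" file ".o")))

;; Example, loosely based on the exmod modules:
;; (string-join (module-object-files
;;               (hash "exmod0.rkt" '("exmod1.rkt" "exmod2.rkt")
;;                     "exmod1.rkt" '("exmod2.rkt" "exmod3.rkt")
;;                     "exmod2.rkt" '("exmod3.rkt" "exmod4.rkt"))
;;               "exmod0.rkt"))
;; => "exmod1.o exmod2.o exmod3.o exmod4.o"
```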

Optionally, modules.rkt can also create a file called modulesgraph that describes the directed graph of module dependencies for the last compilation, in adjacency-list format.
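A hedged sketch of writing such a file. The thread does not specify the exact layout of modulesgraph, so the one-line-per-module format `<module>: <required module> ...` is an assumption, as are the hash representation of the graph and the name `write-adjacency-list`.

```racket
#lang racket
;; Hypothetical sketch: print a dependency graph in adjacency-list form,
;; one line per module: "<module>: <required module> ...".
;; The real modulesgraph layout is not specified; this format is assumed.
(provide write-adjacency-list)

(define (write-adjacency-list graph [out (current-output-port)])
  (for ([file (in-list (sort (hash-keys graph) string<?))])
    (fprintf out "~a: ~a\n" file (string-join (hash-ref graph file) " "))))

;; Writing it to a file, replacing any previous one:
;; (with-output-to-file "modulesgraph"
;;   (lambda () (write-adjacency-list graph))
;;   #:exists 'replace)
```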

Regarding formdps.c

In the original Makefile, we have this rule:

%.run: %.o runtime.o
        gcc runtime.o $< -o $@ $(libs) -lm

I want the rule for making %.o to run before the rule for making runtime.o, so that the command racket -t compile-file.rkt -m $< > $@, which makes rootfile.s, is executed first.

This ensures that modules.rkt runs first: it calculates the imports and exports, makes the .s files for the required modules accordingly, and lists the .o files needed for these required modules in the temporary file modulefiles.

Then the rule for making runtime.o should run, which starts by making these listed .o files and then links them with the rest of the .o files to make runtime.o.

As far as I could tell, Make has no facility for enforcing that the rule for %.o runs before the rule for runtime.o; hence formdps.c, which enforces this order. If there is a way to do this in Make, we can drop formdps.c.

Regarding modulesgraph

I should have explained that this file is not used by the compilation. I added it only for informational purposes, to describe the module dependency graph of the last compilation, so it can be removed.

The graph for each compilation is kept in a list called mgraph while modules.rkt runs, and mgraph is not kept after each run. So as long as compilations are not concurrent or parallel, that is, as long as two programs are compiled in sequence, there should be no issues.

Modifications for compilation of two programs

So that the .o and .s files created by a previous compilation do not interfere with a subsequent one, I modified the line rm -f formdps modulefiles in the Makefile so that it also removes runtime.o $(shell cat modulefiles) *.s.

I modified exmod1.rkt to include an expression (h1 9):

#lang racket
(begin
  (provide g1)
  (require "exmod2.rkt" "exmod3.rkt")
  (define (g1 x) 1)
  (h1 9))

I then compiled exmod0.run and then exmod1.run, which worked well. Next, I removed exmod0.run and compiled it again, which also worked. The following sequence worked as well: compiling exmod1.run, compiling exmod0.run, removing exmod1.run, and compiling exmod1.run again.

Some more information about modules.rkt

modules.rkt calculates the directed graph of module dependencies and keeps it in a list (mgraph) during execution. Each element of mgraph is a pair of an Mnode struct for a module and a list of file names: the adjacency list of the modules required by that module. The Mnode struct for each module stores the functions the module provides, its definitions, and its expression.

When we compile a root module with make <rootmodule>.run, compile-file.rkt is called for that root module, and it calls modules.rkt. modules.rkt calculates mgraph and then uses it to compile the other modules. Each module's compilation incorporates both the functions provided by the modules it requires and the functions it provides itself. Finally, control returns to compile-file.rkt together with the functions provided by the modules that the root module requires, and the root module is compiled incorporating this information as well.
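A hypothetical sketch of the mgraph shape and of the import lookup just described. The field names of `Mnode` (`file`, `provides`, `defines`, `expr`) are assumptions, in particular the `file` field used to find a module's node; the real struct in modules.rkt may differ. `imports-of` collects what a module's direct requires provide.

```racket
#lang racket
;; Hypothetical sketch of mgraph; field names are assumptions, not the
;; actual Mnode definition in modules.rkt.
(provide (struct-out Mnode) imports-of)

(struct Mnode (file provides defines expr))

;; mgraph : (listof (cons Mnode (listof string?)))
;; Each element pairs a module's node with the files it requires.

;; The mgraph entry for a file name, or #f if there is none.
(define (entry-for mgraph file)
  (findf (lambda (entry) (equal? (Mnode-file (car entry)) file)) mgraph))

;; Functions a module may use: the union of what each module it
;; directly requires provides.
(define (imports-of mgraph file)
  (define entry (entry-for mgraph file))
  (if entry
      (remove-duplicates
       (append* (for/list ([dep (in-list (cdr entry))])
                  (let ([dep-entry (entry-for mgraph dep)])
                    (if dep-entry (Mnode-provides (car dep-entry)) '())))))
      '()))
```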

dvanhorn commented 3 years ago

Why is formdps.c written in C?

dandorat commented 3 years ago

I needed to pass arguments from Makefile to a programming language and then generate some commands with the help of that language and then make system calls to shell with those commands. C looked like a good choice and I knew how to do this in C also.

dvanhorn commented 3 years ago

Here's a sketch of how to do this in Racket (formdps.rkt):

#lang racket
(provide main)
(define (main arg1 arg2)
  (printf "arg1: ~a\n" arg1)
  (printf "arg2: ~a\n" arg2)
  (system "ls"))

In the Makefile:

%.run:  %.o
        racket -t formdps.rkt -m make $@

dandorat commented 3 years ago

Yes, it is similar to C. But I also use fopen, fgetc, rewind, and putc in the C code, so I need to look up the corresponding Racket functions and how they behave. Will do that now.
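For reference, the C calls mentioned here have close counterparts among Racket's port functions. Below is a small illustrative sketch, with `replace-first-char!` as a hypothetical name: it reads a file's first character and, if it matches, overwrites it in place. It assumes single-byte (ASCII) characters, since `write-char` writes a character's UTF-8 encoding.

```racket
#lang racket
;; Hypothetical sketch: C stdio calls and their Racket port counterparts.
;;   fopen  -> open-input-file / open-output-file
;;   fgetc  -> read-char (returns eof at end of file)
;;   rewind -> (file-position port 0)
;;   putc   -> write-char
(provide replace-first-char!)

;; If the file's first character is `old`, overwrite it in place with
;; `new` (assumes ASCII). Returns #t when a replacement was made.
(define (replace-first-char! path old new)
  (let ([c (call-with-input-file path read-char)])
    (cond
      [(and (char? c) (char=? c old))
       ;; 'update opens an existing file for writing without truncating it.
       (let ([out (open-output-file path #:exists 'update)])
         (file-position out 0)
         (write-char new out)
         (close-output-port out))
       #t]
      [else #f])))
```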

dvanhorn commented 3 years ago

I think the right approach is to avoid these things entirely. I will try to push a sketch of how to accomplish this later today.

dandorat commented 3 years ago

Ok, I'll think about it also.

dandorat commented 3 years ago

Following your sketch above, I wrote formdps.rkt as follows and replaced the C file with it. The tests work well.

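#lang racket
(provide main)
;; arg1 is either "mv" (arg2 = <name>.run2, renamed to <name>.run)
;; or the command to run for the link step, e.g. "make" with arg2 = <name>.run.
(define (main arg1 arg2)
  (cond
    [(equal? arg1 "mv")
     ;; Drop the trailing "2": "exmod0.run2" -> "exmod0.run".
     (let ([str_run (substring arg2 0 (- (string-length arg2) 1))])
       (system (string-append "mv " arg2 " " str_run))
       (void))]
    [else
     ;; Create modulefiles if it does not exist yet.
     (system "touch modulefiles")
     ;; If the first line starts with #\y, make the .o files listed after it.
     ;; The length check avoids an error when the first line is empty.
     (let ([str (call-with-input-file "modulefiles" read-line)])
       (when (and (string? str)
                  (positive? (string-length str))
                  (char=? (string-ref str 0) #\y))
         (system (string-append "make " (substring str 1)))))
     ;; Overwrite the first character of modulefiles with a space.
     (let ([out (open-output-file "modulefiles" #:exists 'update)])
       (file-position out 0)
       (display " " out)
       (close-output-port out))
     ;; Run the link step, e.g. "make exmod0.run2".
     (system (string-append arg1 " " arg2 "2"))
     (void)]))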