plum-umd / the-838e-compiler

Compiler for CMSC 838E

Input ports (#19) #40

Closed john-h-kastner closed 3 years ago

john-h-kastner commented 3 years ago

This should implement input ports. A few TODOs remain that should probably be addressed.

dvanhorn commented 3 years ago

Are you still working on the TODOs? Or should I merge?

dvanhorn commented 3 years ago

Although this passes the CI tests, it fails locally for me on macOS. I'm getting 'err on the first run-with-file test, which means the fopen in open_input_file is returning NULL. I'm not sure why. If I run the Villain REPL and try to open a file in the current directory, things work, e.g.:

☞  racket repl.rkt
Welcome to Villain v0.0.
$> (port? (open-input-file "gc.c"))
#t
$> (port? (open-input-file "test/compile.rkt"))
#t

But if I try to open a temporary file, I get 'err:

☞  racket repl.rkt
Welcome to Villain v0.0.
$> (port? (open-input-file "/var/folders/ml/jqmf1yqj5n9f721tqmkz8tm40000gn/T/input16140085201614008520208.txt"))
err

This file exists:

☞  ls -alt /var/folders/ml/jqmf1yqj5n9f721tqmkz8tm40000gn/T/input16140085201614008520208.txt
-rw-r--r--  1 dvanhorn  staff  6 Feb 22 10:42 /var/folders/ml/jqmf1yqj5n9f721tqmkz8tm40000gn/T/input16140085201614008520208.txt

I wonder if this is maybe about an insufficiently large buffer for the file name or something?
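One way to narrow this down is to report why fopen failed and exactly which path it received. The following is only a minimal sketch: open_input_file_verbose is a hypothetical helper, not the runtime's actual open_input_file, and it assumes the path has already been decoded into a NUL-terminated C string.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical debugging wrapper (not part of the runtime):
   opens `path` for reading and, on failure, prints the path as fopen
   saw it together with the reason reported through errno. */
FILE *open_input_file_verbose(const char *path) {
  FILE *f = fopen(path, "r");
  if (f == NULL) {
    int saved = errno; /* save errno before any other library call */
    fprintf(stderr, "fopen(\"%s\") failed: %s (path length %zu)\n",
            path, strerror(saved), strlen(path));
    errno = saved;     /* leave errno intact for the caller */
  }
  return f;
}
```

If the printed path differs from the one passed in at the REPL, the problem lies in how the string reaches fopen rather than in the file system.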

john-h-kastner commented 3 years ago

Try out this file. Compile it with the Makefile instead of running it through the REPL. I see the correct output at the REPL, but the string is not printed correctly when I build and execute the .run.

#lang racket
"/tmp/tmp.WiR2xJytmr"

It looks like there's a bug in the UTF-8 encoding function. It thinks some of the characters are 4-byte codepoints, even though they're all ASCII.
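To illustrate how an ASCII character can end up classified as a 4-byte codepoint, here is a sketch of a typical UTF-8 encoder that picks the byte count from the magnitude of the value. The name utf8_encode and its signature are assumptions for illustration, not the runtime's actual function. Any stray bits above the codepoint push the value past 0x10000, so the encoder falls through to the 4-byte case.

```c
#include <stdint.h>

/* Hypothetical encoder: writes the UTF-8 encoding of `c` into `out`
   (which must hold at least 4 bytes) and returns the byte count.
   It does not validate that `c` is a legal codepoint. */
int utf8_encode(uint64_t c, char *out) {
  if (c < 0x80) {            /* ASCII: one byte */
    out[0] = (char)c;
    return 1;
  }
  if (c < 0x800) {           /* two bytes */
    out[0] = (char)(0xC0 | (c >> 6));
    out[1] = (char)(0x80 | (c & 0x3F));
    return 2;
  }
  if (c < 0x10000) {         /* three bytes */
    out[0] = (char)(0xE0 | (c >> 12));
    out[1] = (char)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (char)(0x80 | (c & 0x3F));
    return 3;
  }
  /* Everything larger is treated as a four-byte sequence. */
  out[0] = (char)(0xF0 | ((c >> 18) & 0x07));
  out[1] = (char)(0x80 | ((c >> 12) & 0x3F));
  out[2] = (char)(0x80 | ((c >> 6) & 0x3F));
  out[3] = (char)(0x80 | (c & 0x3F));
  return 4;
}
```

With this sketch, '/' alone encodes to one byte, while the same '/' with another character's bits left above it (for example `((uint64_t)'t' << 21) | '/'`) is encoded as four bytes.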

dvanhorn commented 3 years ago

Hmm, yeah I see the garbled output when running that program.

john-h-kastner commented 3 years ago

I think I've got a fix. Will push later today.

john-h-kastner commented 3 years ago

The problem was in unpacking characters from strings. The first two characters in each int64_t had high bits set because of the other characters packed into the same word. These bits need to be masked out so that only the low 21 bits are left when printing the codepoint.
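A minimal sketch of the masking step, assuming three 21-bit codepoints are packed into each int64_t with the first character in the lowest bits. The layout and the names pack_chars and unpack_char are assumptions for illustration; the actual runtime may differ.

```c
#include <stdint.h>

#define CODEPOINT_BITS 21
/* Low 21 bits set: 0x1FFFFF. */
#define CODEPOINT_MASK ((INT64_C(1) << CODEPOINT_BITS) - 1)

/* Assumed layout: c0 in bits 0-20, c1 in bits 21-41, c2 in bits 42-62. */
int64_t pack_chars(uint32_t c0, uint32_t c1, uint32_t c2) {
  return (int64_t)c0
       | ((int64_t)c1 << CODEPOINT_BITS)
       | ((int64_t)c2 << (2 * CODEPOINT_BITS));
}

/* Shift the wanted slot (0, 1, or 2) down to the low bits, then mask
   off the neighbouring characters that still sit above it. */
uint32_t unpack_char(int64_t word, int slot) {
  return (uint32_t)((word >> (slot * CODEPOINT_BITS)) & CODEPOINT_MASK);
}
```

Without the mask, slot 0 of a word holding '/', 't', 'm' would still carry the bits of 't' and 'm', which is far larger than any ASCII value. With the mask, `unpack_char(word, 0)` yields 0x2F.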