shinh / elvm

EsoLangVM Compiler Infrastructure
MIT License
1.12k stars 143 forks source link

EOF is not working as intended #96

Open umutoztunc opened 3 years ago

umutoztunc commented 3 years ago

While I was trying to investigate some issues with fgets implementation, I've noticed that the source of the issue is EOF definition itself.

int c = getchar();
if (c < 0) {
  printf("%x\n", c);
  printf("%x\n", EOF);
}

After running the code snippet above, I have got the following result:

-1
ffffff

As you can see, EOF is 24-bit instead of 32-bit. Therefore, if (c == EOF) is always false. This can be seen in the generated EIR code as well: image 16777215 is simply 0xffffff. This causes some bugs. Even though interpreting the EIR output with eli works fine for some cases, it still fails if you try to convert EIR to Whitespace for instance.

Since I haven't been able to pinpoint issue yet, I have made temporary fix by changing the condition from c == EOF to c < 0. I will make a pull request soon.

By the way, printf might have issues with hexadecimal representation as well since it literally printed back -1 instead of ffffffff.

shinh commented 3 years ago

ELVM has a stupid ABI. All integer types are unsigned and have the same size (i.e., sizeof(char) == sizeof(int) == 1). 1 "byte" is 24 bits in most backends (everything is originally designed to run brainfuck compiled from C fast). So in ELVM, EOF is 0xffffff, I agree it sounds very strange, though. There is even a test: https://github.com/shinh/elvm/blob/master/test/eof.c

Since there is no negative number, if (c < 0) is not a right condition and it should be if (c == EOF). If it does not work in some backends, it's a bug of the backend translator.

#include <stdio.h>

int main() {
    int c = getchar();
    printf("%x\n", c);
    printf("%x\n", EOF);
}

showed

ffffff
ffffff

by eli, elc -c, and elc -rb. If -1 is somehow shown, the whitespace backend has an issue? For some reason, whitespace is not working in my environment right now, so haven't confirmed it yet.

umutoztunc commented 3 years ago

I see. I have confirmed that the issue is related to whitespace backend. If I run the code with eli, it prints ffffff, but if I run it with the whitespace interpreter, it prints -1 instead.

By the way, is there a reason fgets return type is int instead of char*? Also fgets doesn't use the given fp variable, it acts like gets with a character limit. I am aware that most backends don't support file i/o operations. Still, can't we call fgetc(fp) instead of getchar() without breaking anything?

shinh commented 3 years ago

Wow, I haven't realized fgets is returning int! It's definitely a bug. It looks like there is no user of this function and that's why this issue wasn't found. I also agree fgets should call fgetc for the EOF hack for whitespace in fgetc.

I think https://github.com/shinh/elvm/pull/97 fixes the both of the issues? Thanks for pointing this out!

umutoztunc commented 3 years ago

You are so fast! Yes, this fixes both. Actually, it fixes another bug that I haven't mentioned yet :D. The previous version was copying EOF to buffer, just like \n. Also, it seems we both fixed it almost the same way. My local patch was like this: image

The only difference in our approaches, your version does nothing and return null pointer if size is 1. Mine puts a null byte at the beginning of the buffer and returns the buffer. I have referred to this implementation to decide that.

shinh commented 3 years ago

Ah, yours is better! Could you send a PR to re-correct the implementation? If you don't have time for it, I'll do it this night.

umutoztunc commented 3 years ago

Sure! By the way, there is one more bug. All i/o functions stop when they encounter a null byte in ELVM's implementation. I don't know the reason, but it seems getc, getchar, fgetc are all returning EOF instead of 0 for null byte. Since we use them to implement fgets, it also cannot read an input such as "foo\x00bar". Maybe, the bug is related to 8cc.

On a side note, using fgetc in fgets does not fix the issue with whitespace backend. So, there is still something wrong about it.

shinh commented 3 years ago

Yeah, it's unfortunate we cannot distinguish an EOF and a null byte and it's impossible to handle binary files without preprocessing.

If it's a bug rather than a limitation, it's due to the definition of ELVM's GETC instruction. When it reaches to EOF, ELVM implementation should fill 0 instead of -1: https://github.com/shinh/elvm/blob/c30d33fe2c3ffa182f3e502b365cbea9267ffed8/ir/eli.c#L169

I think I chose this definition because some backend esolangs do not distinguish EOF from 0 anyway. For example, many brainfuck implementation fills -1 or 0 when , reaches to EOF: https://esolangs.org/wiki/Brainfuck#EOF I think current BF backend supports both -1 and 0 (https://github.com/shinh/elvm/blob/c30d33fe2c3ffa182f3e502b365cbea9267ffed8/target/bf.c#L529), but this means programs compiled to BF cannot handle 0 and 255 in binary files.

Some backends have more limitations in their inputs. Please check https://github.com/shinh/elvm/blob/master/tools/runtex.sh for example. It converts /dev/stdin to another format.

umutoztunc commented 3 years ago

Oh I see. Maybe we can choose some defaults and provide other stuff as options while converting to backends. For example, we can set EOF as -1. However, the user should be able to provide a flag to change it if he wants such as: out/elc -bf -eof=0 infile.eir > outfile.bf

Still, it might make things too complicated. I am not sure if that's worth it, just an idea.

shinh commented 3 years ago

I think fgets was still wrong. I didn't expect fgets has such a corner case :) We should have returned nullptr if nothing is read while size > 1: https://github.com/shinh/elvm/pull/99

As for -eof=0, it makes sense, but I'd rather revisit the definition of GETC instruction if I have time.

umutoztunc commented 3 years ago

This looks good to me.

Should I create another issue for the Whitespace backend? So, you said that all numbers should be 24-bit and unsigned. Doesn't this contradict with Whitespace language itself? From my understanding, the Whitespace language should be able to use negative numbers and there is no bit limit for the numbers. So, I should be able to work with a 64-bit number.

shinh commented 3 years ago

The EIR to WS translator should have no issue as it emits code which limits integers to 24bit range: https://github.com/shinh/elvm/blob/18b45bc85e27a862a9367087ccbab27631a0ee96/target/ws.c#L145 I agree it does not fully utilize capability of whitespace language, but the translation should be valid.

It's fine to define a 64bit or 32bit dialect of ELVM IR and support a mode like elc -ws -64bit. As for whitespace, I guess removing (x+(1<<24))%(1<<24) I mentioned and setting 1<<24 to SP to keep 24 bit address space would work? I think the initial value of SP is set here: https://github.com/shinh/elvm/blob/18b45bc85e27a862a9367087ccbab27631a0ee96/target/ws.c#L232

umutoztunc commented 3 years ago

I have just tested it again with our latest patch, it already works as intended. Both whitespace interpreter and eli shows that the buffer is filled with zeros instead of -1s. I guess switching from getchar to fgetc does solved the issue since there is a whitespace special trick in fgetc implementation.

Also, you are right that the translation is still valid even though numbers are 24-bit. I guess everything is working now? :D

umutoztunc commented 3 years ago

Oh sorry, the bug is still there. I forgot that it works fine if you provide \n in the input. I will try to find a solution. I just don't understand how we get -1 if all numbers are unsigned.

umutoztunc commented 3 years ago

By the way, tests for Whitespace backend crashes. This might be related to the issue.

➜ make test-ws
Skip building kx due to lack of kinx
Skip building wasi due to lack of wasmtime
Skip building wasm due to lack of wat2wasm
Skip building el due to lack of emacs
Skip building cl due to lack of sbcl
Skip building scala due to lack of scalac
Skip building swift due to lack of swiftc
Skip building cr due to lack of crystal
Skip building cs due to lack of
Skip building i due to lack of ick
Skip building forth due to lack of gforth
Skip building fs due to lack of
Skip building lua due to lack of lua
Skip building ll due to lack of lli
Skip building scm_sr due to lack of gosh
Skip building hs due to lack of ghc
Skip building oct due to lack of octave
Skip building scratch3 due to lack of
Skip building j due to lack of jconsole
./runtest.sh out/elc.c.eir.ws.out tools/runws.sh out/elc.c.eir.ws
Buffer overflow!
make: *** [build.mk:5: out/elc.c.eir.ws.out] Error 1
shinh commented 3 years ago

As for the "Buffer overflow!", I believe I found the cause and fixed it by: https://github.com/shinh/elvm/commit/8d7508322ce71f9e95fa2ba84fbe983f214b9f07 You probably need to re-build whitespace interpreter by touch Whitespace/whitespace.c && make test-ws

As for -1, perhaps https://github.com/shinh/elvm/pull/100 may fix the issue?

umutoztunc commented 3 years ago

Cool! Both are fixed now. There is no overflow and -1s are clearly gone. Great job mate, thanks!