Undefined behaviour in C target

larsbrinkhoff / lbForth

Self-hosting metacompiled Forth, bootstrapping from a few lines of C; targets Linux, Windows, ARM, RISC-V, 68000, PDP-11, asm.js.

GNU General Public License v3.0

418 stars 112 forks source link

Undefined behaviour in C target #26

Open larsbrinkhoff opened 7 years ago

larsbrinkhoff commented 7 years ago

@pipcet reported a case of undefined behaviour in the C target. This may affect SP@ and SP!.

larsbrinkhoff commented 7 years ago

There is no way in lbForth to push the value of the return stack pointer to the return stack, so RP@ and RP! should not have the same problem.

pipcet commented 7 years ago

Well, the good thing is that things blow up right away if the "wrong" definition is chosen by the compiler—dup doesn't work (or at least it didn't for me).

If I'm reading the code correctly, rp@ is equivalent to RP @, so it would be nice and symmetric if sp@ were equivalent to SP @. However, I know very little about Forth and I don't know whether it's standardized the other way.

Of course the geek solution is to produce "push %esp" assembly code, which many CPUs (used to) get "wrong" ;)

Note that

?: sp@   SP @ cell + ;
?: rp@   RP @ cell + ;

looks symmetric, but really isn't: in the rp@ case, we're adjusting the stack for the exit implied by ;, but in the sp case, there's no real reason for the adjustment.

larsbrinkhoff commented 7 years ago

SP@ and RP@ aren't standardized. But they do exist in many Forths, so there are expectations on how they should work.

My x86 target does use push %esp. (Also 68000, ARM, PDP-11.) I haven't seen any problem, but then I haven't ran the cold on any old CPU. Which get it wrong?

Good insight about the difference between the SP and RP adjustment. It's expected that the address returned by SP@ is the stack pointer before the call. Inside the definition, pushing SP increases stack depth. Then the stack pointer is fetched by @. So that value must be adjusted.

pipcet commented 7 years ago

Apparently it was longer ago than I thought—8086s apparently used the other result for push %sp. I thought some non-Intel CPUs competing with the 80486 got it wrong.

My mistake was to think of fetching the stack pointer with @ as a three-step process: pop the TOS, dereference it, push the new TOS. The first step decreases stack depth again, so no adjustment would be necessary in my version. It sounds like you're thinking of @ as an atomic operation that doesn't modify the stack pointer?

(OT: it's quite a hack, but I finally got the asm.js version to make it to the "ok" prompt. It's not as fast as I'd hoped, but about twice as fast as basic JavaScript in the SpiderMonkey shell.)

larsbrinkhoff commented 7 years ago

Yes, I was thinking as @ as an atomic operations. My assembler versions don't touch the stack pointer. But now that I look more closely, my C @ does update the stack pointer.

I started a JavaScript target some time ago, but I ran into a dead end. It's still in the js-disaster branch. Feel free to submit a pull request if you like!

larsbrinkhoff commented 7 years ago

I made a tentative fix in the fix-undef branch.

pipcet commented 7 years ago

That fix makes it obvious that side effects happen before the deref, so that works for me.

I still have a very slight preference for : sp@ SP @ ;, as in https://github.com/pipcet/lbForth/compare/master...pipcet:change-sp-fetch?expand=1, but that would require all ports to change, and it'd make the code uglier for ports that don't touch the stack pointer...

larsbrinkhoff commented 7 years ago

I agree that : sp@ sp @ ; looks cute, but I'd like to stick with the Forth philosophy: keep it simple. Exactly what is more simple may be a matter of taste, of course.

larsbrinkhoff commented 7 years ago

I see now there are more cases of undefined behaviour. There are warnings that dereferencing type-punned pointers will break aliasing rules. I guess I have been too liberal casting pointers.