rrthomas / mit

A simple stack-based VM
Other
10 stars 3 forks source link

`ret`: simplify the spec #403

Closed rrthomas closed 4 years ago

rrthomas commented 4 years ago

@apt1002 observes that it's possible to simplify as follows: remove ret's ability to return from a catch: the only way to return from a catch is then throw. Instead, when it is desired to catch an arbitrary subroutine, proceed as follows: catch a routine that does: call; pushi_0; throw.

rrthomas commented 4 years ago

This won't work because it prevents values being returned from catch.

rrthomas commented 4 years ago

The aim of this change was to simplify the design (by separating ret from throw). Also, @apt1002 observed that it would optimize Bee, which currently to obtain similar semantics must mark catch return addresses (by setting the low order bit to 1) so that ret can discard the extra information pushed on to Bee's return stack. (Mit's ret is implemented as a C return, so is not affected by this.)

Would it be possible at least to move the complexity from ret, the commoner instruction, to throw, the less common, and in the process remove the disparity between ret, which causes 0 to be returned from catch, and 0 throw, which does not return results from catch? Then the technique described in the first comment would work to catch an arbitrary subroutine, ret would be simplified, and 0 would definitely not be an error, because it would be treated specially by throw.

apt1002 commented 4 years ago

So throw would check if the code is zero, and if so, read the number of return values from the catchers stack and move that number of items before pushing the status code and entering the handler.

This sounds good to me. But it also sounds like returning from a catch is actually a different instruction from both ret and throw.

rrthomas commented 4 years ago

But it also sounds like returning from a catch is actually a different instruction from both ret and throw.

Yes! That's the key, isn't it? So, CATCH_CALL, CATCH_RET, and THROW?

rrthomas commented 4 years ago

One problem: forbidding bivalent ret means creating a new error condition. This seems unfortunate: in Mit's current implementation, the error condition is unnecessary, because returning from a subroutine is the same as returning from a catch. In Bee, I can't see how to detect the error condition without slowing down ret again.

rrthomas commented 4 years ago

Bee's implementation could be made a little neater, if no faster, by having ret compare sp with handler_sp (the head of the linked list of catch handlers; if it is equal, then pop an extra item. Avoids the bit-twiddling of return address for much the same performance.

apt1002 commented 4 years ago

I see. I suppose CATCH will have to leave something on the return stack that will crash RET. For example, you could set the bottom bit of the return address. :-) Something that RET would check for anyway, so that you only have to distinguish the new error case when you're already on an error path.

rrthomas commented 4 years ago

Ah! Nice one. And if you successfully return from a catch with ret on Mit, you get to keep all the, err, unbroken things?

apt1002 commented 4 years ago

It would be better if you get an error, really, and no harder to implement.

rrthomas commented 4 years ago

But it would slow down ret. ret is currently just a C return.

rrthomas commented 4 years ago

(Following discussion offline)

  1. Add a new error condition for RET attempted from CATCH. It could be optional, but Mit can implement it anyway.
  2. On Bee, CATCH sets the bottom bit of the return address. If RET's alignment check on its return address fails, it checks to see if sp equals handler_sp, and if so, diagnoses a ERROR_RET_FROM_CATCH.
  3. On Mit, mit_run diagnoses ERROR_RET_FROM_CATCH if run_inner returns normally.

The rest of the design is as above:

  1. Remove ret's ability to return from a catch.
  2. When status 0 is thrown, results are returned.
  3. When it is desired to catch an arbitrary subroutine, proceed as follows: catch a routine that does: call; pushi_0; throw.
apt1002 commented 4 years ago

I think you said you would make a separate instruction for returning from a catch. But this design looks right, yes.

rrthomas commented 4 years ago

I did say that. That seems as though it would create yet another error condition, though: attempt to throw 0. For if throw 0 were allowed, then it would be impossible for the handler (I think this should be called "catcher" BTW) to know whether there were results under the 0 or not.

rrthomas commented 4 years ago

My analysis was wrong: the change from "allow ret to return from a catch" and "make 0 throw return results" is actually rather harder and more complicating. This is because in the former case, you can only return directly to a catch, that is, from the computation stack directly above it. Thus, copying the results is the same as normal. This is why the Mit implementation was trivial (doesn't matter to Bee, of course: its results are always on top of its data stack).

In the new design, "0 throw" discards Mit's current call stack, so as well as a jmp_buf, run_inner also needs a stack base and pointer for the top-most computation stack in the outer call stack, so that the results can be copied down. In Bee, it would seem necessary to make catch record the current data stack pointer as well as the current return stack pointer, and reset them both on 0 throw. (This is what Forth's CATCH does.)

I am prepared to revisit this in future, but I think for now I will instead simply forbid 0 throw on Mit, and retain ret's ability to return from a catch. Allowing ret to return from one call stack to another is a useful special case; without it, there's no direct support for catch to return results in the case where there's no error.

It would be useful to support also returning results in the case where there is an error. However, it looks to me as though the error mechanism is then conflating two things: Mit/Bee's equivalent of hardware exceptions, which can occur "at any time", and therefore cannot be expected to respect a calling convention (in particular, not having sensible return values, though it would be possible to restore the stack pointer, and have nonsense return values underneath an error code), and non-local return. We'll be revisiting non-local return in any case, so I think that this issue can wait until then.