jwt27 / djgpp-cvs

git mirror of cvs.delorie.com:/cvs/djgpp
http://www.delorie.com/djgpp/cvs.html
GNU General Public License v2.0

stub: zero stale stack and flags for fn 300h #5

Closed · stsp closed this 2 years ago

stsp commented 2 years ago

DPMI fn 300h calls a real-mode interrupt. It takes a register structure in es:edi. If ss and sp in that structure are cleared, the DPMI host provides its own internal stack. On return, DPMI writes the registers back to the same structure, and that may include the real-mode ss/sp that the DPMI host provided for that call.

Unfortunately, ss/sp were cleared only on the first invocation. As a result, subsequent invocations placed the stack at the same location as before, which is the DPMI host's private location and should never be requested by the client, even though in most cases it doesn't change between calls.

This patch makes sure ss/sp and flags are always zeroed. This is similar to how the __dpmi_int() API works: it zeroes ss/sp and flags.
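For illustration, here is a minimal sketch of the same rule from the C side, using djgpp's <dpmi.h>. The helper name and the DOS call are just examples; the point is that ss/sp and flags are zeroed before every raw fn 300h call, which is what __dpmi_int() does internally.

```c
#include <dpmi.h>
#include <string.h>

/* Minimal sketch: a raw DPMI fn 300h call through djgpp's
 * __dpmi_simulate_real_mode_interrupt().  ss/sp and flags are zeroed
 * before *every* call, so the host provides its own real-mode stack;
 * __dpmi_int() performs the same clearing internally.  The DOS call
 * used here is only an example. */
static int print_dollar_string(unsigned short seg, unsigned short off)
{
    __dpmi_regs r;

    memset(&r, 0, sizeof r);  /* zeroes ss, sp and flags with the rest */
    r.x.ax = 0x0900;          /* DOS int 21h/ah=09h: print '$'-string  */
    r.x.ds = seg;
    r.x.dx = off;

    /* On return the host writes the registers back into r; a host that
     * saves ss/sp would leave its internal stack address here, which is
     * why the structure must be re-zeroed before any further call. */
    return __dpmi_simulate_real_mode_interrupt(0x21, &r);
}
```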

stsp commented 2 years ago

Hi.

I have trouble testing this, as on my djgpp build it just exceeds the maximum stub size... So I need your build before I can test. :)

jwt27 commented 2 years ago

Hi, sorry, I've been busy with other stuff lately.

Clearing ss/sp is technically more correct, but there are no calls to other DPMI functions in between, so I don't see how the stack could possibly be invalidated. Unless the host allocates a new RM stack for each call, is that the case for yours?

I have trouble testing this, as on my djgpp build it just exceeds the maximum stub size...

That's odd. You are using binutils 2.35+, right? There should only be a stub size limit for 2.34 and earlier.

stsp commented 2 years ago

When ss/sp are set to 0, the host allocates the stack. And it should do that for each call, which means ss/sp must be cleared before each call. If you allocate your own stack, you set ss/sp to point at it, but ss/sp should never point to the host's stack. In our case it does, and as a work-around I modified the host to not save the ss/sp values.
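For contrast, a hedged sketch of the "own stack" variant described above, again with djgpp calls; int21_with_own_stack and the 512-byte size are made up for the example:

```c
#include <dpmi.h>

/* Sketch of the client-allocated-stack variant: point ss:sp at your own
 * conventional-memory block instead of leaving them zero.  The helper
 * name and the 512-byte size are illustrative only. */
static int int21_with_own_stack(__dpmi_regs *r)
{
    int selector;
    int seg = __dpmi_allocate_dos_memory(512 / 16, &selector);

    if (seg == -1)
        return -1;

    r->x.ss = seg;   /* our own real-mode stack...          */
    r->x.sp = 512;   /* ...with sp at the top of the block  */
    r->x.flags = 0;

    int ret = __dpmi_simulate_real_mode_interrupt(0x21, r);

    __dpmi_free_dos_memory(selector);
    return ret;
}
```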

I'll look at updating binutils for my build.

jwt27 commented 2 years ago

This is with the DPMI host in dosemu2? To my knowledge both hdpmi and cwsdpmi use a single RM stack; nothing is allocated for functions ax=030x. So ss/sp remain valid between calls, unless ax=030x is somehow invoked recursively with the same regs. But in the case of this stub, that can never happen.

stsp commented 2 years ago

Yes, with this stub it can never happen. But I added a sanity check that ss/sp never point to the host's internal stack, and it triggered here. I did that because there was a crash in another program with a similar problem, which actually did recursive invocations. So while it's not a real bug here, I decided it might be good to have it "fixed".

stsp commented 2 years ago

And the work-around I mentioned was not for this stub but for another program, yes.

jwt27 commented 2 years ago

Even with ss/sp clear, if ax=0300 is called recursively then the same stack will still be used, right? So that is clearly a bug in that other program. Maybe in dosemu2 you could detect recursive stack usage and allocate a new stack in that case? How does hdpmi handle this?

I don't really see the need to patch this here; it would only increase the stub size by 25% with no tangible benefit.

stsp commented 2 years ago

Even with ss/sp clear, if ax=0300 is called recursively then the same stack will still be used, right?

No, not right. Why do you think so? 0x300 can point to a real-mode callback, which enters PM. The PM handler can then call other interrupts via 0x300, and if it were the same stack, there would be corruption.

Maybe in dosemu2 you could detect recursive stack usage and allocate a new stack in this case?

That's the point. It of course provides another stack for every new nesting level. But if you don't clear ss/sp by hand, then the old one is used, hence the corruption. In theory the recursive invocations can happen even with this stub - if someone reflects the interrupts that it calls to protected mode. This is very unlikely, but possible.
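To make the corruption path concrete, here is a hypothetical host-side sketch (not actual dosemu2 code) of how a stack could be chosen per nesting level:

```c
/* Hypothetical host-side logic (not actual dosemu2 code): choosing the
 * real-mode stack for fn 300h at each nesting level. */
#define MAX_NESTING 16

struct rm_stack { unsigned short seg, sp; };

static struct rm_stack host_stack[MAX_NESTING]; /* one stack per level */
static int nesting;                             /* current 300h depth  */

static struct rm_stack pick_stack(unsigned short ss, unsigned short sp)
{
    if (ss == 0 && sp == 0)
        return host_stack[nesting];  /* fresh stack for this level */

    /* The client supplied its own ss/sp.  If these are stale host
     * values saved from an earlier, shallower call, two nesting levels
     * now share one stack, and the deeper call corrupts the outer one. */
    struct rm_stack s = { ss, sp };
    return s;
}
```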

How does hdpmi handle this?

By not saving ss/sp into the user's struct; at least that's what hdpmi's author told me.

size by 25% with no tangible benefit.

25% with 4 instructions?

jwt27 commented 2 years ago

So when ss/sp is clear, do you re-use the incoming ss/sp from the last rm->pm switch? I think that is what hdpmi does (not sure). I see cwsdpmi allocates a 256-byte stack for each call.

By not saving ss/sp into the user's struct; at least that's what hdpmi's author told me.

Ok, that seems a sensible solution then if a new stack is provided on each call.

25% with 4 instructions?

The stub size is rounded to a multiple of 512 bytes (disk sector). It's also not UPX-compressible. So yes, it seems a bit wasteful.

stsp commented 2 years ago

So when ss/sp is clear, do you re-use the incoming ss/sp from the last rm->pm switch?

If ss/sp are clear, then there is nothing to re-use, because they are clear. In that case the host provides its internal stack, which is usually the same for the same nesting level. But that's not guaranteed by the spec, and moreover someone could hook the interrupt that is called via 0x300 into PM, causing nested invocations.

Ok, that seems a sensible solution then if a new stack is provided on each call.

This is a work-around that is not backed by the spec. The spec doesn't say that some registers are not saved to the struct. I don't think all available DPMI servers avoid saving ss/sp to the struct.

The stub size is rounded to a multiple of 512 bytes (disk sector). It's also

Have I crossed the limit again? IIRC last time the limit was already crossed by some other additions of mine. How could it have reached the new limit since then?

jwt27 commented 2 years ago

This is a work-around that is not backed by the spec. The spec doesn't say that some registers are not saved to the struct. I don't think all available DPMI servers avoid saving ss/sp to the struct.

I checked, and the DPMI 1.0 spec does actually require that SS/SP are not saved: http://sudleyplace.com/dpmione/dpmispec1.0.pdf (page 17)

The DPMI host saves the protected mode registers, switches the CPU into real mode, loads the registers from the data structure, and then transfers control to the designated address. Parameters may also be passed to the real mode procedure on the stack, if necessary. When the real mode procedure returns, the host stores the (possibly modified) register contents (except for SS, SP, CS, and IP) back into the same data structure, switches the CPU to protected mode, restores the protected mode registers, and resumes execution of the protected mode client. The client can then inspect the data structure to determine the results of the function call.

I don't know if that is the same in 0.9, but it seems sensible that it is.
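To spell that out, a hedged sketch of the write-back the 1.0 excerpt above describes; struct rm_state, its fields, and write_back are hypothetical names, not from any real host:

```c
#include <dpmi.h>

/* Sketch of a spec-compliant host's write-back after fn 300h: every
 * real-mode register *except* SS, SP, CS and IP is copied back into
 * the client's structure.  'struct rm_state' is a hypothetical name. */
struct rm_state {
    unsigned long eax, ebx, ecx, edx, esi, edi, ebp, eflags;
    unsigned short ds, es, fs, gs, ss, sp, cs, ip;
};

static void write_back(__dpmi_regs *dst, const struct rm_state *src)
{
    dst->d.eax = src->eax;  dst->d.ebx = src->ebx;
    dst->d.ecx = src->ecx;  dst->d.edx = src->edx;
    dst->d.esi = src->esi;  dst->d.edi = src->edi;
    dst->d.ebp = src->ebp;
    dst->x.flags = (unsigned short)src->eflags;
    dst->x.ds = src->ds;  dst->x.es = src->es;
    dst->x.fs = src->fs;  dst->x.gs = src->gs;
    /* SS, SP, CS and IP are deliberately not stored back. */
}
```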

Have I crossed the limit again? IIRC last time the limit was already crossed by some other additions of mine. How could it have reached the new limit since then?

I recall you shortened some string to squeeze it in exactly at the 2048-byte boundary. So now even a single instruction will cross it.

jwt27 commented 2 years ago

DPMI 0.9 also requires it: https://www.phatcode.net/res/262/files/dpmi09.txt

After the call or interrupt is complete, all real mode registers and flags except SS, SP, CS, and IP will be copied back to the real mode call structure so that the caller can examine the real mode return values.

So the current stub code is correct, and any dpmi host that does save ss/sp is in violation of the specs :)

I think in my own code I also reset ss/sp for each call, so I guess it's safe to remove that and save a few CPU cycles.