hercules-390 / hyperion


Missing interval timer/CPU timer interruptions with ECPS:VM active #214

Closed wably closed 7 years ago

wably commented 7 years ago

It was recently reported to me that the CPWATCH utility virtual machine was abending with a PIC 9 when a user connected to it after eight-plus hours of inactivity. The report tied the problem to the DISP0 assist of ECPS:VM: with DISP0 disabled, CPWATCH operated properly. I was able to recreate and confirm these findings.

To boil down days of investigation and research, it turns out that this problem is related to the prior issue #193, which has been fixed and closed. In that prior issue, which was a case of a virtual machine being dispatched as runnable even though the virtual PSW wait bit was set, I mentioned that it was odd that the VMPSWAIT dispatchability flag had not been set in the VMBLOK even though the virtual machine was clearly in a wait.

As a result of the investigation into the CPWATCH issue, I found that the code for DISP0 is completely missing a statement to set the VMPSWAIT bit. The corresponding code in DMKDSP clearly sets this bit. Setting it in DISP0 resolves the issue with CPWATCH, and no interruptions go missing. With the bit left unset, the scheduler (DMKSCH) and then DMKDSP2 (or the DISP2 assist) would later attempt to dispatch the virtual machine even though it was in a wait. The fix to #193 allowed DISP2 to bail out of an actual dispatch, but this whole sequence of events meant that CP was not able to schedule a new TRQBLOK representing the next timer interruption. Thus, the timer interruption the virtual machine was waiting on would never come and was "missing".

Besides the CPWATCH issue, this could potentially also affect guest operating systems or any other virtual machine whose programming depended on periodic interruptions from one of the timers.

The following code from DISP0 shows the absence of code to set the VMPSWAIT bit. Just before testing the V-PSW for Wait, VMPSWAIT is set off, as is done in the corresponding DMKDSP code. But after determining that the V-PSW has Wait set, DISP0 does not set the VMPSWAIT flag back on, even though DMKDSP (further below) does so.

    /* DMKDSP - CKWAIT */
    /* Clear Wait / Idle bits in VMRSTAT */
    B_VMRSTAT=EVM_IC(vmb+VMRSTAT);
    B_VMRSTAT &= ~(VMPSWAIT | VMIDLE);
    EVM_STC(B_VMRSTAT,vmb+VMRSTAT);
    if(F_VMPSWHI & 0x00020000)
    {
        DEBUG_CPASSISTX(DISP0,WRMSG(HHC90000, "D", "DISP0 : VWAIT - Taking exit #28"));
        /* Take exit 28  */
        regs->GR_L(11)=vmb;
        UPD_PSW_IA(regs, EVM_L(elist+28));   /* Exit +28 */
        CPASSIST_HIT(DISP0);
        EVM_ST(DISPCNT,dlist);
        return;
    }

Here is the corresponding code from DMKDSP:

    CKWAIT   EQU   *  CHECK HERE FOR DISABLED OR IDLE WAIT STATES: %V3M4038
             NI    VMRSTAT,X'FF'-(VMPSWAIT+VMIDLE)   UNFLAG WAIT   %V3M4038
             TM    VMPSW+1,WAIT   STILL IN WAIT ??                 %V3M4038
             BZ    DISPATCH       NO -- GO DISPATCH                %V3M4038
             OI    VMRSTAT,VMPSWAIT   FLAG IN WAIT                 %V3M4038
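
For readers less at home in S/370 assembler, the CKWAIT sequence above can be sketched in C as two small functions: one that mirrors DMKDSP (clear both flags, then set VMPSWAIT again if the virtual PSW is still in a wait) and one that mirrors DISP0 as it stood before the fix (clear only). This is an illustrative model, not the actual ecpsvm.c code. The function names are hypothetical, and the bit values for VMPSWAIT and VMIDLE are assumed here only so that the example compiles; the real definitions come from the VMBLOK layout. The wait-bit mask 0x00020000 is the one tested in the DISP0 excerpt.

```c
#include <stdint.h>

/* ASSUMPTION: illustrative bit values only; the real ones are defined
   by the VMBLOK layout used by CP and by ecpsvm.c. */
#define VMPSWAIT    0x10u        /* virtual machine is in a PSW wait   */
#define VMIDLE      0x01u        /* virtual machine is idle            */
#define PSW_HI_WAIT 0x00020000u  /* wait bit as tested in DISP0 above  */

/* Hypothetical helper mirroring DMKDSP's CKWAIT: clear both flags,
   then flag the wait again if the virtual PSW still has Wait set. */
uint8_t ckwait_dmkdsp(uint8_t vmrstat, uint32_t vpsw_hi)
{
    vmrstat &= (uint8_t)~(VMPSWAIT | VMIDLE);   /* NI ... UNFLAG WAIT */
    if (vpsw_hi & PSW_HI_WAIT)                  /* TM ... STILL IN WAIT ?? */
        vmrstat |= VMPSWAIT;                    /* OI ... FLAG IN WAIT */
    return vmrstat;
}

/* Hypothetical helper mirroring DISP0 before the fix: the flags are
   cleared, but VMPSWAIT is never set back on for a waiting V-PSW. */
uint8_t ckwait_disp0_unfixed(uint8_t vmrstat, uint32_t vpsw_hi)
{
    (void)vpsw_hi;  /* the wait bit only selects exit 28; no flag is set */
    vmrstat &= (uint8_t)~(VMPSWAIT | VMIDLE);
    return vmrstat;
}
```

With the virtual PSW in a wait, the first function leaves VMPSWAIT set in the returned byte and the second leaves it clear, which is the difference that led to the missing interruptions.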

Therefore, the solution is easy. Set the VMPSWAIT bit within the same if-block shown above that is executed when the V-PSW wait bit is set. The newly inserted statements are the last two statements shown:

    if(F_VMPSWHI & 0x00020000)
    {
        DEBUG_CPASSISTX(DISP0,WRMSG(HHC90000, "D", "DISP0 : VWAIT - Taking exit #28"));
        /* Take exit 28  */
        B_VMRSTAT |= VMPSWAIT;
        EVM_STC(B_VMRSTAT,vmb+VMRSTAT);

This resolves the issue with CPWATCH. The final solution in the next commit to ecpsvm.c may take a different form, including possibly reworking the solution to #193. But for now, this solution will work perfectly.

I will close this issue when I next commit ecpsvm.c.

PeterCoghlan commented 7 years ago

I've had CPWATCH running for over 20 hours now with this fix in place and I've seen no sign of any problems. I think it's nailed.

I want to thank Bob and Ivan for all the work they have put into ECPS:VM, which makes VM/370 so much zippier on my machines, which are nearly as old as the S/370 machines they are emulating.

wably commented 7 years ago

Closing; this issue is resolved in commit 3d97e8a.