m68k: need DA regs save/restore for 68010/68020 custom MMU support

agentbooth commented 2 years ago

The AT&T UNIX PC driver (unixpc.cpp), using a 68010 with custom MMU, requires support for page faults which trigger a bus error, that requires the DA regs to be saved (prior to the PF) and restored prior to the exception handling.

This feature is currently supported for the PMMU implementation of 68851/68030/68040, but needs to be extended to cover 68010 and 68020 using custom or 68451 MMU.

Current behavior: m_maincpu->set_input_line(M68K_LINE_BUSERROR, ASSERT_LINE); calls m68k_cause_bus_error()which does: m68ki_init_exception() m68ki_stack_frame_1000() // 68010 specific m68ki_jump_vector(EXCEPTION_BUS_ERROR) however, the DA regs are not (saved and) restored prior to the exception handling.

execute_run() in m68kcpu.cpp covers this situation for m_pmmu_enabled, but not for 68010 or 68020 with custom MMU.

Comment from @rb6502:

For the case where you need to restart instructions from an external MMU, m68k execute_run() needs a separate flag m_has_instruction_restart which all of the MMU configurations set and which is also settable externally so it can be enabled for 68010 systems with external MMUs.

This is also potentially relevant for:

Convergent MiniFrame (68010, custom MMU)
Sun-2 (68010, custom MMU)
Sun-3 (68020, custom MMU)
HP 9000/310 (68010, custom MMU)
HP 9000/320 (68020, custom MMU)
MVME 120 (68010, 68451 MMU)

jaggies commented 2 years ago

I'm seeing this issue implementing the Integrated Solutions 68kV20/25 BSD system.

Among other things, it appears that writes that cause page faults get dropped.

It would be great if someone has a patch I can test. Or some more details on what needs to be implemented.

balr0g commented 2 years ago

Note that the MC68010 and later members of the family perform instruction continuation or resumption, not restart — when a page fault occurs, the sub-instruction state is saved on the supervisor stack and after page fault handling, the instruction resumes where it left off.

This is not fully emulated in MAME or most emulators, and lack of correct emulation may have effects for certain operating systems. It it briefly covered on page 1-6 of http://www.bitsavers.org/components/motorola/68000/MC68010_68012_Data_Sheet_May85.pdf.

agentbooth commented 2 years ago

MC68010 and later members of the family perform instruction continuation or resumption, not restart

I updated the title and description to reflect the need for the DA reg restore before the exception handling as I believe this is the crux of the issue.

This is not fully emulated in MAME or most emulators

The original handling in Musashi, where DA regs are saved and restored for the 68010 (and how MAME handles PMMU support), does allow for proper page faulting in the UNIX PC, as seen in https://github.com/philpem/freebee.

One update that may be needed for other 68010 MMU systems is the proper setting of the fault address in the 68010 stack frame, which is currently set to zero: m68kcpu.h: m68ki_stack_frame_1000(u32 pc, u32 sr, u32 vector) ... /* FAULT ADDRESS */ m68ki_push_32(0);

smf- commented 2 years ago

This is the original musashi linked & it's essentially the same as ours.

https://github.com/kstenerud/Musashi/blob/d0ab9ace32ac9956315d4457715158f4aecb3368/m68kcpu.h#L1666

static inline void m68ki_stack_frame_1000(uint pc, uint sr, uint vector) { / VERSION NUMBER INTERNAL INFORMATION, 16 WORDS / m68ki_fake_push_32(); m68ki_fake_push_32(); m68ki_fake_push_32(); m68ki_fake_push_32(); m68ki_fake_push_32(); m68ki_fake_push_32(); m68ki_fake_push_32(); m68ki_fake_push_32();

https://github.com/kstenerud/Musashi/blob/fc7a6fc602e2fbcd24851670a5242358765feacf/m68k_in.c#L9238

} else if (format_word == 8) { / Format 8 stack frame -- 68010 only. 29 word bus/address error / new_sr = m68ki_pull_16(); new_pc = m68ki_pull_32(); m68ki_fake_pull_16(); / format word / m68ki_fake_pull_16(); / special status word / m68ki_fake_pull_32(); / fault address / m68ki_fake_pull_16(); / unused/reserved / m68ki_fake_pull_16(); / data output buffer / m68ki_fake_pull_16(); / unused/reserved / m68ki_fake_pull_16(); / data input buffer / m68ki_fake_pull_16(); / unused/reserved / m68ki_fake_pull_16(); / instruction input buffer / m68ki_fake_pull_32(); / internal information, 16 words / m68ki_fake_pull_32(); / (actually, we use 8 DWORDs) / m68ki_fake_pull_32(); m68ki_fake_pull_32(); m68ki_fake_pull_32(); m68ki_fake_pull_32(); m68ki_fake_pull_32(); m68ki_fake_pull_32(); m68ki_jump(new_pc); m68ki_set_sr(new_sr); CPU_INSTR_MODE = INSTRUCTION_YES; CPU_RUN_MODE = RUN_MODE_NORMAL;

On 18/01/2022 00:41, Jesse Booth wrote:

MC68010 and later members of the family perform instruction
continuation or resumption, not restart
I updated the title and description to reflect the need for the DA reg restore before the exception handling as I believe this is the crux of the issue.
This is not fully emulated in MAME or most emulators
The original handling in Musashi, where DA regs are saved and restored for the 68010 (and how MAME handles PMMU support), does allow for proper page faulting in the UNIX PC, as seen in https://github.com/philpem/freebee.

One update that may be needed for other 68010 MMU systems is the proper setting of the fault address in the 68010 stack frame, which is currently set to zero: m68kcpu.h: |m68ki_stack_frame_1000(u32 pc, u32 sr, u32 vector)| ... |/ FAULT ADDRESS / m68ki_push_32(0);|

— Reply to this email directly, view it on GitHub https://github.com/mamedev/mame/issues/9170#issuecomment-1014976034, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACGVYYTOUKPDQGHNJMAP3MLUWSZKRANCNFSM5MFOJLKA. Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you are subscribed to this thread.Message ID: @.***>

agentbooth commented 2 years ago

Yes, the stack frame it setup fine. The issue is about the DA regs being saved: https://github.com/kstenerud/Musashi/blob/fc7a6fc602e2fbcd24851670a5242358765feacf/m68kcpu.c#L985

        /* Record previous D/A register state (in case of bus error) */
        for (i = 15; i >= 0; i--){
            REG_DA_SAVE[i] = REG_DA[i];
        }

        /* Read an instruction and call its handler */
        REG_IR = m68ki_read_imm_16();
        m68ki_instruction_jump_table[REG_IR]();
        USE_CYCLES(CYC_INSTRUCTION[REG_IR]);

And then restored in a bus error exception: https://github.com/kstenerud/Musashi/blob/fc7a6fc602e2fbcd24851670a5242358765feacf/m68kcpu.h#L1934

for (i = 15; i >= 0; i--){
    REG_DA[i] = REG_DA_SAVE[i];
}

needed for the scenario of a 68010 or 68020 using a custom MMU. Currently that only occurs for pmmu_enabled: https://github.com/mamedev/mame/blob/023b854a88255f7362ea915e93a4bad9a5a719fa/src/devices/cpu/m68000/m68kcpu.cpp#L904

            for (i = 15; i >= 0; i--)
            {
                tmp_dar[i] = REG_DA()[i];
            }

smf- commented 2 years ago

Saving and restoring the DA is a hack to work round the lack of instruction restart.

The issue is that code should not exist at all.

It can be removed once the stack frames contain the correct internal state and the cpu core modified to work with it.

I suspect that we will need to wait for someone to reverse engineer a 68010 die shot as nobody has seemed to want to do it by observing a real 68010.

On 19/01/2022 02:52, Jesse Booth wrote:

Yes, the stack frame it setup fine. The issue is about the DA regs being saved: Message ID: @.***>

galibert commented 2 years ago

If there are microcode addresses in the stack frame, and I don't see how there couldn't be, exact contents is a long way away...

OG.

On Wed, Jan 19, 2022 at 11:57 AM smf- @.***> wrote:

Saving and restoring the DA is a hack to work round the lack of instruction restart.

The issue is that code should not exist at all.

It can be removed once the stack frames contain the correct internal state and the cpu core modified to work with it.

I suspect that we will need to wait for someone to reverse engineer a 68010 die shot as nobody has seemed to want to do it by observing a real 68010.

On 19/01/2022 02:52, Jesse Booth wrote:

Yes, the stack frame it setup fine. The issue is about the DA regs being saved: Message ID: @.***>

— Reply to this email directly, view it on GitHub https://github.com/mamedev/mame/issues/9170#issuecomment-1016328886, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACGSF4J5DOFL5S5YR3PPCI3UW2KKRANCNFSM5MFOJLKA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you are subscribed to this thread.Message ID: @.***>

agentbooth commented 2 years ago

Saving and restoring the DA is a hack to work round the lack of instruction restart.

Even though this is a hack, it should be utilized in the case of a bus error with a custom MMU with a 68010/020, correct? It currently is not which is what I’m hoping to get supported.

smf- commented 2 years ago

There aren't that many instructions, if you could trigger a bus error on each of the access then you'd get all the stack frames.

Some of the words are going to be data values, some are going to be microcode etc.

You could fuzz with different data values to figure out what was data and what was internal state.

On 19/01/2022 11:25, Olivier Galibert wrote:

If there are microcode addresses in the stack frame, and I don't see how there couldn't be, exact contents is a long way away...Message ID: @.***>

smf- commented 2 years ago

The code is a mess & the hack doesn't really work properly.

Imagine what will happen if you move EA, EA and the first EA is an io chip that implements a fifo and the second EA is a paged out virtual address.

If you want it supported then you might have to figure out how to do it yourself.

On 19/01/2022 15:45, Jesse Booth wrote:

Even though this is a hack, it should be utilized in the case of a custom MMU with a 68010/020, correct? It currently is not which is what I’m hoping to get supported.

— Reply to this email directly, view it on GitHub https://github.com/mamedev/mame/issues/9170#issuecomment-1016598345, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACGVYYQ4ZXVT5VFOKGA2YKDUW3MCJANCNFSM5MFOJLKA. Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you commented.Message ID: @.***>

agentbooth commented 2 years ago

The code is a mess & the hack doesn't really work properly.

The hack (in original Musashi) works fine for the purposes of emulating the AT&T UNIX PC. Longer term, other machines may indeed need the proper instruction suspend and continuation. At this point, I am just looking to have the same functionality (save/restore regs on bus error) in MAME, which is currently only supported for pmmu_enabled.

rb6502 commented 2 years ago

The hack currently works fine with half a dozen flavors of UNIX plus the classic MacOS with virtual memory paging enabled. I am well aware that there are instruction combinations that could cause it not to work (e.g. a page fault involving a FIFO or something) but it seems like proper instruction restart is a ways off as OG says. (Since the restart info is undocumented, if we had a cycle-by-cycle core we could just store what we needed and it'd work fine, but we don't have that either). I'll make the changes for this soon.

agentbooth commented 2 years ago

Thanks @rb6502 I appreciate it.

jaggies commented 2 years ago

It would be great if something can be done here until a proper 68010/20 fix can be made.

As it stands, the 68010/020 BSD machines I'm emulating can't spawn init because the first write to a page fails to recover after the fault has been cleared since the instruction doesn't continue.

Though conceivable, the code in my case doesn't appear to map hardware devices. They're all in flat memory space. So it should be safe from the FIFO example above.

BTW, is it enough to bypass the "!mmu_enabled" code in execute_run() to experiment with this? Or is something else missing?

pmackinlay commented 2 years ago

@jaggies it might be enough to temporarily bypass the !m_pmmu_enable code for your case. For Sun3, I also needed to add m_mmu_tmp_buserror_occurred++; to m68000_base_device::set_buserror_details() because I used that to signal a translation failure from the external MMU, which has different logic than a bus error generated for other reasons (e.g. invalid memory address). Depending on how your MMU works, you might need to do the same.

jaggies commented 2 years ago

Thanks, Patrick. This is very helpful.

What's the mame-friendly way of modifying m_mmu_tmp_buserror_occurred? I already call set_buserror_details(), so this would probably work great.

Given the unknown of how the actual 68010/20 implement bus error, I'm wondering if it would make sense to abstract the MMU to allow it to be replaced wholesale? Otherwise, these workarounds seem likely to add instability until a proper fix is available. For example sun3 and ISI may do things in an incompatible way.

agentbooth commented 2 years ago

@pmackinlay i like your setting of m_mmu_tmp_buserror_occurred++ in m68000_base_device::set_buserror_details() - I was setting it in m68k_cause_bus_error() and returning immediately.

@jaggies if you are going to try the workaround - if you are using a 68010, you need to change the code in execute_run() to call m68ki_stack_frame_1000 to setup the 68010 specific stack frame.

jaggies commented 2 years ago

Thanks for the pointer. That's super helpful.

Two of the five ISI machines use 68010s. The rest use 68020s/030s. I don't have quite enough information to get the '030 version going. If anyone here has a connection at computinghistory.org.uk, it would be great if you can tag them. I haven't been able to get a response.

jaggies commented 2 years ago

OK, getting closer.

Now it reissues the instruction, but gets stuck because MAME appears to ignore the returned stack frame generated by m68ki_stack_frame_1010().

IOW, the ISI system has a function that intentionally reads a potential bad device address and returns true or false based on whether it succeeds or fails in a bus error. It accomplishes this by clearing the 'rerun' flag in the 1010 stack frame generated here:

        /* SPECIAL STATUS REGISTER */
    // set bit for: Rerun Faulted bus Cycle, or run pending prefetch
    // set FC
    m68ki_push_16(0x0100 | orig_fc | orig_rw<<6 | orig_sz<<4);

Even after clearing the stacked value, it appears the instruction is always executed - leading to an infinite loop.

Is modifying the stacked value of the 'rerun' flag (0x100) expected to work? I couldn't find reference to it in the rest of the MAME 68k code.

pmackinlay commented 2 years ago

Unfortunately, as you’re discovering, MAME’s m68k core is very very simplistic when it comes to these exception frames – they’re mostly “write only” and very limited at that.

If you’ve modified set_buserror_details() as I suggested, you now basically have two ways to signal a bus error to the 68k:

set_buserror_details(): this signals an MMU error from which the instruction should be restarted
set_input_line(M68K_LINE_BUSERROR, 1): this signals a bus error from which the instruction should not be restarted

As you’ve discovered, the actual stack frames produced aren’t accurate, and their content is not considered by the 68k core when it does the return, so anything that’s fiddling with them and expecting to see those effects won’t work.

You might be able to solve your current issue by using the set_input_line() type of bus error for the probe() and using the set_buserror_details() only for MMU faults (which is what I’m doing with Sun3 which has the same basic requirements).

jaggies commented 2 years ago

Thanks. I think this will allow me to get something going. With the above change, it appears copy-on-write is now working.

I still need to hunt for other places where the stack frame state is modified / observed by ISI code. But at least this provides the means to control whether the fault instruction is reissued.

It's unfortunate that the stack frame is ignored. There seems to be just enough documentation to get that going, "for reals." Otherwise, I don't see a path forward to checking this back into MAME, as it requires modifying the 68k core in, perhaps, an incompatible way.

rb6502 commented 2 years ago

@pmackinlay Can we get whatever you changed in the m68k submitted? I'd rather not reinvent any wheels and then have it break Sun3.

agentbooth commented 2 years ago

After much investigation into a kernel panic in the UNIX PC driver, I believe I found the issue. The RTE instruction unwinding the bus error exception stack frame would corrupt the SP when transitioning from supervisor mode back to user mode. This would be encountered running code in user mode that triggered a bus error (e.g. page fault). This issue is specific to 68010.

I also checked in the 68010 bus error stack frame setup in m68kcpu.cpp (execute_run()). However we still need to come up with a method for 68010 with custom MMU to trigger bus errors. I am currently using the workaround of setting m_mmu_tmp_buserror_occurred in m68000_base_device::set_buserror_details()

agentbooth commented 2 years ago

@pmackinlay @jaggies Here are my proposed changes. Please confirm this works for you.

@rb6502 please let me know if this seems fine to you:

m68000.h:

        void set_hmmu_enable(int enable);
+       void set_emmu_enable(int enable);
        int get_pmmu_enable() const {return m_pmmu_enabled;}
        void set_fpu_enable(int enable);
-       void set_buserror_details(u32 fault_addr, u8 rw, u8 fc);
+       void set_buserror_details(u32 fault_addr, u8 rw, u8 fc, bool rerun = false);

m68kcpu.cpp: in execute_run():

-                       if (!m_pmmu_enabled)
+                       if (!m_pmmu_enabled && !m_emmu_enabled)

-void m68000_base_device::set_buserror_details(u32 fault_addr, u8 rw, u8 fc)
+// rerun = trigger bus error and rerun instruction after RTE (do not also set_input_line(M68K_LINE_BUSERROR) when using this)
+//         intended for external/custom MMU use
+void m68000_base_device::set_buserror_details(u32 fault_addr, u8 rw, u8 fc, bool rerun)
 {
+       if (m_emmu_enabled && rerun) m_mmu_tmp_buserror_occurred = 1; // hack for external/custom MMU
        m_aerr_address = fault_addr;
        m_aerr_write_mode = (rw << 4);
        m_aerr_fc = fc;
        m_mmu_tmp_buserror_address = fault_addr; // Hack for x68030
+       m_mmu_tmp_buserror_rw = rw;
+       m_mmu_tmp_buserror_fc = fc;

In the UNIX PC driver, in ::machine_start() I set: m_maincpu->set_emmu_enable(1);

Trigger bus error with instruction rerun: m_maincpu->set_buserror_details(fault_addr, rw, m_maincpu->get_fc(), true); no set_input_line() call

Trigger bus error without rerun, as is currently done :

m_maincpu->set_buserror_details(fault_addr, rw, m_maincpu->get_fc());
m_maincpu->set_input_line(M68K_LINE_BUSERROR, ASSERT_LINE);
m_maincpu->set_input_line(M68K_LINE_BUSERROR, CLEAR_LINE);

(This proposal does not address RTE not respecting code having changed the RR bit is the Special Status word in the stack frame to NOT rerun the instruction.)

jaggies commented 2 years ago

Thanks. That looks like it will work. Will give it a try shortly. Curious about adding the rw/fc preserve flags. Seems like that might have caused some chaos :)

I am seeing some weird issues compiling gcc with local variables/stack getting segv. It's one of those "printf fixes it" kind of bugs. Not sure if that's just gcc misbehaving, bug(s) with the native compiler, or something deeper still.

agentbooth commented 2 years ago

Curious about adding the rw/fc preserve flags. Seems like that might have caused some chaos :)

Yeah that might be wrong. m68k_cause_bus_error() (called from the set_input_line() method) and pmmu_set_buserror() do:

        m_mmu_tmp_buserror_rw = m_mmu_tmp_rw;
        m_mmu_tmp_buserror_fc = m_mmu_tmp_fc;
        m_mmu_tmp_buserror_sz = m_mmu_tmp_sz;

That seems like a better thing to do. I'm not sure why fc and rw are even sent into set_buserror_details() -- it looks like for creating a bus error frame specifically for the 68000.

I was looking to set them here so that the fc and rw bits could be set in the 68010 stack frame.

rb6502 commented 2 years ago

Yeah, I'd have set_buserror_details() set the m_mmutmp variables and that way you still set the bus error line for both kinds of errors, which is more correct to how the hardware works.

Also, if you want to eliminate the fc parameter to set_buserror_details() that'd be great. I understand if you don't want to go after the 10 or 15 places it's used though.

Oh, and I think I mentioned this before, but ideally we want m68k_execute() to only need a single variable check, so add a variable that's pmmu_enabled and emmu_enabled ORed together.

agentbooth commented 2 years ago

I tried setting m_mmu_tmp_buserror_occurred in set_buserror_details() and then setting the bus error line but this resulted in two exception stack frames being built -- one in execute_run() and then one in cause_bus_error. And I couldn't check for m_mmu_tmp_buserror_occurred and just return in cause_bus_error as it's already unset (in execute_run) by the time cause_bus_error is called. At least that was my interpretation of what was going on.

fc and rw coming in to set_buserror_details looks to be used for the 68000 bus error so not sure how to eliminate that exactly. It's a bit misleading sending in fc and rw as they aren't used for 68010+.

Ok, maybe i can make that m_has_instruction_restart you originally mentioned. Feel free to jump in on any of this if you want to refactor. The 68020+ handling, HMMU, PMMU, isn't something I've been looking into the details so I'm pretty hazy on what all goes on there.

agentbooth commented 2 years ago

@pmackinlay @jaggies, a solution has been merged https://github.com/mamedev/mame/commit/93ac5f268b7804fcccc02ebbd9ca0d131a9295e9 by @rb6502, support can be implemented by:

Setup

In ::machine_start() set: m_maincpu->set_emmu_enable(1);

Trigger bus error with instruction rerun:

m_maincpu->set_buserror_details(fault_addr, rw, m_maincpu->get_fc(), true);

do NOT call set_input_line()

Trigger bus error without rerun (as previously done):

m_maincpu->set_buserror_details(fault_addr, rw, m_maincpu->get_fc());
m_maincpu->set_input_line(M68K_LINE_BUSERROR, ASSERT_LINE);
m_maincpu->set_input_line(M68K_LINE_BUSERROR, CLEAR_LINE);

(This does not address RTE not respecting code having changed the RR bit is the Special Status word in the stack frame to NOT rerun the instruction.)

mamedev / mame