n64dev / cen64

Cycle-Accurate Nintendo 64 Emulator
BSD 3-Clause "New" or "Revised" License

Linux instability #182

Open clbr opened 3 years ago

clbr commented 3 years ago

When running my port of Linux on (patched) cen64, it's unstable in ways that real hw is not. Random hangs that don't happen on hw are very hard to track down. There's a small chance my patches are at fault, but now that the port is ready, others can try it too.

I suspect the TLB logic contains more bugs, given how many I've already found there, but the problem could be elsewhere too.

https://github.com/clbr/n64bootloader/releases

I have the following patches applied to cen64 currently. I'll be submitting PRs as the old ones get reviewed.

tj90241 commented 3 years ago

Yeah, finding the last hidden issues is quite the endeavor.

Sometime in 2015-2016, when I was actively working on this code, I had the VR4300 component isolated and booting a Linux kernel all the way to initrd loading, and that was successful in fuzzing out some CP0 issues. I broke the TLB valid check (the issue you found) after that particular fuzzing effort. I'm surprised the TLB mod exception issue never turned up, though. That's a new one.

Because the code models the pipeline and cache closely enough that you could almost write synthesizable logic around it, there is always the possibility that the bug is an instruction not getting squashed correctly, or something similar. This particular case works, I think, but here is an example of how this gets tricky:

lw $at, 0($s0)  # assume this raises an exception
tlbwi           # this instruction must be squashed in the pipeline

... but TLBWI writes to CP0 while it's in the EX stage: https://github.com/n64dev/cen64/blob/master/vr4300/cp0.c#L267

so we inject a fault when the lw exception is raised in the DC stage and propagate it along: https://github.com/n64dev/cen64/blob/master/vr4300/fault.c#L50

which is used on the next cycle to prevent the EX stage from executing: https://github.com/n64dev/cen64/blob/master/vr4300/pipeline.c#L437
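To make the squash requirement concrete, here is a deliberately tiny C model of just the two instructions above. It is a hypothetical sketch, not cen64's code: `struct pipe`, `pipe_cycle`, and the choice to evaluate DC before EX within one cycle are all assumptions made for illustration. It only captures the invariant that matters: a fault raised by the older `lw` must stop the younger TLBWI from performing its CP0 write.

```c
#include <stdbool.h>

/* Hypothetical two-stage model of the lw/tlbwi hazard; the names and the
 * "evaluate DC before EX" ordering are illustrative assumptions only. */
enum op { OP_NONE, OP_LW, OP_TLBWI };

struct pipe {
  enum op dc;          /* older instruction, currently in DC */
  enum op ex;          /* younger instruction, currently in EX */
  bool fault;          /* latched when the DC stage raises an exception */
  int cp0_tlb_writes;  /* TLBWI side effects that actually happened */
};

/* Advance one cycle. DC is evaluated first so the older instruction's
 * exception is known before the younger one can touch CP0. */
static void pipe_cycle(struct pipe *p, bool lw_faults) {
  if (p->dc == OP_LW && lw_faults)
    p->fault = true;

  if (p->fault) {
    /* Kill the faulting instruction and everything younger than it;
     * the exception handler would be fetched next. */
    p->dc = OP_NONE;
    p->ex = OP_NONE;
    return;
  }

  if (p->ex == OP_TLBWI)
    p->cp0_tlb_writes++;  /* TLBWI performs its CP0 write in EX */

  /* Advance: EX moves into DC and nothing new enters EX in this model. */
  p->dc = p->ex;
  p->ex = OP_NONE;
}
```

Starting from `{ OP_LW, OP_TLBWI, false, 0 }`, one call with `lw_faults` false lets the TLBWI commit (`cp0_tlb_writes == 1`); with `lw_faults` true the TLBWI is squashed and the count stays at 0.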

tj90241 commented 3 years ago

There are some gotchas with sign extension too. Here's another one I found during my initial Linux fuzzing: https://github.com/n64dev/cen64/commit/9d9655cf62dbf79c730336691825993ca860a0bc
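The linked commit is not reproduced here, but as a generic illustration of this class of bug: the VR4300 keeps 32-bit values sign-extended in its 64-bit registers (so the KSEG0 address 0x80000000 is really 0xFFFFFFFF80000000), and load/store offsets are sign-extended 16-bit immediates. The helpers below (`sext32`, `zext32`, `sext16`, `effective_address`) are hypothetical names for this sketch. They use an XOR-and-subtract form so the extension is well defined in portable C.

```c
#include <stdint.h>

/* Sign-extend a 32-bit value to 64 bits: flipping the sign bit and then
 * subtracting it borrows through the upper half when bit 31 was set. */
static uint64_t sext32(uint32_t x) {
  return ((uint64_t)x ^ UINT64_C(0x80000000)) - UINT64_C(0x80000000);
}

/* The classic mistake: widening an unsigned value zero-extends it. */
static uint64_t zext32(uint32_t x) {
  return (uint64_t)x;
}

/* MIPS load/store offsets are 16-bit immediates that are sign-extended. */
static uint64_t sext16(uint16_t imm) {
  return ((uint64_t)imm ^ UINT64_C(0x8000)) - UINT64_C(0x8000);
}

/* Effective address of e.g. `lw rt, offset(base)`: base + sign-extended offset. */
static uint64_t effective_address(uint64_t base, uint16_t offset) {
  return base + sext16(offset);
}
```

`effective_address(sext32(0x80001000), 0xFFFC)` yields 0xFFFFFFFF80000FFC (the base minus 4); zero-extending either operand instead produces a different, wrong address.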

bryanperris commented 3 years ago

I am wondering why cen64 always shifts the virtual address by 13 bits, when the MIPS docs say to shift off the offset based on the PageMask register. When I do the math, shifting by 13 seems to give me the correct VPN2 value to search on, while shifting by the page size (16 bits) gives me a value that is far too small. I know shifting by 13 bits works for EntryHi.

https://github.com/n64dev/cen64/blob/a109ac02de2b4c61901db2a3ca0a3d25388609ed/arch/x86_64/tlb/tlb.c#L32

tj90241 commented 3 years ago

@bryanperris That's just an optimization-related thing. x86 SSE does not allow per-lane variable shifts (every lane is shifted by the same count, typically a constant coded into the instruction word), yet each TLB entry can have its own page size. So instead we say, "let's just shift off what we know will always be an offset into the page (4k pages, 2 pages per TLB entry = 13 bits)," and then AND off the rest dynamically to work around the fact that we cannot shift dynamically. This is what check_l = _mm_and_si128(vpn, page_mask_l); is accomplishing. So, ultimately, the comparison (check_l = _mm_cmpeq_epi32(check_l, vpn_l);) is still done with respect to the page mask.
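A scalar rendering of the SSE trick may make it easier to follow. Each loop iteration below plays the role of one SIMD lane: the virtual address is shifted by the same constant 13 for every entry, and the per-entry page size is applied afterwards with an AND, which SIMD can do lane by lane (in the SSE version, `_mm_and_si128` and `_mm_cmpeq_epi32` process several entries at once). The structure-of-arrays layout and the names `tlb_soa`, `tlb_clear`, `tlb_set`, and `tlb_find_vpn2` are hypothetical; the sketch assumes 32-bit addresses and omits the ASID/global-bit check.

```c
#include <stdint.h>

#define TLB_ENTRIES 32
#define VPN2_SHIFT 13  /* 4 KB minimum page size x 2 pages per entry */

/* Hypothetical structure-of-arrays layout: one "lane" per TLB entry, the
 * way a SIMD search would want the data packed. Not cen64's actual layout. */
struct tlb_soa {
  uint32_t vpn2[TLB_ENTRIES];        /* (EntryHi >> 13) with page bits cleared */
  uint32_t match_mask[TLB_ENTRIES];  /* ~(PageMask >> 13): bits that must match */
};

/* Mark every entry as unmatchable: no shifted 32-bit address equals ~0. */
static void tlb_clear(struct tlb_soa *t) {
  for (int i = 0; i < TLB_ENTRIES; i++) {
    t->vpn2[i] = UINT32_MAX;
    t->match_mask[i] = UINT32_MAX;
  }
}

/* Load one entry from (assumed 32-bit) EntryHi and PageMask register values. */
static void tlb_set(struct tlb_soa *t, int i, uint32_t entry_hi, uint32_t page_mask) {
  uint32_t m = ~(page_mask >> VPN2_SHIFT);
  t->match_mask[i] = m;
  t->vpn2[i] = (entry_hi >> VPN2_SHIFT) & m;
}

/* Same constant shift for every lane; the per-entry page size is applied by
 * the AND, mirroring _mm_and_si128 followed by _mm_cmpeq_epi32. */
static int tlb_find_vpn2(const struct tlb_soa *t, uint32_t vaddr) {
  uint32_t vpn = vaddr >> VPN2_SHIFT;
  for (int i = 0; i < TLB_ENTRIES; i++) {
    if ((vpn & t->match_mask[i]) == t->vpn2[i])
      return i;
  }
  return -1;  /* no entry maps this VPN2: TLB miss */
}
```

For a 16 KB entry, `PageMask >> 13` is 3, so the two lowest bits of the shifted address, which are really page offset, are masked off before the compare.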

bryanperris commented 3 years ago

@tj90241 Thanks, that makes sense now. In the case of 4K pages, why shift off the 13th bit when the mask for offset is 0xFFF? Is that to apply the divide by 2 for the VPN?

tj90241 commented 3 years ago

Correct. It's because in MIPS the smallest page size is 4k (12 bits), and each physical TLB entry maps 2 pages ("VPN2"), which is where the 13th bit comes in. The SSE lookup is just trying to find the matching "VPN2" entry in the emulated hardware TLB -- once the DC stage has a hit, it uses the full address (again) to determine whether EntryLo/EntryHi match, etc.
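The arithmetic can be checked with a small helper. This is a hypothetical sketch (`split_vaddr` and `struct va_parts` are illustrative names), assuming `offset_mask` is the byte-offset mask of a single page, e.g. 0xFFF for 4 KB pages: the bit just above the offset picks the even or odd page, and everything above that identifies the entry. For 4 KB pages, VPN2 is simply `vaddr >> 13`.

```c
#include <stdint.h>

/* Hypothetical decomposition of a 32-bit virtual address for a TLB entry
 * whose pages are (offset_mask + 1) bytes, e.g. offset_mask = 0xFFF. */
struct va_parts {
  uint32_t vpn2;    /* page-pair number, in the same ">> 13" units as EntryHi */
  unsigned select;  /* 0 = even page (EntryLo0), 1 = odd page (EntryLo1) */
  uint32_t offset;  /* byte offset inside the selected page */
};

static struct va_parts split_vaddr(uint32_t vaddr, uint32_t offset_mask) {
  uint32_t pair_mask = (offset_mask << 1) | 1;  /* offset bits plus the select bit */
  struct va_parts p;
  p.offset = vaddr & offset_mask;
  p.select = (vaddr & (offset_mask + 1)) != 0;
  p.vpn2 = (vaddr & ~pair_mask) >> 13;  /* for 4 KB pages this is just vaddr >> 13 */
  return p;
}
```

For example, 0x00403ABC with 4 KB pages splits into VPN2 0x201, the odd page (select 1), and offset 0xABC.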

bryanperris commented 3 years ago

Looking at your pipeline code, I see it calls tlb_probe to find the index of the matching entry. Does cen64 only handle 4K pages?

tj90241 commented 3 years ago

Right - tlb_probe is only responsible for finding the hardware entry. Then the pipeline uses the attributes of that entry to select the right page/etc.:

      tlb_miss = tlb_probe(&vr4300->cp0.tlb, vaddr, asid, &index);
      page_mask = vr4300->cp0.page_mask[index];
      select = ((page_mask + 1) & vaddr) != 0;
...
      cached = ((vr4300->cp0.state[index][select] & 0x38) != 0x10);
      paddr = (vr4300->cp0.pfn[index][select]) | (vaddr & page_mask);
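Putting the excerpt into a self-contained form shows why nothing in it is specific to 4 KB pages: the page mask stored per entry decides both which bit selects the even/odd page and how many low bits pass through as offset. The sketch below is hypothetical: `struct toy_tlb`, `toy_tlb_probe`, and `toy_translate` stand in for cen64's real structures, the probe is a plain linear search, and the valid-bit and ASID checks are omitted. As in the excerpt, `page_mask` is assumed to hold one page's byte-offset mask and `pfn` the physical base of each page.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_ENTRIES 32

/* Hypothetical stand-in for the CP0 TLB state used in the excerpt. */
struct toy_tlb {
  unsigned count;                    /* number of populated entries */
  uint32_t vaddr_base[NUM_ENTRIES];  /* virtual base of the even/odd page pair */
  uint32_t page_mask[NUM_ENTRIES];   /* byte-offset mask of ONE page: 0xFFF, 0x3FFF, ... */
  uint32_t pfn[NUM_ENTRIES][2];      /* physical base of the even [0] / odd [1] page */
  uint8_t state[NUM_ENTRIES][2];     /* EntryLo low bits: C (5:3), D, V, G */
};

/* Linear search; following the excerpt's "tlb_miss = tlb_probe(...)",
 * it returns true on a MISS and writes the entry index on a hit. */
static bool toy_tlb_probe(const struct toy_tlb *t, uint32_t vaddr, unsigned *index) {
  for (unsigned i = 0; i < t->count; i++) {
    uint32_t pair_mask = (t->page_mask[i] << 1) | 1;
    if ((vaddr & ~pair_mask) == t->vaddr_base[i]) {
      *index = i;
      return false;
    }
  }
  return true;
}

/* Returns false on a TLB miss (real hardware would raise an exception). */
static bool toy_translate(const struct toy_tlb *t, uint32_t vaddr,
                          uint32_t *paddr, bool *cached) {
  unsigned index;
  if (toy_tlb_probe(t, vaddr, &index))
    return false;

  uint32_t page_mask = t->page_mask[index];
  unsigned select = ((page_mask + 1) & vaddr) != 0;    /* even or odd page */
  *cached = (t->state[index][select] & 0x38) != 0x10;  /* C == 2: uncached */
  *paddr = t->pfn[index][select] | (vaddr & page_mask);
  return true;
}
```

With a 16 KB entry (`page_mask` 0x3FFF), address 0x10005678 selects the odd page through bit 14 and keeps 0x1678 as the offset: the same code path that handles 4 KB entries through bit 12.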

awsms commented 3 months ago

Has anyone been able to compile it on Linux? Even the Debian build task fails on GitHub.

clbr commented 3 months ago

awsms, this report is about running Linux on cen64. If you're trying to compile cen64 on Linux, please open a new issue.