nba-emu / NanoBoyAdvance

A cycle-accurate Nintendo Game Boy Advance emulator.
GNU General Public License v3.0

Tests when prefetch buffer is full #327

Open alyosha-tas opened 10 months ago

alyosha-tas commented 10 months ago

I started looking more carefully at the prefetcher, as that seems like the most likely thing wrong with Shrek 2 timing. Shrek 2 uses EWRAM a lot, so there is a lot of time for the prefetcher to actually do something.

Here are some basic tests:

https://github.com/alyosha-tas/gba-tests/tree/master/prefetcher

Currently NanoBoyAdvance fails a test in the 'prefetcher.gba' test ROM. The failing test checks what happens when the prefetch buffer is full. For the test to pass, it seems the prefetch unit must wait until the buffer is empty before restarting, and when it restarts it uses non-sequential timing. Otherwise you get 51 instead of the required 56.

EDIT: This behaviour actually seems to be important for Shrek 2, as when I implement it I can get much closer to the correct value (25FC vs 2611)
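
To make the buffer-full rule above concrete, here is a toy cycle model comparing two restart policies. It is only a sketch: the access times `N_CYCLES` and `S_CYCLES`, the one-opcode-per-cycle CPU, and the function name `cycles_to_run` are assumptions made for illustration, not NanoBoyAdvance code and not the timings used by the test ROM.

```python
CAPACITY = 8   # the prefetch buffer holds up to eight 16-bit entries
N_CYCLES = 4   # assumed non-sequential ROM access time (illustrative only)
S_CYCLES = 2   # assumed sequential ROM access time (illustrative only)


def cycles_to_run(opcodes, idle_cycles, restart_when_empty):
    """Cycles needed to fetch `opcodes` Thumb opcodes from ROM.

    The CPU first spends `idle_cycles` away from ROM (for example on
    EWRAM accesses) while the prefetcher fills the buffer, then tries
    to take one opcode per cycle.
    """
    count = 0             # halfwords currently buffered
    stopped = False       # True once the buffer has filled up
    remaining = N_CYCLES  # cycles left on the in-flight ROM access
    cycles = 0
    fetched = 0
    while fetched < opcodes:
        cycles += 1
        # Prefetcher: advance the in-flight access unless it is halted.
        if not stopped:
            remaining -= 1
            if remaining == 0:
                count += 1
                stopped = count == CAPACITY
                remaining = S_CYCLES
        # CPU: take an opcode if the idle phase is over and one is ready.
        if cycles > idle_cycles and count > 0:
            count -= 1
            fetched += 1
        # Restart policy for a prefetcher that halted on a full buffer.
        if stopped:
            if restart_when_empty:
                # Behaviour suggested by the test: wait for an empty
                # buffer, then restart with non-sequential timing.
                if count == 0:
                    stopped, remaining = False, N_CYCLES
            elif count < CAPACITY:
                # Simpler policy: resume as soon as a slot frees up,
                # keeping sequential timing.
                stopped, remaining = False, S_CYCLES
    return cycles - idle_cycles


resume_early = cycles_to_run(12, 32, restart_when_empty=False)
wait_for_empty = cycles_to_run(12, 32, restart_when_empty=True)
print(resume_early, wait_for_empty)
```

Under these assumed timings the wait-for-empty policy costs more cycles than resuming early, which is the direction of the 51 vs 56 difference reported above. The absolute numbers from this model are not comparable to the test ROM's values.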

fleroviux commented 10 months ago

Thanks for this! I'll definitely try to work on this soon. I noticed that your readme says "Tests prefetcher behaviour when branching to nearby addresses". Did you by chance test what happens when branching to the address that the CPU would fetch the next opcode from? For example:

b .label
nop
nop
.label:
...

A couple of months ago I noticed that there can be a penalty related to this, and to some extent it depends on the instructions before the branch. For example, any RAM access or internal cycle right before the branch would make the discrepancy disappear. But I couldn't quite get the behavior right back then.

alyosha-tas commented 10 months ago

Yes, that is what those branch tests do, but I made them without knowing about the buffer-full behaviour, so they inadvertently rely on it, at least in the Thumb version. I'll clean them up to test only branch behaviour.

However, the Thumb version currently passes in NanoBoyAdvance, which is inconsistent with what I just wrote, so maybe I don't have all the details.

alyosha-tas commented 10 months ago

I made some new tests that isolate branching (in thumb mode) but I didn't see any odd behaviour and NanoBoyAdvance currently passes those tests.

Do you have any more details about when you saw weird results with branching?

fleroviux commented 10 months ago

I can check my notes and test ROM tomorrow or on the weekend. It's been a while and I don't remember the details anymore.

RetroEdit commented 10 months ago

Based on the above discussion, the Shrek 2 timing issue is probably related to or dependent on this behavior: #312

alyosha-tas commented 10 months ago

Following up about the nearby branching thing. There is an unintuitive behaviour when branching 2 instructions away as in the above code, but only when the prefetch buffer is empty.

In this case, the prefetch address and the instruction address the CPU is trying to fetch are the same, and both start reading it at the same time (the prefetcher with a sequential access and the CPU with a non-sequential one, since it just branched). When this happens, non-sequential timing is used.

This is tested in 'prefetcher_branch_thumb_2.gba' in my repository. NanoBoyAdvance currently fails the test.

In fact this quirk is important for Metroid Fusion, which does a lot of such branches.
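
The collision described above can be sketched as follows. The helper names, the `model_quirk` flag and the cycle counts are hypothetical and chosen for illustration; the only rule taken from this thread is that a branch landing on the prefetcher's current address while the buffer is empty is charged non-sequential timing. The address arithmetic assumes Thumb code and the ARM7TDMI's three-stage pipeline, where the opcodes at A+2 and A+4 are already fetched while the branch at A executes.

```python
THUMB_OPCODE_SIZE = 2  # bytes per Thumb opcode


def next_fetch_address(executing_addr):
    """Address of the next opcode fetch while `executing_addr` executes.

    With a three-stage pipeline the two following opcodes are already
    in flight, so the next fetch is three opcodes ahead.
    """
    return executing_addr + 3 * THUMB_OPCODE_SIZE


def branch_fetch_cycles(branch_addr, target, buffer_count,
                        n_cycles=4, s_cycles=2, model_quirk=True):
    """Cycles charged for the first opcode fetch after a taken branch.

    n_cycles and s_cycles are assumed ROM access times (illustrative).
    """
    # With an empty buffer the prefetcher is working on the very
    # address the CPU would have fetched next.
    collides = (buffer_count == 0
                and target == next_fetch_address(branch_addr))
    if collides and not model_quirk:
        # Hypothetical naive model: reuse the prefetcher's in-flight
        # sequential access.
        return s_cycles
    # Behaviour reported in this thread: non-sequential timing wins.
    return n_cycles


branch_addr = 0x08000000
label = branch_addr + 6  # "b .label; nop; nop; .label:" in Thumb
print(branch_fetch_cycles(branch_addr, label, buffer_count=0))
```

The example mirrors the `b .label` snippet earlier in the thread: two `nop`s after the branch put `.label` exactly at the next fetch address.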