jpd002 / Play-

Play! - PlayStation2 Emulator
http://purei.org

Maintain a set of hardware tests #36

Closed unknownbrackets closed 9 years ago

unknownbrackets commented 9 years ago

I've found hardware tests to be the most efficient and effective method of improving compatibility. I've used them almost religiously for the vast majority of PPSSPP's HLE that I've worked on, and for a lot of the jit work too.

For an example, the vmax SIMD function on a PSP (custom SIMD unit) considers -NAN to be less than -INF. It turns out that proper NAN/INF handling was important for a few games. This would've been pretty hard to guess without hardware tests (although it makes sense when you think about it.)
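That ordering is what you get if the comparison is done on the raw bit patterns as sign-magnitude integers rather than as IEEE floats. A minimal Python sketch of the idea, assuming 32-bit patterns passed in as integers (the helper names are made up for illustration, and this is not claimed to be how the hardware or PPSSPP actually implements it):

```python
def sign_magnitude_key(bits: int) -> int:
    """Map a 32-bit float pattern to an integer that sorts by sign, then magnitude."""
    magnitude = bits & 0x7FFFFFFF
    return -magnitude if bits & 0x80000000 else magnitude


def vmax_bits(a: int, b: int) -> int:
    """Return whichever 32-bit pattern is larger under sign-magnitude ordering."""
    return a if sign_magnitude_key(a) >= sign_magnitude_key(b) else b


NEG_NAN = 0xFFC00000  # sign bit set, exponent all ones, non-zero mantissa
NEG_INF = 0xFF800000  # sign bit set, exponent all ones, zero mantissa

# -NAN has the larger magnitude, so with the sign applied it sorts below -INF.
print(hex(vmax_bits(NEG_NAN, NEG_INF)))
```

A host-side float comparison would treat the NAN as unordered, which is exactly the kind of difference a hardware test makes visible.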

Such tests can also be used as regression tests, which is also hugely useful.

So, I think hardware tests would be very useful. Specifically:

Presumably ps2link could be used to execute things on the PS2.

I don't know how to run homebrew on my PS2 yet; there seem to be lots of methods online, and they look tricky...

-[Unknown]

pinguallyj commented 9 years ago

I hope Play! will be like PPSSPP: really fast and compatible.

unknownbrackets commented 9 years ago

ps2link seems to work fairly well and easily, and it seems like the ps2sdk is not too complex to set up either. Probably easiest to create tests if you have a network kit and ps3 usb memory card adapter (I happen to have both.)

-[Unknown]

jpd002 commented 9 years ago

I really like that idea! I think it would help a lot for the development of the emulator! I've taken a look at pspautotests and it's really nice to see how much is covered by the tests.

Building such a test suite for the PS2 would be great. I'll take a deeper look at pspautotests to see how it's done and see how we could apply that to the PS2. I'll need to get myself up to date on running PS2 homebrew because I haven't done that for a long time... but ps2link looks promising.

unknownbrackets commented 9 years ago

Here's what I did so far:

  1. Set up Free McBoot on my memory card with my ps3 memcard adapter (ps3mca-tool.) Just needed 1MB free or so, didn't need to format.
  2. Put ps2link on a usb stick (probably can go on the mc itself, actually.)
  3. Configured the ip address info on the usb stick in the ps2link directory.
  4. Started my fat ps2 (which has an hdd+net kit) with the mc and usb, no game.
  5. Selected the elf loader and then ps2link.
  6. Used ps2client on my pc to connect to it and run Geotron from my pc.

This was almost as easy as setting up a PSP with psplink. I didn't fully set up the ps2sdk yet, but it seems like I just need the right things on my path and a few environment variables.

A slim ps2 should be just as easy, since it has ethernet built-in.

So the next question is how to collect output and how tests quit / return to loader. Then a small testing interface is needed (to build the tests and provide common output/etc.), and all that will be left is writing the actual tests.

-[Unknown]

unknownbrackets commented 9 years ago

Seems like exiting is a bit of a pain, might just watch for a magic line from the test to make things simple. Since there are other lines written it might make sense to prefix test lines in some special way anyway.

Then it'd be a matter of just storing stdout in CIoman::Write(), or otherwise only writing that out. Ideally in the form of a headless client.
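A rough sketch of the "prefix plus magic line" idea on the host side, in Python. The prefix and the end marker here are hypothetical placeholders, not strings used by ps2autotests or ps2link:

```python
TEST_PREFIX = "[test] "        # hypothetical marker for lines that belong to the test
END_MARKER = "[test] -- END"   # hypothetical magic line meaning "the test is done"


def collect_test_output(lines):
    """Keep only prefixed lines and stop at the end marker.

    Returns (output_lines, finished); finished is False if the marker never showed up.
    """
    collected = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == END_MARKER:
            return collected, True
        if line.startswith(TEST_PREFIX):
            collected.append(line[len(TEST_PREFIX):])
    return collected, False
```

Anything the loader or other modules print is dropped, and the caller can treat `finished == False` as a test that hung or crashed before completing.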

-[Unknown]

jpd002 commented 9 years ago

Would it be possible to use the LoadExecPS2 syscall to go back to the loader on the actual console? Or maybe to load the next test? Were you loading all tests one after another automatically on the console in pspautotests?

unknownbrackets commented 9 years ago

I guess you could have a config file to specify where ps2link is (e.g. on usb, memstick, hdd, etc.), and then it could. Otherwise the script executing the test via ps2link could just issue a reset command and ps2link would reset automatically with the correct path, no config needed.

With pspautotests, we run the tests on the psp one by one, because usually you want to run it as you write it. There's not a lot of motivation to run all the tests on hardware at once, since the results are saved and versioned.

However, there's plenty of reason to run them one after another in the emulator, which of course we do in ppsspp. At one point this was just done by a script, executing headless for each test, but now headless resets itself and runs them all (which is of course faster, and tests resetting between games a bit better.)

-[Unknown]

jpd002 commented 9 years ago

Ah, yeah, kinda makes sense that you wouldn't really need to run all tests on the actual console if the output is saved as the test is being developed and run on the reference hardware.

As for running the tests in the emulator, I guess doing something like MipsTest and VuTest would be the best, except that it would load ELFs and save the output generated by the tests somewhere we could specify in the command line.

unknownbrackets commented 9 years ago

Created a test of the ee's alu instructions: https://github.com/unknownbrackets/ps2autotests/tree/master/tests/cpu/ee

Not super beautiful, but just tried to test $0, jit constant folding, and preservation of upper bits (all of which I can imagine being possible issues, some were at one time or another in ppsspp...)

Yeah, that makes sense. With ppssppheadless, we just only spit out the test output (sent via a special ioctl, in that case) to stdout and then pipe that as we want to for comparison. We also have a --compare mode now that just does the comparison automatically (again for better test performance, since we have quite a few.)
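For illustration, the comparison step could look something like this in Python. This is only a sketch of the general idea, not how ppssppheadless's --compare mode is implemented, and the function name is made up:

```python
import difflib


def compare_output(actual: str, expected: str):
    """Compare test output to the expected text, ignoring line-ending differences.

    Returns (passed, diff_lines); diff_lines is empty when the test passes.
    """
    actual_lines = actual.splitlines()
    expected_lines = expected.splitlines()
    if actual_lines == expected_lines:
        return True, []
    diff = list(difflib.unified_diff(
        expected_lines, actual_lines,
        fromfile="expected", tofile="actual", lineterm=""))
    return False, diff
```

Returning the diff rather than just a pass/fail flag makes it easier to see at a glance which line of a test regressed.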

Ideally you want the tests to one day be able to use an offscreen buffer for rendering and save screenshots out, which can also be compared (or read back, at least, which means glReadPixels or etc. but I assume ps2 games rely upon reading back render results just as psp ones do...) Unfortunately, ppsspp only supports this on Windows currently (mostly because no one has bothered for other platforms), but it has been helpful to test and deal with stencil/alpha behavior, etc. I think the ps2 has similar stencil/alpha overlap but I'm not sure.

-[Unknown]

jpd002 commented 9 years ago

Awesome! I gave a shot at that first test and the emulator did fail at handling the writes to R0 properly... But I fixed the issues and now the output from the emulator is the same as the expected result!

I'll see how I could build a headless version of the emulator that can load many ELFs one after another and compare the generated outputs against the expected outputs. I'll keep in mind that we might want to test video output later on.

unknownbrackets commented 9 years ago

Cool. Yeah, I seem to remember fixing that in ppsspp's alu helped some games, actually. Probably the test could be improved (I just picked a few pairs of values), but alus are mostly pretty simple anyway.

I did start on a branching test but didn't have time to finish it fully. Also need to set up scripts and stuff in that repo. I know on the PSP we saw games do this with the vfpu (similar to vu0 / cop2 I guess):

    bvt x, branchx
    bvt y, branchy
    bvt z, branchz
    bvt w, branchw
    nop

Branches in delay slots are supposed to be illegal but this turns out to have predictable behavior. And games use it. Not sure if ps2 games are as evil, but they probably are...

-[Unknown]

jpd002 commented 9 years ago

I haven't seen anything of the sort on the PS2 games I've tested yet, but there are many other tricky things, especially on the VU microprogram pipeline side, that will be really fun to test.

unknownbrackets commented 9 years ago

I've added some more tests: https://github.com/unknownbrackets/ps2autotests/tree/master/tests/cpu/ee

Fairly sure the lw, ld, etc. instructions are also reading into $0 currently in Play.
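The rule being tested is simple to state: $0 always reads as zero, so any write to it, whether from an ALU op or a load, has to be thrown away. A toy Python model of that, assuming a plain 64-bit register file (class and function names are invented for the example and have nothing to do with Play!'s actual code):

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


class RegisterFile:
    """32 general purpose registers where register 0 is hardwired to zero."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, index: int) -> int:
        return self._regs[index]

    def write(self, index: int, value: int) -> None:
        if index == 0:
            return  # writes to $0 are silently discarded
        self._regs[index] = value & MASK64


def sign_extend32(value: int) -> int:
    """Sign-extend a 32-bit result into a 64-bit register value."""
    value &= 0xFFFFFFFF
    return value | 0xFFFFFFFF00000000 if value & 0x80000000 else value


def addiu(regs: RegisterFile, rt: int, rs: int, imm: int) -> None:
    regs.write(rt, sign_extend32(regs.read(rs) + imm))


def lw(regs: RegisterFile, memory: dict, rt: int, base: int, offset: int) -> None:
    # memory is a dict of address -> 32-bit word, just enough for the sketch
    regs.write(rt, sign_extend32(memory[regs.read(base) + offset]))
```

Funnelling every destination write through one `write` method is one way to make sure loads cannot bypass the check that the ALU path already has.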

-[Unknown]

jpd002 commented 9 years ago

Great! I'll take a look at those new tests. And you're right about Play! not handling loads to R0 properly, I'm going to fix that.

ADormant commented 9 years ago

Tests like these ones which can be run by every user would be great. http://psx.amidog.se/doku.php

unknownbrackets commented 9 years ago

I don't like demos for this purpose. Not only are they more complicated to write (creating user interaction, displaying and organizing results on the screen), but they are less automatable. They also introduce more variables that can cause the tests to fail incorrectly.

The goal is to be able to quickly run each and every commit in the emulator against a set of tests to know that you didn't break anything. Not so that users can manually check every once in a while to see if the test started failing sometime in the past couple weeks of development.

-[Unknown]

i30817 commented 9 years ago

What about hooking up the test suite to pcsx2 output too? It would be nice, and probably useful if you could share test writing effort and see where things differed. Not necessarily where you share the server farm doing them, but where the code is the same.

IMO, every emulated platform should have a shared hardware automatic test suite for all non-toy emulators... The dolphin infrastructure for this is amazing, even has graphical diffs.

unknownbrackets commented 9 years ago

Totally agree. Well, the psp tests are also used by multiple emulators, and were originally written by soywiz because he's written like 3 psp emulators (in different languages) now and was tired of the same bugs, heh.

Since these are just elfs they should run just as well on any emulator, just need the hookups to stdout.

Agree that the fifo tests for Dolphin are pretty nice, yeah.

-[Unknown]

jpd002 commented 9 years ago

I've just added a new "AutoTest" project that is basically a headless version of the emulator which takes a single parameter as input to specify the path of the tests (i.e.: ps2autotests/tests). It will go through all the ELFs it finds and write stdout to a file beside the ".expected" files.
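The discovery step described here can be sketched in a few lines of Python. The assumption that `foo.elf` is paired with `foo.expected` in the same directory is mine, based only on the ".expected" naming mentioned above, and AutoTest itself may lay things out differently:

```python
from pathlib import Path


def find_tests(root):
    """Return sorted (elf_path, expected_path) pairs for ELFs that have expected output."""
    pairs = []
    for elf in sorted(Path(root).rglob("*.elf")):
        expected = elf.with_suffix(".expected")  # assumed naming convention
        if expected.is_file():
            pairs.append((elf, expected))
    return pairs
```

ELFs without an expected file are skipped, since there is nothing to compare their output against yet.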

It's very rough and there are some technical issues that still prevent it from being used automatically. For example, there's no way to recover if a test program makes the emulator crash (currently the case for branch.elf because of an unimplemented instruction) or if the test program never finishes its execution because of an issue with CPU emulation (currently the case for branchdelay.elf).
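One common way around both problems is to have a small driver script launch each test in its own process with a timeout, so a crash or hang only costs that one test. A minimal Python sketch, assuming the headless executable can be invoked once per test (that command line is hypothetical, and AutoTest as described takes a directory instead):

```python
import subprocess


def run_isolated(command, timeout_seconds=30.0):
    """Run one test in its own process.

    Returns (status, stdout_text) where status is "ok", "crash" or "timeout".
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left behind
        return "timeout", ""
    status = "ok" if result.returncode == 0 else "crash"
    return status, result.stdout
```

The cost is process startup per test, which is the trade-off against the faster "reset and run them all in one process" approach.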

Also, I've disabled building that executable for now because it basically has to build the whole emulator separately and I'm not sure it's really a good idea to build twice (once for emulator and once for test executable) when doing development. It can be enabled manually for now, but I'll have to think of something better.

I'm going to fix the various issues that were raised by the new tests you've added.

unknownbrackets commented 9 years ago

What we did in ppsspp was create a "Core.lib" that is basically the entire emulator (well, we have a few others, like GPU, etc..) This is then linked into the UI and headless versions of the emulator (so it's only linked twice, not compiled twice.) This does mean having a dedicated vcxproj for the non-UI parts.

Note that this also allows MSVC to build more efficiently on multiple cores. We don't use (or appear to benefit much from) LTCG, though, which of course would add a lot to the link time.

I've added some iop tests. Right now these are just .irx files, since that's convenient to run in ps2link (seems to be slightly less stable, though.) Not sure if it'd be better to set up a stub elf that runs the irx.

The iop tests highlight some of the load delay slot issues. Tried to copy over the branchdelay test, but it just makes the test barf up weird results, which makes sense since it's undefined behavior. I've heard that ps1 games did use branch delay slots in invalid ways, but it's probably better to create tests based on observed results.

Do you think it's worth having separate tests for vu0 and vu1? Aside from what only vu1 can do (EFU, XGKICK, etc.) and what only vu0 can do (read/write vu1 regs, have its regs mucked with by cop2), I suppose they should be identical in behavior. I'm thinking it's not worth separate tests...

I also wonder if the cop2/gte is present under the iop... haven't tried it yet.

-[Unknown]

jpd002 commented 9 years ago

Yeah, I was thinking about creating a static library, which would address that issue elegantly, but I still gotta think about it...

I believe it would be simpler to just have the IRX files in there instead of a stub ELF that loads IRXs: No special synchronization between the EE and IOP would be required and no mounting a drive on 'host' to make the IRX available. But I guess we will have to test IOP module loading from the EE at some point though.

AFAIK, VU0 and VU1 instructions behave the same way, so, we could make a single test set for general VU behavior and specialized tests for the things you've mentioned.

I've also been wondering about the GTE. It's probably there, but my guess would be that it's disabled by the IOP BIOS and unusable by IOP modules.

unknownbrackets commented 9 years ago

Makes sense. I still haven't gotten my head around all this sif and vif business.

Got the iop branch delay slot test working after looking at it again, results seem consistent there too (although different from the r5900 and r4000.)

mfc2 seems to not write anything to the reg, so either I'm testing wrong or it seems like cop2 isn't usable.

-[Unknown]

ADormant commented 9 years ago

I wondered about this myself. I know PS2 has R3000A(IOP), PS1 SPU and RAM but does it have MDEC and GTE in hardware or are they emulated in software? And what about PS1 GPU?

fuel-pcbox commented 9 years ago

@ADormant I'd say hardware. The PS2 simply isn't powerful enough to emulate even portions of the PS1.

ADormant commented 9 years ago

Even the PS1's GPU? So the question is, if that PS1 hardware is used only for the backwards compatibility or also for PS2 games?

jpd002 commented 9 years ago

@unknownbrackets I took a look at your first FPU test and I was thinking that it might be interesting to print out the actual hexadecimal value of the floating point numbers. The PS2 isn't compliant with IEEE-754. For example, I'm sure there are some cases where adding two numbers printed out as NaNs by printf might give a meaningful result and not just another NaN.
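To illustrate why the hex form carries more information, here is a small Python sketch run on an ordinary IEEE host (the helper names are made up for the example). Two different bit patterns both come out as "nan" when formatted as floats, while the hex column keeps them distinct:

```python
import struct


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision float on the host."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def float_to_hex(value: float) -> str:
    """Format a value as the 8-digit hex of its single-precision bit pattern."""
    return "%08x" % struct.unpack("<I", struct.pack("<f", value))[0]


for bits in (0x7FC00000, 0x7FFFFFFF, 0x3F800000):
    # left column: exact bits; right column: what float formatting shows on the host
    print("%08x" % bits, bits_to_float(bits))
```

Printing the raw bits from the test itself sidesteps the question of how the console's printf and the host's printf each choose to render those patterns.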

unknownbrackets commented 9 years ago

Right, I know that the NANs there are not actually NANs (the fpu doesn't even support NANs afaiu.) I was mainly just thinking that bit exactness isn't what that particular test is going for - rather, I was trying to trigger the fcr bits. Still not sure how to get underflow to trigger.

For the actual tests of the instructions themselves, yes, I think hex will make the most sense.

But maybe it would be better to change it there... I just want to make it easier to pass on its own.

-[Unknown]

unknownbrackets commented 9 years ago

Let me know if the representations I've added for arithmetic/div/mul stuff so far work for you. If there are other test values you want I can run them, just tried to throw some representative ones in there.

-[Unknown]

jpd002 commented 9 years ago

Yes, the hexadecimal values are great! Another thing that would require testing is rounding during FPU/VU operations (there might be a 1 bit difference from IEEE-754 on the PS2). I'll try to get some test cases for that.

unknownbrackets commented 9 years ago

Right. I'm not sure if it's better to have a separate test for that or not.

Since the PSP is mostly IEEE, we only have a multiply test using 0.2965576648712158203125f * 62.0f (which mattered because previously we were only using the rounding mode for cvt.w.s, not other arithmetic, and this broke Metal Gear Solid Peace Walker and Gods Eater Burst.) We also have a flush-to-zero test, but I don't think that's relevant with the PS2.
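As a host-side illustration of why the rounding mode matters for a multiply, the following Python sketch computes that product exactly and then rounds it to single precision two ways. It is only a model of "round to nearest" versus "round toward zero" using exact fractions, written for this example; it says nothing about which mode any particular game or console uses, and whether the two modes disagree for a given input depends on the bits that get discarded:

```python
from fractions import Fraction
import math


def round_to_f32(x: Fraction, mode: str) -> Fraction:
    """Round a positive exact value to 24 significant bits (float32 precision).

    mode is "nearest" (ties to even) or "zero" (truncate). Overflow and
    denormals are ignored to keep the sketch short.
    """
    # Find e such that 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    ulp = Fraction(2) ** (e - 23)   # spacing of float32 values near x
    scaled = x / ulp                # lies in [2**23, 2**24)
    q = math.floor(scaled)
    if mode == "nearest":
        rem = scaled - q
        if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and q % 2 == 1):
            q += 1
    elif mode != "zero":
        raise ValueError("unknown rounding mode: " + mode)
    return q * ulp


a = Fraction("0.2965576648712158203125")
exact = a * 62
for mode in ("nearest", "zero"):
    print(mode, float(round_to_f32(exact, mode)))
```

A hardware test for this would print the result in hex, since a one-bit difference in the last place is easy to lose in decimal formatting.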

-[Unknown]

unknownbrackets commented 9 years ago

Well, perhaps we can mark this "resolved" since there are (some) tests now and Play! is capable of running at least most of them.

FWIW, there are tests now for the simd features, so now only missing cop0, a couple trap/exception related things, rounding for the fpu, and then other tests - things like the ipu, vif, vus, cop2, etc. Lots to test, really.

-[Unknown]

jpd002 commented 9 years ago

I've ordered a PS2 Network Adapter so I can contribute to some tests myself. I'll probably start off by writing some VU tests to see how the flags work in coprocessor mode.

unknownbrackets commented 9 years ago

Cool. Yeah, it's pretty much a pain without using the network, shuffling usb sticks back and forth or etc.

-[Unknown]

unknownbrackets commented 9 years ago

I've added an initial test with the macro integer instructions: https://github.com/unknownbrackets/ps2autotests/commit/2d79a5317195d618f1a511fd6aee0a5d0b781fc2

Added some shared code to print the flags.

-[Unknown]

unknownbrackets commented 9 years ago

I was thinking, it's probably best to separate out two tests for most of the instruction types: one with flags and one not. That way it's more straightforward to see what's passing.

Some of the flags are going to be hard to pass exactly correctly. It'll be more important to get the calculations correct.

There's also going to be some fun with the pipeline. "Q delay slots" in a way, and stuff.

-[Unknown]

jpd002 commented 9 years ago

Yeah, I think it's a good idea to have tests with and without flags.

Q and P pipelines are going to be tricky, but I think XGKICK is also going to be hard to test. You can basically have two XGKICKs one after another. I could try to pull the exact case I've seen, but I know VP2 does that.

I've also heard things about VU integer registers having potentially weird behaviors around branches, but that's just something I've read a long time ago, so I'm not sure it's really there.

unknownbrackets commented 9 years ago

Hmm. Doesn't XGKICK just stall further instructions until the previous XGKICK completes?

Timing can be crazy hard to test, but as long as it interlocks properly I guess we just have to understand the surrounding behaviors (e.g. as it affects Q/P/flags/etc.)

Hmm, yeah I've also read something like that, but I'm not clear on the case. I do know that flags can be tricky, and so if branches are done based on flags (like Z/S) it might have more to do with that.

-[Unknown]

jpd002 commented 9 years ago

Ok, I had the wrong idea in mind, here's what I saw:

    00000B48 000002FF 100C6001 NOP IADDIU VI12, VI12, $0001
    00000B50 000002FF 50070002 NOP IBEQ VI7, VI0, $00000B68
    00000B58 000002FF 8000033C NOP NOP
    00000B60 000002FF 80003EFC NOP XGKICK VI7
    00000B68 000002FF 40000077 NOP B $00000F28
    00000B70 000002FF 800046FC NOP XGKICK VI8

So, yeah, not as bad as I thought, but still pretty gnarly.

Here's the page that describes the VU integer branch behavior (formatting sucks, sorry): http://wiki.pcsx2.net/index.php/PCSX2_Documentation/PS2_VU_%28Vector_Unit%29_Documentation_Part_1 Not quite sure about the accuracy of that though, since I didn't need to do anything special to make Champions: Return to Arms run its VU code.

I'm working on getting the results from the muldiv test right at the moment, still not sure about how to handle the signed division of a negative number by 0. I'll probably add some more test cases in there once I get my PS2 network adapter.

TheLastRar commented 9 years ago

FYI, the documentation on PCSX2's wiki is often taken from forum posts or dev blog posts, which in this case are better formatted http://pcsx2.net/developer-blog/208-ps2-vu-vector-unit-documentation-part-1.html

jpd002 commented 9 years ago

@TheLastRar Thanks for the updated link, it's way easier to read :)

@unknownbrackets Just found a bug in the way VU subroutine calls are handled, the block cache isn't properly reset when a new microprogram is written in VU0 memory. In the VU integer test cases, it just keeps executing the first microprogram. Going to fix that tomorrow.
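The general shape of that bug, and of the fix, can be shown with a toy Python model. This is not Play!'s block cache, just a sketch with invented names: translated blocks are cached by start address, so every write into micro memory has to throw stale blocks away.

```python
class MicroProgramCache:
    """Toy model of a block cache sitting in front of VU micro memory."""

    def __init__(self, size: int):
        self.memory = bytearray(size)
        self._blocks = {}       # start address -> "compiled" block
        self.compile_count = 0

    def write(self, address: int, data: bytes) -> None:
        """Upload code, then drop every cached block since any of them may be stale."""
        self.memory[address:address + len(data)] = data
        self._blocks.clear()

    def get_block(self, address: int) -> bytes:
        """Return the cached block for an address, compiling it on a miss."""
        block = self._blocks.get(address)
        if block is None:
            # Stand-in for real translation: snapshot one 8-byte instruction pair.
            block = bytes(self.memory[address:address + 8])
            self._blocks[address] = block
            self.compile_count += 1
        return block
```

Without the `clear()` in `write`, a second microprogram uploaded to the same address would keep running the first one's cached block, which matches the symptom described above. A real implementation would likely invalidate only the blocks overlapping the written range.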

unknownbrackets commented 9 years ago

Ah, the emitter currently doesn't allow for testing VI16-31. I guess I could add that just to ensure they're properly handled. Will definitely make sense to test delay slot stuff there too.

MAX/MIN with denormals makes total sense as there's no nan/denormal support afaiu. That's probably worth testing better in the fpu tests too. In fact, reading that I bet the PSP's denormals probably work that way too and we don't handle it right in PPSSPP...

That's a good point about that invalidation issue. I've been messing with vif0 testing, will probably want to make sure MPG invalidates as well (I suppose it's likely a more commonly used method of code upload?)

-[Unknown]

jpd002 commented 9 years ago

Ok, fixed the invalidation issue, the VU integer test works properly now. Yeah, most games use the VIF to upload their micro programs and Play!'s MPG properly invalidates any cached blocks.

Just got my PS2 network adapter, I should be able to add some tests very soon.

unknownbrackets commented 9 years ago

Cool. I've started some dma tests but I'm running low on free time lately.

By the way, you can recompile the tests with 1 instead of 0 for the block constructor in order to test them on vu1.

-[Unknown]