simias / psx-guide

Guide to writing a Playstation emulator

Store16 #1

Open arcanis opened 9 years ago

arcanis commented 9 years ago

Hi,

First of all, great work! Your guide is very well written, kudos. I really hope you'll be able to finish it: good emulation guides are quite hard to find, and ones as clean as yours are rarer still.

I have a small question about the store32/16/8 instructions:

"I start with an empty function instead of copying the store32 code because different devices react differently when we change the transaction width. Some will pad the value to 32bits with zeroes, others may just set 16bits in the register and leave the others untouched. For this reason I’ll be conservative and add them only when needed."

Do you have more information on this, maybe references? For example, can't we just do something like this (assuming C code, which doesn't have generics)? If not, why?

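void write_uint16(uint32_t addr, uint16_t value)
{
    /* Address of the 32-bit word that contains this halfword. */
    uint32_t w_addr = addr & ~(uint32_t)0x3;

    if ((addr & 0x2) == 0) {
        /* Low halfword: keep bits 31..16, replace bits 15..0. */
        write_uint32(w_addr, (read_uint32(w_addr) & 0xFFFF0000) | (uint32_t)value);
    } else {
        /* High halfword: keep bits 15..0, replace bits 31..16. */
        write_uint32(w_addr, (read_uint32(w_addr) & 0x0000FFFF) | ((uint32_t)value << 16));
    }
}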

simias commented 9 years ago

Thank you for your feedback. Don't hesitate to let me know if some parts are unclear or poorly written: English is not my mother tongue and it's the first time I've written something like this.

Your code would work for the RAM and "RAM-like" peripherals. Unfortunately not all devices handle all access "widths" properly. NoCash has a more exhaustive list: http://problemkaputt.de/psx-spx.htm#unpredictablethings

You can see that some devices treat 16-bit or 8-bit writes like 32-bit writes (I assume by setting the MSBs to 0?), so your code wouldn't be accurate for those.
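
A minimal sketch of the difference, to make it concrete. The `store16_policy` enum and the `apply_store16` helper are hypothetical names invented for this illustration, and the zero-extension behaviour is only the guess made above, not something confirmed for any particular device:

```c
#include <stdint.h>

/* Hypothetical: how a device reacts to a 16-bit store. */
typedef enum {
    STORE16_MERGE,       /* RAM-like: only the addressed halfword changes */
    STORE16_ZERO_EXTEND  /* assumed: handled like a 32-bit store, MSBs set to 0 */
} store16_policy;

/* Returns the new contents of the 32-bit word after a 16-bit store.
 * Assumes a little-endian layout and a halfword-aligned address. */
static uint32_t apply_store16(store16_policy policy, uint32_t old_word,
                              uint32_t addr, uint16_t value)
{
    unsigned shift = (addr & 2u) * 8u; /* 0 for the low halfword, 16 for the high one */

    switch (policy) {
    case STORE16_MERGE:
        return (old_word & ~(0xFFFFu << shift)) | ((uint32_t)value << shift);
    case STORE16_ZERO_EXTEND:
        return (uint32_t)value;
    }
    return old_word;
}
```

With the same inputs the two policies leave different values in the register, which is why a single generic store16 can't be right for every device.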

Another case I encountered a few days ago involves reading the timer registers: the registers are 16 bits wide, but if you load them with an LW you end up with the value of the next instruction in the high 16 bits. What seems to be happening is that the CPU fetches the next instruction, then executes the LW on the timer peripheral. The peripheral sets the low 16 bits to the register value but doesn't touch the high bits, which still contain the previously fetched instruction, so we end up reading that. Of course it's useless and no game should rely on that value being there, but if you want to be completely accurate you have to handle 32-bit loads from the timer registers with some special code.

Another thing to consider is that sometimes reading from a register can have a side effect. For instance reading from the timer mode registers clears certain bits: http://problemkaputt.de/psx-spx.htm#timers (see bits 11 and 12 in the Counter Mode register).

In your implementation you read the current value when doing a store16 but doing so will trigger your "register read" code which will clear those bits, and that's not accurate. You would need a special read function for those "fake" reads and that makes your code more complex.
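
To illustrate, here is a rough sketch of what such a pair of functions could look like. The `struct timer` layout and the `timer_mode_peek` / `timer_mode_read` names are made up for this example; only the "bits 11 and 12 are cleared on read" part comes from the documentation linked above:

```c
#include <stdint.h>

/* Hypothetical timer state: only the mode register matters here. */
struct timer {
    uint16_t mode;
};

/* Bits 11 and 12 of the Counter Mode register are cleared when it is read. */
#define MODE_CLEARED_ON_READ ((1u << 11) | (1u << 12))

/* "Fake" read: returns the value without any side effect. */
static uint16_t timer_mode_peek(const struct timer *t)
{
    return t->mode;
}

/* Real read, as seen by the CPU: returns the value, then clears the bits. */
static uint16_t timer_mode_read(struct timer *t)
{
    uint16_t value = t->mode;
    t->mode &= (uint16_t)~MODE_CLEARED_ON_READ;
    return value;
}
```

A read-modify-write store16 would have to go through the peek variant, and every peripheral with read side effects would need the same split.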

For these reasons it's not really possible to write generic read and write code for all peripherals, as far as I can see.

ghost commented 7 years ago

@simias, the behavior you're referring to is called 'open bus', where a peripheral doesn't update some or all of the data bus. This is a side effect of something called 'capacitance' in the wires connecting to the data bus (any wire, really), where the last value put on the wire lingers for a short amount of time before decaying back to 0. Typically in an emulator, this is implemented with a single variable that you pass by reference to a read function:

uint32_t mdr; // memory data register: the last value seen on the data bus

void read(uint32_t address, uint32_t &data) {
  switch (address) {
  // ...
  case 0x1f800000: // example address of a register that uses open bus behavior.
    data &= 0xffff0000; // keep the high-order bits
    data |= open_bus_register_16; // update the low-order bits
    break;
  // ...
  }
}

// ...
read(address, mdr);
regs[rt] = mdr; // "regs" rather than "register", which is a reserved keyword

The reason why the data is combined with the value of the next instruction is that the CPU just got done reading the instruction, and the data bus still has the instruction's value due to capacitance. I am curious as to what happens in this situation during an instruction cache hit: would the data bus have the current instruction? The one from 4 instructions back? A partially or fully decayed value? I imagine a long loop that doesn't access memory and uses fully cached instructions would cause the bus to decay to 0, but it's just a guess.

simias commented 7 years ago

I wasn't sure if it was open bus or if it was just a predictable garbage value. In my experience open bus tends to be a little more random than that, whereas here the values seem predictable even across consoles. That being said, I'm used to high-Z on input pads, not within the IC itself. That's why I preferred to be vague about it rather than risk saying something inaccurate.

ghost commented 7 years ago

It definitely reeks of open bus. The ARM7TDMI does the same thing, although it only has a 3 stage pipeline. Since the data buses are shared for data/instructions this is something you'd expect to see in a pipelined architecture if a component didn't drive all the bus lines.

PS: Open bus is usually predictable :smile:

simias commented 7 years ago

That does make sense. Since you seem more knowledgeable about those issues don't hesitate to edit the guide if you feel like adding these details.

I'm also thinking about writing a similar guide about PocketStation emulation, it's a lot simpler than the PlayStation and might be a better fit for newcomers to emulation.

Do you have docs about the ARM7TDMI's pipelining? It's one of the things I haven't implemented yet (all my instructions take the same number of cycles to run for now).

ghost commented 7 years ago

I just might do that.

As far as ARM7 documentation goes, the pipeline is pretty simple: fetch, decode, execute. This isn't just a description of how the processor works: these are actually distinct temporal stages. The manual also says that THUMB instructions just decode to ARM instructions and use the exact same hardware; THUMB is merely a translator.

The stages behave exactly as you'd imagine, and there is a register file that is read in the decode stage and written in the execute stage. However, when emulating, this detail is insignificant. Since the last stage is where I/O occurs, there are no bubbles in the pipeline. (It would be painful to look at since ARM's NOP isn't 0 AFAIR :smile:) For emulation of the ARM7, the pipeline is only significant with regard to timing and PC calculations. Are you planning to emulate the R3051 pipeline at all?
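
On the PC calculations: because the fetch stage runs two instructions ahead of the execute stage, an instruction that reads r15 generally sees its own address plus 8 in ARM state, or plus 4 in THUMB state. A minimal sketch of how an emulator without a modelled pipeline can fake that (`arm7_read_pc` is a hypothetical helper name, and the few instructions that see a different offset are ignored here):

```c
#include <stdbool.h>
#include <stdint.h>

/* Value seen when the executing instruction reads r15 (the PC).
 * The prefetch is two instructions ahead: 2 * 4 bytes in ARM state,
 * 2 * 2 bytes in THUMB state. */
static uint32_t arm7_read_pc(uint32_t current_instruction_addr, bool thumb)
{
    return current_instruction_addr + (thumb ? 4u : 8u);
}
```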

simias commented 7 years ago

The problem is that emulating the PlayStation CPU very accurately would be pretty slow. Mednafen emulates the "load absorb" feature to some extent, where if you load something to a certain register the CPU can keep running as long as the instructions don't have a dependency on this register. If you wanted to be very accurate you'd also have to implement the write buffer and probably a few other things.

Ryphecha told me that there were a few games which had compatibility issues on Mednafen PSX because of the inaccurate pipeline emulation. Sounds like a tough problem to tackle if you want your emulator to run on commodity hardware.

ghost commented 7 years ago

I don't know how much I believe that. I think it can be done in an optimal enough way that most computers could run it full speed. CEN64 was approaching full speed while emulating a much more powerful system (2 MIPS CPUs with pipeline emulation). Since a PSX emulator only has 1 pipelined CPU the performance hit wouldn't be as big as you might think. Do you have an IRC room? I'd like to pick your brain on a few things, and offer my experience for your PocketStation project.

simias commented 7 years ago

Sure, I hang around on freenode, I just made a #psx channel. My nick is simias, obviously.