simias / psx-guide

Guide to writing a Playstation emulator

Memory mapping section #3


yupferris commented 8 years ago

First of all, this is awesome; fantastic effort, and after a (somewhat quick) scan this is just lovely :) Awesome stuff!!

In the memory mapping section, you mention you're not entirely sure how memory mapping works at the hardware level, and that it might be that each peripheral will shut itself off when addressed out of its mapped range. I wanted to provide some insight into how this is usually done; hopefully you find it as interesting as I do and I hope it helps :)

Typically, many of the peripherals in these systems are going to be off-the-shelf components (RAM, IO controllers, etc), and adding extra hardware inside them for memory mapping would require additional chips to be fab'ed specifically for this application, which would be prohibitively expensive. Instead, the responsibility for memory mapping is put in the system designer's hands, so that each component is only responsible for one thing and can thus potentially be used in a broader scope of applications.

Most peripherals from this era were still communicating over parallel interfaces. This (as opposed to serial) makes them a bit simpler to use and allows them to transfer more data per clock tick, at the expense of requiring more wires between the components. For example, a 4 kilobyte ROM might have:

- 12 address pins (2^12 = 4096 addressable bytes)
- 8 data pins (one byte out per access)
- one or more enable pins (chip enable, output enable) that the surrounding system uses to turn the chip's outputs on and off

Now, in terms of memory mapping, these systems employed additional hardware for this task. This hardware would sit between the address/data pins on the CPU and the address/data/enable/etc pins for the peripherals and, depending on the address the CPU placed on its address pins, route the address lines to the peripherals and enable/disable them. This mostly comes down to various simple gates/comparators that directly drive the peripherals' pins from the CPU's address pins, but the key here is that it had to be very simple and fast, since the CPU's timing constraints were pretty strict. That's one of the reasons dedicated hardware was employed - another being the sheer number of pins, and thus board space, it saved.
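
In an emulator, that external decode logic collapses to a simple address comparison. Here's a minimal C++ sketch; the ranges below are the PS1's commonly documented RAM/BIOS/I/O regions, but treat the exact bounds and names as illustrative rather than authoritative:

```cpp
#include <cstdint>

// Hypothetical device IDs for illustration.
enum class Device { Ram, Rom, Io, None };

// The software equivalent of the glue logic described above: compare the
// high address bits and assert exactly one device's "chip enable".
Device decode(uint32_t addr) {
    if (addr < 0x00200000)                          // 2 MiB of main RAM
        return Device::Ram;
    if (addr >= 0x1fc00000 && addr < 0x1fc80000)    // 512 KiB BIOS ROM
        return Device::Rom;
    if (addr >= 0x1f801000 && addr < 0x1f802000)    // I/O registers
        return Device::Io;
    return Device::None;                            // nothing enabled: open bus
}
```

In hardware this is exactly the kind of comparator network that drives each chip's enable pin; in software it's usually the first step of every load/store.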

Now, I'm not particularly familiar with the PS1 internals, but after checking up on the wikipedia page it looks like the "System Control Coprocessor (Cop0)" is the thing gluing all this together. Thinking of it this way, we can also see that it makes sense that it would also handle interrupts, breakpoints, etc. We can think of this bit of glue logic as something similar to the north/south bridge chips in modern motherboards - it's really just an I/O breakout to offload logistics from the CPU and tie everything together without taking up all your board space.

Other examples of such chips would be the C64's PLA, and I believe the two 74139's in the NES.

Anyways, hope you find this little rant useful :) I'm pretty inspired by this writeup you're doing, and I've wanted to do an N64 emulator in Rust for a while now - perhaps I'll follow suit and do that in this style as well!

simias commented 8 years ago

Thank you for the feedback and details!

I think it would be interesting to really tie the guide to the hardware but I don't have all the information and it would be quite some work. A simplified schematic of the console could be pretty handy I suppose.

I think it makes sense for the Cop0 to handle memory access since that's where the memory translation would take place if it were implemented in the PSX. I think the PSX architecture is relatively simple given that the RAM can only be accessed by the CPU, so there's no arbitration going on.

There's an error in my guide though that I've fixed in my emulator: when the CPU runs a "store byte" for instance it still puts the entire 32bit register value on the bus and I suppose there are some signaling wires on the side to tell the target that it's an 8bit transfer. However some targets ignore this signaling and always use the full 32bits.

For instance if you have 0x12345678 in r1 and you "store byte" it into a DMA register the DMA will use the whole 32bit value (as if you did a "store word"), not just 0x78.
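
That behaviour can be sketched in a few lines of C++ (all names here are hypothetical, not taken from the emulator itself): a well-behaved target masks the bus value down to the signalled width, while a DMA register, as described above, ignores the width signal and latches the whole word:

```cpp
#include <cstdint>

// The side-band width signal the CPU drives alongside the full 32-bit value.
enum class Width { Byte, Half, Word };

// A target that honours the width signal: mask the bus value down.
uint32_t masked_value(uint32_t value, Width w) {
    switch (w) {
    case Width::Byte: return value & 0xff;
    case Width::Half: return value & 0xffff;
    default:          return value;
    }
}

// A DMA register as described above: the width signal is ignored, so a
// "store byte" of 0x12345678 still latches all of 0x12345678.
uint32_t dma_register_write(uint32_t value, Width /*width*/) {
    return value;
}
```

So with r1 = 0x12345678, a byte store hits a masking target as 0x78 but hits the DMA register as the full 0x12345678.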

Those kinds of tricky details are the reason I put this guide on the backburner lately, I'm waiting for my emulator to reach a playable state and then I'll go back to write a more "definitive" version of it. Otherwise I'll just spend all of my time rewriting entire sections because it turned out I needed to do it differently...

yupferris commented 8 years ago

> Thank you for the feedback and details!

No problem :) I love sharing this kind of knowledge. Hardware is super interesting, especially in these older systems!

> I think it would be interesting to really tie the guide to the hardware but I don't have all the information and it would be quite some work. A simplified schematic of the console could be pretty handy I suppose.

Absolutely. Going into even this much detail in a writeup like this probably isn't appropriate - I really just wrote this little blurb as a response to seeing that you weren't sure yourself, and thought I could shed some light on it; whether or not any of that info trickles down into the writeup itself doesn't matter to me at all :)

> I think it makes sense for the Cop0 to handle memory access since that's where the memory translation would take place if it were implemented in the PSX. I think the PSX architecture is relatively simple given that the RAM can only be accessed by the CPU, so there's no arbitration going on.

Totally!

> There's an error in my guide though that I've fixed in my emulator: when the CPU runs a "store byte" for instance it still puts the entire 32bit register value on the bus and I suppose there are some signaling wires on the side to tell the target that it's an 8bit transfer. However some targets ignore this signaling and always use the full 32bits.
>
> For instance if you have 0x12345678 in r1 and you "store byte" it into a DMA register the DMA will use the whole 32bit value (as if you did a "store word"), not just 0x78.

Yeah, this kind of thing is quite common in these systems. N64 and Virtual Boy have similar "problems." It definitely complicates things for developers and emu authors, though it's much easier for the hw/system designers :D

> Those kinds of tricky details are the reason I put this guide on the backburner lately, I'm waiting for my emulator to reach a playable state and then I'll go back to write a more "definitive" version of it. Otherwise I'll just spend all of my time rewriting entire sections because it turned out I needed to do it differently...

Quite understandable. Again, thanks for doing this project, it's awesome; both the emulator and the writeup!

P.S. I hope you always keep the "broken" playstation logo around for this project:

It totally sums up the life of an emulator author - it's almost heartwarming to see each time I look at it :D

ghost commented 8 years ago

A little late to the party, but after reading this thread I want to add my own two cents. First of all, @simias you are doing the lord's work with this guide. I have been implementing a PSX emulator off and on for over a year, and this guide is going to be essential in getting my implementation to boot games.

Your point about the bus access widths being more or less symbolic reminds me of how the ARM architecture handles this, and implementing this correctly is required for the GBA as well. Games will write a "byte" to the VRAM/CGRAM but those components don't support byte accesses; instead they sample the entire 16-bit halfword that their data buses allow. Visual corruption occurs if this isn't emulated properly. I'm actually glad to hear that the PSX works the same way, as it makes the implementation of a write function much more straightforward.

On the ARM, there are 2 pins called MAS (presumably "memory access size") that tell the outside world what kind of access is happening:

- MAS[1:0] = 00: byte (8-bit) access
- MAS[1:0] = 01: halfword (16-bit) access
- MAS[1:0] = 10: word (32-bit) access
- MAS[1:0] = 11: reserved

So in an emulator, you just implement the full 32-bit load/store operations and pass along a variable noting the width, and let devices respond however they want, just like in hardware. This variable can also be a template parameter so the compiler can create optimal versions of each memory access size.

Loads are a different story, and if MIPS is anything like ARM, it's implemented by sampling the full data bus then using a barrel shifter to align the data correctly for storage in a register.

```cpp
template<bus_width_t width>
uint32_t load(uint32_t address) {
  // call the real load function, which returns the full bus value
  uint32_t data = cop0::load<width>(address);
  uint32_t shift = 0;
  uint32_t mask = 0;

  switch (width) {
  case BUS_WIDTH_BYTE:
    // pick out the addressed byte lane (little-endian)
    shift = (address & 3) * 8;
    mask = 0xff;
    break;

  case BUS_WIDTH_HALF:
    // pick out the addressed halfword lane
    shift = (address & 2) * 8;
    mask = 0xffff;
    break;

  case BUS_WIDTH_WORD:
    shift = 0;
    mask = 0xffffffff;
    break;
  }

  // the compile-time width means dead branches (and the no-op word
  // shift/mask) optimize away entirely
  return (data >> shift) & mask;
}
```
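
For symmetry, a store can be sketched the same way. This is an illustrative companion, not code from the project; the enum below just mirrors the one the load function assumes. The idea is to shift the register value onto the correct byte lanes before driving the bus, and let the target sample as much of the word as it wants:

```cpp
#include <cstdint>

// Mirrors the bus_width_t assumed by the load function above.
enum bus_width_t { BUS_WIDTH_BYTE, BUS_WIDTH_HALF, BUS_WIDTH_WORD };

// Compute the value the CPU drives onto the 32-bit data bus for a store,
// with little-endian byte-lane alignment.
template<bus_width_t width>
uint32_t bus_value(uint32_t address, uint32_t data) {
    uint32_t shift = 0;

    switch (width) {
    case BUS_WIDTH_BYTE: shift = (address & 3) * 8; break; // byte lane
    case BUS_WIDTH_HALF: shift = (address & 2) * 8; break; // halfword lane
    case BUS_WIDTH_WORD: break;                            // full word, no shift
    }

    return data << shift;
}
```

Note that a byte stored to an address like $8000_0003 lands on the most significant lanes of the bus value, which matches the read-side behaviour described below: the lane position, not bit 0, is where the data travels.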

With this design, the components have to respond in a non-intuitive way. For example, reading a byte from a theoretical I/O register $8000_0003, the component puts that byte in the most significant data lines, so the CPU can align it properly. I'm not sure the reads are handled this way, but it is fairly common in systems with variable-width buses. Perhaps @yupferris has a better idea on this front?

simias commented 8 years ago

Up until now I assumed that the shifting was done on the device side but now that you mention it I really have no idea. I guess the only way to know that for sure would be to figure out which bus the PlayStation uses (I don't think I ever encountered this bit of info anywhere). I think your hypothesis is probably very close to the truth but I can't really think of a way to confirm it.

If you want another brain teaser, I've been working on PocketStation emulation lately; it uses an ARM CPU (an ARM7TDMI, the same as the GBA I think). For some reason the device doesn't support 8bit reads from the FLASH or kernel ROM - 16 and 32bit work just fine. I can't really imagine why this restriction exists, it seems like a rather ridiculous limitation.

ghost commented 8 years ago

It would depend on the implementation. In the GBA specifically, the cartridge data bus is only 16 bits wide, so 32-bit reads (the width of a standard ARM-mode instruction, which is why the 16-bit THUMB ISA is normally used for code run from the cartridge) are implemented as 2 sequential 16-bit reads (with the timing also doubled, plus overhead), and the address bus is trimmed down as well, requiring a weird sequence of latching every 128 (64?) KiB. The only reason they did it that way? Cost. Remember that logic isn't the only deciding factor in hardware implementations; cost matters too, especially when you consider these components are designed for mass production.
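
That two-halfword scheme is straightforward to mirror in an emulator. A minimal C++ sketch, where `rom` is just an array standing in for the cartridge's backing store (names and the aligned-access assumption are mine, not from any particular emulator):

```cpp
#include <cstdint>

// One 16-bit bus transaction; addr is assumed halfword-aligned here.
uint16_t cart_read16(const uint16_t* rom, uint32_t addr) {
    return rom[addr >> 1];
}

// A 32-bit read is two sequential 16-bit reads, low halfword first
// (little-endian), just like the hardware sequencing described above.
uint32_t cart_read32(const uint16_t* rom, uint32_t addr) {
    uint32_t lo = cart_read16(rom, addr);
    uint32_t hi = cart_read16(rom, addr + 2);
    return lo | (hi << 16);
}
```

A cycle-accurate emulator would also charge (roughly) double the access time for the 32-bit case, matching the doubled timing mentioned above.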

Perhaps the FLASH/Kernel chips selected were cheaper but didn't support 8-bit access directly, and it wasn't enough of an inhibiting factor to select a more costly chip that did. The 16/32-bit selection could be done with 1 pin, however adding support for 8-bit would increase costs by another pin. If the chips are only used for fixed-width instructions and data, this is a fine corner to cut.

You'll often come across multiplexed address/data buses in chips to reduce the pin counts. The NES PPU for example has this, and each VRAM read requires 2 cycles, and an external address latch. It's a sad fact, but makes emulation only slightly more challenging.

simias commented 8 years ago

I can understand certain modules only responding to 8 or 16bit access, but I can't understand why they would handle 16 and 32bit but not 8bit. Seems like if you've implemented 16bit access you should basically get 8bit "for free".

ghost commented 8 years ago

They may have explicitly not allowed it for some reason, as 8-bit access is almost always an essential baseline.