osresearch / spispy

An open source SPI flash emulator and monitor

SRAM instead? #36

Open fenugrec opened 1 year ago

fenugrec commented 1 year ago

Cool project! It looks quiet, too bad...

Had you considered fast parallel SRAM instead? No RAS/CAS nonsense, and 10/15 ns grades should be readily available...

I haven't looked at your HDL: do you take advantage of a 16- or 32-bit-wide RAM read to start the external access just before receiving the last address bits (MSB-first seems pretty typical)?

nic3-14159 commented 1 year ago

I'm not really involved in the development of this project, but I can probably speak to these a bit.

Had you considered fast parallel SRAM instead? No RAS/CAS nonsense, and 10/15 ns grades should be readily available...

I think the main issue with parallel SRAM compared to DRAM is that it's a lot more expensive to match the SPI flash densities typically used nowadays (16MiB/128Mbit or more). Glancing through Digikey and Mouser, a 256Mbit SDRAM can be had for around $5, which is roughly the same price as a 16Mbit, maybe 32Mbit, SRAM. The faster speed grades also tend to come only in smaller capacities, or get very expensive at larger ones. I didn't search thoroughly, so maybe there are cheaper options.
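To put those ballpark numbers side by side, here is a quick cost-per-Mbit calculation. It only uses the rough $5 figures mentioned above (assumed equal for all three parts); actual distributor pricing varies and changes over time.

```python
# Rough cost-per-Mbit comparison using the ~$5 ballpark prices quoted above.
# Assumption: all three parts cost the same $5; real pricing will differ.
parts = {
    "256Mbit SDRAM": (5.00, 256),
    "32Mbit SRAM": (5.00, 32),
    "16Mbit SRAM": (5.00, 16),
}

def cost_per_mbit(price_usd: float, mbit: int) -> float:
    return price_usd / mbit

for name, (price, mbit) in parts.items():
    print(f"{name}: ${cost_per_mbit(price, mbit):.3f}/Mbit")

# How many 32Mbit SRAMs would be needed to back a 128Mbit flash image?
FLASH_MBIT = 128
chips = -(-FLASH_MBIT // 32)  # ceiling division
print(f"{chips} x 32Mbit SRAM for {FLASH_MBIT}Mbit, about ${chips * 5.00:.0f}")
```

Even at equal unit prices, the SDRAM comes out roughly 8x cheaper per bit than the 32Mbit SRAM, before counting the extra board area and pins for multiple SRAM chips.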

One option I did see suggested was pseudo-static RAM (PSRAM), which uses DRAM internally but presents an SRAM-like interface externally. Many of these parts also use the SPI interface, with some of the same commands used by SPI flash chips. In those cases, you can pretty much connect the PSRAM directly to the SPI bus, where it will respond to SPI flash read commands the same way actual flash would, and multiplex it with the FPGA, which would handle the commands the PSRAM doesn't implement. They are also quite cheap, about $2 for a 64Mbit SPI PSRAM from AP Memory. Since it is based on DRAM, it does need to be allowed to refresh once in a while, but as in the current spispy design, refreshes could likely be inhibited during reads.
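A rough sketch of the multiplexing decision described above: after the 8 opcode bits arrive, pick whether the PSRAM or the FPGA should drive MISO for the rest of the transaction. The opcode values are the common SPI NOR ones; the split between PSRAM and FPGA (only plain reads go to the PSRAM) is an assumption, and the exact command sets should be checked against the specific datasheets.

```python
# Hypothetical routing of SPI flash commands in a PSRAM + FPGA hybrid.
# Assumption: the PSRAM only drives MISO for plain reads; the FPGA handles
# everything else (JEDEC ID, status, write enable, erase, ...).
READ, FAST_READ = 0x03, 0x0B
READ_ID, READ_STATUS, WRITE_ENABLE = 0x9F, 0x05, 0x06

PSRAM_OPCODES = {READ, FAST_READ}
DUMMY_CLOCKS = {READ: 0, FAST_READ: 8}  # single-bit SPI, 24-bit addressing

def route(opcode: int) -> str:
    """Which device should drive MISO for this command."""
    return "psram" if opcode in PSRAM_OPCODES else "fpga"

def first_data_clock(opcode: int) -> int:
    """SCK cycles (opcode + address + dummy) before the first data bit."""
    return 8 + 24 + DUMMY_CLOCKS[opcode]

for op in (READ, FAST_READ, READ_ID, READ_STATUS, WRITE_ENABLE):
    print(f"0x{op:02X} -> {route(op)}")
print("FAST_READ data starts after", first_data_clock(FAST_READ), "clocks")
```

The decision only depends on the first byte, so in principle the mux select is known long before any read data needs to leave the PSRAM.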

I haven't looked at your HDL: do you take advantage of a 16- or 32-bit-wide RAM read to start the external access just before receiving the last address bits (MSB-first seems pretty typical)?

From pictures of the ULX3S FPGA board used for the spispy, it has a Micron MT48LC16M16A2 SDRAM chip with a 16-bit-wide bus. Based on the following quote from https://trmm.net/Spispy/:

... so we can do things like initiate the row/bank activation when the emulator has received 14 bits of the address, overlapping tRCD with reception of the next 9 bits, which we send to the column address and overlap the CAS latency with receipt of the last address bit, which we use to select the upper or lower byte of the 16-bit read result.

it does seem like the spispy does what you described.
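Here is a sketch of the address split that quote describes, for a 24-bit flash byte address arriving MSB first: the first 14 bits go to the bank/row activation, the next 9 to the column, and the last bit picks the byte out of the 16-bit word. The bit counts come from the quote; which of the 14 bits select the bank (here the top 2) is an assumption, and the actual spispy HDL may map them differently.

```python
# Split of a 24-bit SPI flash byte address for a x16 SDRAM, following the
# bit counts in the trmm.net quote (14 bank+row, 9 column, 1 byte select).
ROW_BANK_BITS = 14   # bits needed before ACTIVE can be issued
COL_BITS = 9         # column address bits for a x16 part with 512 columns
BANK_BITS = 2        # assumption: top 2 of the 14 bits select one of 4 banks

def split_address(addr):
    """Split a 24-bit flash byte address into (bank, row, column, byte_sel)."""
    if not 0 <= addr < 1 << 24:
        raise ValueError("SPI flash addresses are 24 bits")
    byte_sel = addr & 1                           # last bit on the wire
    column = (addr >> 1) & ((1 << COL_BITS) - 1)  # the 9 bits before it
    row_bank = addr >> (1 + COL_BITS)             # the first 14 bits
    bank = row_bank >> (ROW_BANK_BITS - BANK_BITS)
    row = row_bank & ((1 << (ROW_BANK_BITS - BANK_BITS)) - 1)
    return bank, row, column, byte_sel

def schedule():
    """Address bits received (MSB first) before each step can start."""
    return {
        "ACTIVE (bank + row), overlaps tRCD": ROW_BANK_BITS,
        "READ (column), overlaps CAS latency": ROW_BANK_BITS + COL_BITS,
        "pick upper/lower byte": ROW_BANK_BITS + COL_BITS + 1,
    }

print(split_address(0xABCDEF))
for step, bits in schedule().items():
    print(f"{bits:2d} bits -> {step}")
```

The point of the ordering is that the slow SDRAM steps start while the host is still clocking in the remaining address bits, so only the final byte select has to happen after the full address is known.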

fenugrec commented 1 year ago

densities typically used nowadays (16MiB/128Mbit or more)

Yes, my own use case is 16 Mbit, and I didn't realize a typical PC boot flash (which seems to be the main motivation for this project) is more on the order of 128 Mbit! As you said, the price rapidly gets crazy for large fast SRAMs, and so does power consumption.

I'm not familiar with PSRAM; that would be a nice shortcut if it has a suitably compatible SPI command set.

it does seem like the spispy does what you described.

Good find. I would've been surprised if he hadn't taken advantage of this already.

Anyway, it doesn't look like this project is going to evolve much unless someone adopts it...

osresearch commented 1 year ago

I started building a PSRAM version since it seemed like an easy way to get mostly command-compatible chips, but ran into some significant problems with the need to mux multiple PSRAM chips. If the data from multiple chips had to transit the FPGA fabric before being sent to the host, the pin-to-pin latency of the FPGA was too high to keep up with the SPI clock.
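A back-of-the-envelope way to see why the extra hop through the FPGA hurts: in a simplified model (an assumption, not a full timing analysis), the device launches MISO on the falling SCK edge and the host samples it on the next rising edge, so clock-to-output, fabric delay and host setup all have to fit in half a clock period. The delay numbers below are hypothetical placeholders, not datasheet values.

```python
# Simplified half-period timing budget for SPI read data.
# Assumption: MISO launched on the falling edge, sampled on the next rising
# edge. All delay values used below are placeholders, not datasheet figures.

def max_sck_mhz(t_co_ns: float, t_fabric_ns: float, t_su_ns: float) -> float:
    """Highest SCK frequency at which the read-data path still fits."""
    half_period_ns = t_co_ns + t_fabric_ns + t_su_ns
    return 1000.0 / (2 * half_period_ns)

def fits(sck_mhz: float, t_co_ns: float, t_fabric_ns: float, t_su_ns: float) -> bool:
    return max_sck_mhz(t_co_ns, t_fabric_ns, t_su_ns) >= sck_mhz

# Hypothetical numbers: 6 ns PSRAM clock-to-out, 2 ns host setup,
# and either a direct wire or an 8 ns pin-to-pin path through the FPGA.
direct = max_sck_mhz(6, 0, 2)
through_fpga = max_sck_mhz(6, 8, 2)
print(f"direct: {direct:.2f} MHz, through FPGA: {through_fpga:.2f} MHz")
print("50 MHz via FPGA OK?", fits(50, 6, 8, 2))
```

With these made-up values, the fabric hop halves the usable clock, which is the kind of margin loss described above.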

As far as I can find, the largest PSRAM with the "SPI" compatible interface is 8 MiB (as used on the esp32). The larger ones from IS use "Octal SPI", which has nothing in common with the normal command set and also has highly variable latency on some commands. If there were 16 MiB chips that didn't require muxing, it might be doable; it could also work for applications that only need 8 MiB of data.
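For reference, here is what the muxing means in terms of addresses, assuming (hypothetically) that a 16 MiB image is split across two 8 MiB parts with address bit 23 selecting the chip. Since the address goes out MSB first, the chip is known right after the first address bit; that does not remove the fabric delay on the returning data, which was the actual problem.

```python
from typing import Tuple

# Hypothetical mapping of a 16 MiB flash image onto two 8 MiB PSRAM chips.
# Assumption: address bit 23 (first address bit on the wire) picks the chip,
# and the low 23 bits are the address inside that chip.
CHIP_BYTES = 8 * 1024 * 1024  # 8 MiB per PSRAM

def chips_needed(image_bytes: int) -> int:
    return -(-image_bytes // CHIP_BYTES)  # ceiling division

def map_address(addr: int) -> Tuple[int, int]:
    """Return (chip index, address within that chip)."""
    if not 0 <= addr < 2 * CHIP_BYTES:
        raise ValueError("address outside the 16 MiB image")
    return addr >> 23, addr & (CHIP_BYTES - 1)

print(chips_needed(16 * 1024 * 1024))
print(map_address(0x800010))
```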

I haven't been doing as much firmware work these days, so the project hasn't had much development recently. It is looking for new developers / maintainers who want to extend it, so feel free to send PRs!

osresearch commented 1 year ago

I forgot to ask -- what is your use case for the smaller memory? If it fits into the 8 MiB chips and doesn't run too fast, there might be a way to fix up my ice40 branch to work with it using the available PSRAM chips. This would be a very low-cost design compared to the ecp5 version.

fenugrec commented 1 year ago

Good to hear from you @osresearch !

I started building a PSRAM version since it seemed like an easy way to have mostly command-compatible chips, but ran into some significant problems [...]

Ah, interesting. Won't waste time looking at that option then.

I haven't been doing as much firmware work these days, so the project hasn't had much development recently.

That's what I figured. And that open ticket about the SPI state machine being "a mess" tells me maybe you reached a point where a big rework was wanted / needed, and it can be hard to find the motivation to do that! I have the same problem in some of my own projects...

what is your use case for the smaller memory?

I'm at the "thinking out loud" stage of planning a side-channel attack on a certain MCU that reads an encrypted firmware from an 8Mbit SPI flash (oops, I said 16 earlier, but it's an MX25L8006). I'm not sure yet whether I even need to control the entire address space; maybe I can get away with just a Page Program to vary up to 256 bytes at a time. That depends on whether the bootloader does the checksum validation before or after the decryption. Maybe I could also emulate an SPI flash entirely in an FPGA that would return 0 everywhere except a few key locations; that probably wouldn't need tons of block RAM and could maybe fit on a small-ish device.
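Before writing any HDL, the "return 0 everywhere except a few key locations" idea could be prototyped as a small behavioural model. This sketch is only an assumption of how such a model might look: it handles READ (0x03) and PAGE PROGRAM (0x02) only (no WREN, status, erase or JEDEC ID), treats 0xFF as a stand-in for a tri-stated MISO, and uses the usual SPI NOR page-program rules (bits only go 1 to 0, the address wraps inside the 256-byte page, and only the last 256 bytes are kept).

```python
from typing import Dict

FLASH_BYTES = 1 << 20   # MX25L8006: 8 Mbit = 1 MiB
PAGE_BYTES = 256
READ, PAGE_PROGRAM = 0x03, 0x02

class SparseFlash:
    """Flash model that stores only the bytes that differ from `default`."""

    def __init__(self, overrides: Dict[int, int], default: int = 0x00):
        self.default = default
        self.cells = dict(overrides)

    def byte_at(self, addr: int) -> int:
        # Assumption: addresses wrap at the 1 MiB device size.
        return self.cells.get(addr % FLASH_BYTES, self.default)

    def transaction(self, mosi: bytes) -> bytes:
        """One CS-low transfer: opcode, 24-bit address, then data bytes.
        Returns one MISO byte per MOSI byte (0xFF stands in for hi-Z)."""
        opcode = mosi[0]
        addr = int.from_bytes(mosi[1:4], "big")
        payload = mosi[4:]
        header = b"\xff" * min(len(mosi), 4)
        if opcode == READ:
            # Sequential read with auto-incrementing address.
            return header + bytes(self.byte_at(addr + i) for i in range(len(payload)))
        if opcode == PAGE_PROGRAM:
            # Address wraps inside the page; only the last 256 bytes are kept,
            # and programming can only clear bits (AND with the old value).
            base = (addr % FLASH_BYTES) & ~(PAGE_BYTES - 1)
            skip = max(0, len(payload) - PAGE_BYTES)
            for i in range(skip, len(payload)):
                a = base + (addr + i) % PAGE_BYTES
                self.cells[a] = self.byte_at(a) & payload[i]
        return header + b"\xff" * len(payload)

# Mostly-zero image with a single "key" byte at 0x1234 (hypothetical).
f = SparseFlash({0x1234: 0xA5})
print(f.transaction(bytes([READ, 0x00, 0x12, 0x33, 0, 0, 0]))[4:].hex())
```

Only the overridden bytes take up storage, which is the property that would let a hardware version fit in a modest amount of block RAM.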

Like I said, very preliminary planning...