litex-hub / linux-on-litex-vexriscv

Linux on LiteX-VexRiscv
BSD 2-Clause "Simplified" License

LiteSDCard -- not working (almost all of the time) [ECP5] #255

Closed: essele closed this issue 2 years ago.

essele commented 2 years ago

I am really losing my mind on this ... I saw some inconsistent behaviour on an icesugar_pro but couldn't reproduce it, so discounted it. But now I've just seen exactly the same thing on a new ULX3S, but I now also can't reproduce it!

When I first tried the icesugar_pro I had some intermittent behaviour when it was set up to use sdcard (i.e. litesdcard rather than spisdcard) ... it would work a few times, then completely stop working (SDcard boot failed, sdcard_init would also fail) and I couldn't get back to a working version.

I got a new ULX3S out of the box yesterday and, after figuring out the programming mechanism, I programmed it (sdcard is the standard config on this device) and connected to the serial device. It was at the "litex>" prompt, having failed to boot. I typed "reboot" and it ran through, downloaded the files from the sdcard, and started booting Linux. It stopped just after the serial messages, but I quickly fixed that by updating the device tree to the proper version. It then booted absolutely perfectly all the way through with no issues (other than being a bit slow).

I then tried building a 2-CPU bitstream to see if that would work on the ULX3S, but it didn't, and I saw very similar behaviour to what I saw on the OrangeCrab (see the other issue).

However, when I went back to the single core bitstream (rebuilt, I didn't keep the original, but used the same args) the sdcard wouldn't work and despite lots of attempts it still isn't working. I hadn't changed (or even removed) the sdcard in the intervening time. This is exactly what I saw on the icesugar_pro. If I switch to spisdcard I'm sure it will work (it does on the icesugar_pro) but the performance is obviously quite a bit worse and I'm trying to get to a position where I can boot these devices quickly.

Is anyone else having any issues? I would have suggested that litesdcard simply doesn't work, but I have several cases on multiple devices where it clearly does, just not for very long!

I know this doesn't make sense at all ... behaviour shouldn't change like this.

essele commented 2 years ago

Hmmm ... more testing shows this is likely to be some kind of timing issue.

If I turn on software debugging for litesdcard (I've altered the debugging code slightly to make it more readable), I see different behaviours at different SoC clock frequencies. This is all on the ULX3S, which defaults to 50 MHz.

Short summary: at a SoC frequency of 25 MHz it all seems to work (the sdcard is at 12 MHz). At 50 MHz and 40 MHz it fails, but with different problems. At 50 MHz with the SDCard freq set to 12 MHz in software, it also works.

So it's likely a problem running the sdcard clock above a certain point: it works at 12 MHz and fails at 20 MHz and 25 MHz. I will experiment a bit more. This also seems consistent across a couple of different cards.
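The SD clock frequencies that show up in the logs below (390 KHz, 312 KHz, 12 MHz, 20 MHz, 25 MHz) are coarse steps of the SoC clock. The sketch below is a hypothetical model of how such values could arise, assuming the BIOS divides the SoC clock by `sys_clk / requested`, rounds that divider up to a power of two, clamps it to the range 2..256, and requests 400 kHz for initialization and 25 MHz afterwards. Under those assumptions it reproduces every frequency printed in this thread; the helper names are illustrative only, so check the driver in your LiteX version before relying on it.

```python
def sd_clk_divider(sys_clk_hz: int, requested_hz: int) -> int:
    """Hypothetical model of the SD clock divider selection (assumption, see lead-in)."""
    divider = sys_clk_hz // requested_hz if requested_hz else 256
    # Round the divider up to the next power of two.
    pow2 = 1
    while pow2 < divider:
        pow2 <<= 1
    # Clamp to the assumed hardware limits: never below 2, never above 256.
    return min(max(pow2, 2), 256)


def sd_clk_freq(sys_clk_hz: int, requested_hz: int) -> int:
    """Resulting SD clock frequency in Hz for a given SoC clock and request."""
    return sys_clk_hz // sd_clk_divider(sys_clk_hz, requested_hz)


def describe(freq_hz: int) -> str:
    # Integer division, so 12.5 MHz prints as "12 MHz", as in the logs.
    if freq_hz > 1_000_000:
        return f"{freq_hz // 1_000_000} MHz"
    return f"{freq_hz // 1000} KHz"


if __name__ == "__main__":
    for sys_clk in (25_000_000, 40_000_000, 50_000_000):
        init = sd_clk_freq(sys_clk, 400_000)
        fast = sd_clk_freq(sys_clk, 25_000_000)
        print(f"sys={sys_clk // 1_000_000} MHz: init {describe(init)}, fast {describe(fast)}")
```

In this model a divider of 2 is the floor, so the "fast" clock is always half the SoC clock (12.5, 20 or 25 MHz), which would explain why lowering the SoC clock to 25 MHz, or requesting about 12.5 MHz at 50 MHz (divider 4), both land on the frequency that works.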

This is the working 25 MHz output...

litex> sdcard_init

Initialize SDCard... Setting SDCard clk freq to 390 KHz
CMD0: GO_IDLE
waited 62 for cmdevt: 00000001
00370000 09203300 00092010 00000900
CMD8: SEND_EXT_CSD, arg: 0x000001aa
waited 24 for cmdevt: 00000001
20330000 09201000 00090008 000001aa
Setting SDCard clk freq to 12 MHz
CMD55: APP_CMD
waited 2 for cmdevt: 00000001
20100000 09000800 0001aa37 00000120
ACMD41: APP_SEND_OP_COND, arg: 70ff8000
waited 2 for cmdevt: 00000001
00080000 01aa3700 0001203f 00ff8000
CMD55: APP_CMD
waited 2 for cmdevt: 00000001
aa370000 01203f00 ff800037 00000120
ACMD41: APP_SEND_OP_COND, arg: 70ff8000
waited 2 for cmdevt: 00000001
203f00ff 80003700 0001203f 00ff8000
CMD55: APP_CMD
waited 2 for cmdevt: 00000001
00370000 01203f00 ff800037 00000120
ACMD41: APP_SEND_OP_COND, arg: 70ff8000
waited 2 for cmdevt: 00000001
203f00ff 80003700 0001203f 00ff8000
CMD55: APP_CMD
waited 2 for cmdevt: 00000001
00370000 01203f00 ff800037 00000120
ACMD41: APP_SEND_OP_COND, arg: 70ff8000
waited 2 for cmdevt: 00000001
203f00ff 80003700 0001203f 00ff8000
CMD55: APP_CMD
[......]
CMD51: APP_SEND_SCR
waited 4 for cmdevt: 00000001
20060000 09003700 00092033 00000920
dataevt: 00000001
CMD16: SET_BLOCKLEN
waited 2 for cmdevt: 00000001
00370000 09203300 00092010 00000900
Successful.

At the default of 50 MHz I see timeouts for every command issued after the switch to 25 MHz...

litex> sdcard_init

Initialize SDCard... Setting SDCard clk freq to 390 KHz
CMD0: GO_IDLE
waited 71 for cmdevt: 00000001
ff1fffff ffff0800 0001aa08 000001aa
CMD8: SEND_EXT_CSD, arg: 0x000001aa
waited 26 for cmdevt: 00000001
ff080000 01aa0800 0001aa08 000001aa
Setting SDCard clk freq to 25 MHz
CMD55: APP_CMD
waited 88341 for cmdevt: 00000005
ff080000 01aa0800 0001aa08 000001aa
ACMD41: APP_SEND_OP_COND, arg: 70ff8000
waited 88341 for cmdevt: 00000005
ff080000 01aa0800 0001aa08 000001aa
CMD2: ALL_SEND_CID
waited 88341 for cmdevt: 00000005
ff080000 01aa0800 0001aa08 000001aa
Failed.
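The timeouts are visible in the raw `cmdevt` values: working commands report 00000001, while the failing ones report 00000005 after a very long wait (88341 polls). A small decoder makes the difference explicit. The bit layout here is an assumption (bit 0 = done, bit 1 = write error, bit 2 = timeout, bit 3 = CRC error) and is not stated in this thread; verify it against the LiteSDCard core and BIOS driver you are building.

```python
# Assumed bit layout of the cmdevt/dataevt status value (hypothetical; verify
# against the LiteSDCard core and BIOS driver in your LiteX version).
EVT_DONE = 1 << 0
EVT_WRITE_ERROR = 1 << 1
EVT_TIMEOUT = 1 << 2
EVT_CRC_ERROR = 1 << 3


def decode_event(value: int) -> list[str]:
    """Translate a raw event value (e.g. 0x00000005) into flag names."""
    names = []
    if value & EVT_DONE:
        names.append("done")
    if value & EVT_WRITE_ERROR:
        names.append("write_error")
    if value & EVT_TIMEOUT:
        names.append("timeout")
    if value & EVT_CRC_ERROR:
        names.append("crc_error")
    return names


if __name__ == "__main__":
    for raw in ("00000001", "00000005"):
        print(raw, decode_event(int(raw, 16)))
```

Read this way, 00000005 means the command finished only because the timeout fired, which matches the observation that nothing after the switch to 25 MHz gets a response.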

At 40 MHz the commands appear to complete OK, but the response values look wrong after the switch to 20 MHz.

litex> sdcard_init

Initialize SDCard... Setting SDCard clk freq to 312 KHz
CMD0: GO_IDLE
waited 86 for cmdevt: 00000001
ff7fffff ffff7fff ffffff7f ffffffff
CMD8: SEND_EXT_CSD, arg: 0x000001aa
waited 32 for cmdevt: 00000001
ff7fffff ffff7fff ffffff08 000001aa
Setting SDCard clk freq to 20 MHz
CMD55: APP_CMD
waited 2 for cmdevt: 00000001
ff7fffff ffff0800 0001aa7f ffffffff
ACMD41: APP_SEND_OP_COND, arg: 70ff8000
waited 2 for cmdevt: 00000001
ff080000 01aa7fff ffffff7f ffffffff
CMD55: APP_CMD
waited 2 for cmdevt: 00000001
aa7fffff ffff7fff ffffff7f ffffffff
ACMD41: APP_SEND_OP_COND, arg: 70ff8000
waited 2 for cmdevt: 00000001
ff7fffff ffff7fff ffffff7f ffffffff
CMD55: APP_CMD
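Comparing this 40 MHz dump with the working 25 MHz one suggests a quick sanity check. In the working log, the low byte of the third word matches the response's command-index byte (0x37 for CMD55, 0x08 for CMD8, and 0x3f, the all-ones field of an R3 reply, for ACMD41), with the card status or echoed argument in the fourth word. That layout is inferred from the dumps in this thread, not from LiteSDCard documentation, and the helper names below are hypothetical. Under that assumption, the corrupted 20 MHz responses are easy to flag:

```python
def response_index(dump: str) -> int:
    """Return the low byte of the third 32-bit word of a response dump line."""
    words = [int(w, 16) for w in dump.split()]
    if len(words) != 4:
        raise ValueError("expected four 32-bit words")
    return words[2] & 0xFF


# R3 (OCR) responses carry all-ones in the 6-bit command-index field.
R3_INDEX = 0x3F


def looks_valid(dump: str, cmd_index: int, r3: bool = False) -> bool:
    """Check whether the index byte matches what the command should return."""
    expected = R3_INDEX if r3 else cmd_index
    return response_index(dump) == expected


if __name__ == "__main__":
    good = "20100000 09000800 0001aa37 00000120"  # CMD55 at 12 MHz
    bad = "ff7fffff ffff0800 0001aa7f ffffffff"   # CMD55 at 20 MHz
    print(looks_valid(good, 55), looks_valid(bad, 55))
```

The 0x7f/0xff patterns in the failing dumps look like bits sampled while the line was still idle-high, which points at capture timing rather than at the card itself.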
essele commented 2 years ago

I'm now really confused ... I had a nice image built at 25 MHz which worked perfectly (I still have the .bit file and get consistently good results with it), but any new images I try to build now don't work.

My good image works with a no-brand SD card, but not a SanDisk one. At one point I had an image that worked with the SanDisk one but not the other one. And now neither of them work.

This just feels really flakey.

Is anyone else having any of these problems? Do I just have four iffy cards from three different manufacturers??

EDIT: I've just done a clean build after moving the build directory out of the way, same command line as above, nothing else changed at all, and now I can read the SanDisk and the other card. Coincidence? I probably ran five or six builds before that didn't work and this one now does ... seems a little unlikely (all of them met timing) ... so is something left in the build area that influences future builds incorrectly?

enjoy-digital commented 2 years ago

Hi @essele,

I'll have a look. In the meantime, it seems @gregdavill fixed an issue in the clock generation in https://github.com/gregdavill/litesdcard/commit/12c35d9cab7fac02736e57489028d1e7ef00d5be that improves behaviour on ButterStick. Could you do a test with this?

gregdavill commented 2 years ago

I want to do a bit more investigation with that fix before I open a PR. Because by default it's working on some platforms?

essele commented 2 years ago

It doesn't appear to make any difference either way, I'm afraid (I've just tested on a ULX3S/85K).

strumtrar commented 2 years ago

I just tried the current master on an ECPIX-5 and haven't managed to get the SDCard running. I haven't tried different speeds or different SD cards yet, but I can confirm that it doesn't work with the SanDisk card that I currently use.

strumtrar commented 2 years ago

With the patch from @gregdavill it works successfully on my ecpix5 board.

enjoy-digital commented 2 years ago

@strumtrar: I've been doing some tests with @gregdavill's changes and they also fixed SDCard at 50 MHz on the ECPIX-5 (the previous code was working at 75 MHz). I also did some tests on other boards (Xilinx, Efinix) and haven't seen a regression, so I merged the changes. This should already improve the situation, and we can take a closer look with an analyzer to better understand it.

@essele: Not sure it will fix things for the ULX3S, but I'll try to have a closer look soon.

gregdavill commented 2 years ago

Thanks @enjoy-digital, I hadn't had a chance to set up a test to actually compare the code changes.

The SD card spec has two different timing models: one for "low speed/init" and another, with significantly different clock-to-edge timings, for "High speed" mode. My understanding is that inverting the clock in this case puts the rising edge in a better position for capturing the data, both on the SD side and on the FPGA side.
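The sampling-position argument can be made concrete with a simple setup/hold budget. This is a generic model, not LiteSDCard's actual timing: data launched on one SD clock edge becomes valid at most `t_out_max` after it and stays valid until `t_out_min` after the following launch edge, and the FPGA captures it at some offset after the launch edge. Inverting a clock shifts its edges by half a period, so the sketch compares capture points half a period apart. Every delay value and name below is hypothetical, chosen only to illustrate the mechanism.

```python
from dataclasses import dataclass


@dataclass
class LinkTiming:
    """Hypothetical timing budget for data sent by the card to the FPGA (all in ns)."""
    period: float      # SD clock period
    t_out_max: float   # latest point after the launch edge at which data is valid
    t_out_min: float   # earliest point after the *next* launch edge at which data changes
    t_setup: float     # FPGA input setup requirement
    t_hold: float      # FPGA input hold requirement


def margins(t: LinkTiming, capture_offset: float) -> tuple[float, float]:
    """Return (setup_margin, hold_margin) when capturing capture_offset ns after launch.

    A negative value means that side of the budget is violated.
    """
    setup = capture_offset - t.t_out_max - t.t_setup
    hold = (t.period + t.t_out_min) - capture_offset - t.t_hold
    return setup, hold


if __name__ == "__main__":
    # Hypothetical delays: card output delay plus board/IO delay lumped into t_out_max.
    fast = LinkTiming(period=40, t_out_max=24, t_out_min=2, t_setup=3, t_hold=1)  # 25 MHz
    slow = LinkTiming(period=80, t_out_max=24, t_out_min=2, t_setup=3, t_hold=1)  # 12.5 MHz

    print("25 MHz, capture at T/2:  ", margins(fast, fast.period / 2))
    print("25 MHz, capture at T:    ", margins(fast, fast.period))
    print("12.5 MHz, capture at T/2:", margins(slow, slow.period / 2))
```

With these made-up numbers, moving the capture point by half a period turns a negative setup margin into a positive one at 25 MHz, while at 12.5 MHz the original capture point already passes comfortably. That is the same shape as the behaviour reported above (works at 12 MHz, fails at 20 and 25 MHz), but real margins depend on the card, the board and the FPGA I/O.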

enjoy-digital commented 2 years ago

@gregdavill: Thanks, the improvements we can see on the different boards (ButterStick, ECPIX-5, etc.) indeed indicate that inverting the clock provides a better sampling position. I still want to test this on some boards, and will try to observe the SDCard signals and share the captures here.

enjoy-digital commented 2 years ago

@essele: I just did a test on a ULX3S V1.7 45F and got the SDCard working correctly. Can you do a test with upstream LiteX? I assume you have a V2.0, which I don't have, but I can probably find someone to test on a V2.0 if it's not working on yours.

enjoy-digital commented 2 years ago

@essele: The issue on the ULX3S was related to some changes in the pinout between the board revisions. I have a 1.7 (early prototype), and there are some differences on the SDCard pins compared with the 2.0. This has been fixed by @goran-mahovlic, who reviewed and fixed the pinout for the 2.0 in https://github.com/litex-hub/litex-boards/pull/373 and also tested it on hardware.

I think we can now close this issue since the two initial issues should now be fixed: