mikaelpatel / Cosa

An Object-Oriented Platform for Arduino/AVR
https://mikaelpatel.github.io/Cosa/
GNU Lesser General Public License v2.1
339 stars 76 forks source link

SD card driver broken #233

Closed SlashDevin closed 9 years ago

SlashDevin commented 9 years ago

Using the following:

Limor Freid's CardInfo.ino (SparkFun?) works, as does fat16lib's SDFat->examples->SdInfo.ino and FAT16->examples->fat16info.ino.

CosaSD.ino fails at read(cid). It fails at the await in SD::read (line 146 [1]) and leaves the SD card in a weird state. When run again (reset pressed), SD::begin fails at SPI::begin. The SD card must be removed from the reader and reinserted. CosaSD.ino was modified to not use ethernet nor ST7735, and to use Board::D53 for the CS pin.

CosaFAT16.ino tries partition 1 in FAT16::begin(&sd) but fails at the device->read called from FAT16::cacheRawBlock(volumeStartBlock). Then partition 0 is tried, and cacheRawBlock passes (cache not dirty, so it doesn't try to read?), but then it fails because bpb->bytesPerSector == 0.

Also, bpb->sectorsPerFat16 == 0 implies that this card may be formatted as FAT32 card, but other bpb members are also reported as zero. The other test programs report that it is formatted as FAT16. CosaFAT16.ino was modified to not use ethernet and to use Board::D53 for the CS pin.

Here is a 'scope display for CardInfo.ino that works, with MOSI and SCK on the top two probe channels, and logic analyzer on the bottom. (The "Interp" bytes line is not always correct) This is in SD::begin, just after the spi.begin and after the 10 bytes of spi.transfer(0xFF):

CardInfo.ino

Here is the corresponding display for CosaSD.ino:

CosaSD.ino

They both appear to be running at CLK/128 (125KHz). There's a little noise, but it's the best I can do until things get soldered down. There are 4 MOSI pulses starting at 6.12mS that are very short, compared to the working programs. More like spikes, really. Perhaps related to set_clock?

I should also mention that I deleted all Cosa files and subdirectories related to LCDs, networking (INET, Wireless, Socket et al), as well as Codec and Cipher to decrease build times.

I tried to compare Cosa SD.cpp with SDFat Sd2Card.cpp, but I'm not sure what differences are due to recent mods to SDFat (now hosted on GitHub, no history) vs. what you had to do to merge it into Cosa. Arrrr, the curse of merging...


[1] Hmmmm... shouldn't this if (send(command, arg)) goto error; be if (!send(command, arg)) goto error; ? I tried this, but then it fails on the next line.

SlashDevin commented 9 years ago

Removing the set_clock from SD.cpp does not affect the pulses at 6.12mS.

mikaelpatel commented 9 years ago

I cannot get this error to occur with Arduino Mega 2560 and Ethernet Shield with SD reader. I have rerun the CosaSD and CosaFAT16 example sketches without any problems. It is some time since I worked on these as the focus has been the Ethernet protocol support but even there the SD for the Web server works just fine.

Have you tested an unmodified version with the same setting as an Ethernet Shield? Did you try the lower frequency in the CosaFAT16.in?

Cheers!

SlashDevin commented 9 years ago

Unmodified CosaSD.ino does not work. Although I don't have an ETHERNET_SHIELD, I just hooked CS to D4. The 'scope waveforms look the same. The "spikes" may just be an artifact of a quicker driver (i.e. Cosa core faster than Arduino core).

I've got the 'scope interpreter working better, and the byte streams are different. Here's the transcript, with > meaning what's sent to the SD card, and < is the reply from the SD card:

SDFat SDInfo.ino

40 00 00 00 00 95 < 01 48 00 00 01 AA 87 < 01 00 00 01 AA { repeat 77 00 00 00 00 87 < 01 69 40 00 00 00 87 < 01 }

SD CosaSD.ino

40 00 00 00 00 95 < 01 7B 00 00 00 01 83 < 01 48 00 00 01 AA 87 < 01 00 00 01 AA 77 40 00 00 00 F7 < 01 69 40 00 00 00 77 < 01 7A 00 00 00 00 FD < 05

I think the last reply may be the problem, but I'll have to know more about the protocol. I'll keep looking.

As far as frequency, the SD constructor defaults to DIV128_CLOCK, so it's already the slowest it can get. The 'scope confirms that rate... hard to see in the thumbnails, but you can click them for a full-size version.

mikaelpatel commented 9 years ago

Thanks for the update and extra info. You could try to disable the SPI interleaving. There is a define in SPI.cpp (USE_SPI_PREFETCH) that you may comment out. The same goes for SD.cpp. This will give a slower version. It might be something with the clock period even though it is the lowest during the startup handshake. It is increased at the end of the SD.begin().

SlashDevin commented 9 years ago

Sorry, no change. The spikes are about 1uS wide vs. much less than 1uS.

Also fixed one missing zero byte after "69 40" above. [oops, found a second missing zero byte. All the commands sent should have five six bytes.]

Thanks for your help.

SlashDevin commented 9 years ago

Just an update...

Although it's possible that my particular SD card is "picky", I have some hope because current versions of SDFat and FAT16 do work with my card. I am looking at the differences in what is actually sent to the card. Here are what I've captured so far:

SDFat/SDInfo.ino

Command Dir Bytes
GO_IDLE_STATE > 40 00 00 00 00 95
< 01
SEND_IF_COND > 48 00 00 01 AA 87
< 01 00 00 01 AA
APP > 77 00 00 00 00 87
< 01
SD_SEND_OP_COND > 69 40 00 00 00 87
< 01
APP > 77 00 00 00 00 87
< 01
SD_SEND_OP_COND > 69 40 00 00 00 87
< 01
APP > 77 00 00 00 00 87
< 01
SD_SEND_OP_COND > 69 40 00 00 00 87
< 00 <-- Note! Waited for state change.
READ_OCR > 7A 00 00 00 00 87
< 00 80 FF 80 00 <-- Note! Response different from below
clock rate changes...
SEND_CID > 4A 00 00 00 00 87 Note: Short CS toggle observed on MSB of 87?
< 00
> (idle 0xFF)
< FE 03 53 44 53 44 30 32 47 80 50 B5 7D 53 00 6C 15 6E BE

SD/CosaSD.ino

Command Dir Bytes
GO_IDLE_STATE > 40 00 00 00 00 95
< 01
CRC_ON_OFF > 7B 00 00 00 01 83
< 01
SEND_IF_COND > 48 00 00 01 AA 87
< 01 00 00 01 AA
APP_CMD > 77 40 00 00 00 F7
< 01
SD_SEND_OP_COND > 69 40 00 00 00 77
< 01
READ_OCR > 7A 00 00 00 00 FD
< 05 FF FF FF FF
...some time...
SEND_CID > 4A 00 00 00 00 1B
< 01

I have not started looking at the command args sent nor most of the replies. I'm currently investigating why the SEND_OP_COND waits for the card to go to the INIT_STATE. I wonder if current SDFat is more sensitive to the cards' different behaviors by waiting? The READ_OCR command also returns different results immediately after the SEND_OP_COND.

If only the SDFat history were still available, I could investigate why the change was made after you merged that version into Cosa. Sigh.

mikaelpatel commented 9 years ago

I have adjusted the initialization so that it will retry. Please see commit https://github.com/mikaelpatel/Cosa/commit/1c5808195cd2c6be9d052638e40dc7b72f58bdea.

There is also some cleanup of the performance boost sections and a new table driven checksum to further increase performance with approx. 20%.

mikaelpatel commented 9 years ago

Verification of CosaFAT16 https://dl.dropboxusercontent.com/u/993383/Cosa/logic/cosafat16.logicdata. Use Saleae Logic to view https://www.saleae.com/downloads.

mikaelpatel commented 9 years ago

Found a problem with send of ACMD commands. See commit https://github.com/mikaelpatel/Cosa/commit/8f019c301dc09f85dac9f39bb5676821f34b7ad7.

SlashDevin commented 9 years ago

Yes, I had noticed the ACMD arg being sent twice, but it didn't have an effect. My card takes about 5ms from the INIT command to the IDLE bit going low, so the retry loop was the biggest problem. I found some recommendations to deassert/assert CS after each SEND_OP_COND attempt, but it works fine without it.

I still think there is a little confusion between ::send( ms, cmd, arg ) and INIT_TIMEOUT. I think the timeout is supposed to be from the GO_IDLE_STATE to !in_idle_state, not the time it takes to get a reply from the GO_IDLE_STATE command. Oh well, it ain't broke...

Thanks, it seems to be working now.

mikaelpatel commented 9 years ago

Great! Thanks for the testing and debugging. I will have to go back to the spec and see if I can come up with a better timeout handling. It is somewhat of a patch right now. Not very clean. Needs some refactoring.

Did you get your card to run at full speed (DIV2)? I have tweaked the SPI block transfer to give an extra 20% boost. And added a table driven CRC update variant. Runs in parallel with the SPI transfer.

SlashDevin commented 9 years ago

It's not pretty, but CosaFAT16.ino runs at DIV2. Adding a probe cleaned the CS up enough to run. XD

Open file:3 ms
Write file (100 KByte):857 ms
Read file (100 KByte):349 ms
Remove file:15 ms
Open/Write/Close file:37 ms
Nisse badar.
Open/Read/Close file:10 ms
mikaelpatel commented 9 years ago

Thanks for the info! Seems like you have a much faster SD card then I have. Read is approaching 285 KB per second which is really good considering the 1 MB per second upper limit on the AVR SPI bus at DIV2 (8MHz). The 285 KBps includes the double buffering and the CRC. One level of the buffering and part of the CRC update is done in parallel with the SPI transfer. I need to check what type of performance memcpy() has. It should be in the range 2-4 MBps.

Further optimization is to skip both: Write directly to the application level buffer and ignore check-sum. The check-sum handling could be improved to reissue if it fails. Right now I think it is only an error and the application needs to handle that.

This brings me to one of the changes you had done in the PR: Handling of spi.begin() return value. The code is not very robust right now as the return value is just ignored. I think that the correct approach is to handle the SPI lock correctly instead and have a fatal execution handler instead. What to you think about that?

Cheers!

SlashDevin commented 9 years ago

Seems like you have a much faster SD card then I have.

I'm surprised that my card, manfactured Dec 2006, would be considered "fast". It is a real SanDisk card, though.

[should it] handle the SPI lock correctly instead and have a fatal execution handler?

In my case, knowing that the INIT hadn't completed was the key. Getting an ILLEGAL_CMD for the READ_OCR was very misleading. If SD::begin had checked the individual SPI operations and returned an appropriate error code, that would have saved me a lot of grief. I think I remember seeing that error codes were in the latest SDFat.

I would first ask if you want to maintain and extend the SD/FAT16 classes yourself. Adding an error handler or refactoring makes it harder for you to track the other sources (i.e., the merge gets harder). If you do want to branch your own version and begin to diverge, I say, "Great!" Well, for me, it's great. For you, it's more you have to know and do, but the result is another tightly-integrated work of art: easier to predict, easier to debug, and ultimately easier to read (my prime criterion).

If you are asking whether I would use an Event::Handler for a queued-up error code, I could choose to implement that as long as the returned error code is available. I'm not sure I would defer errors to the Event dispatch loop.

If you are asking if I would use an Event::Handler or function ptr callback, I would say "No", as long as the error codes are available. I think most of these errors are meaningful in the calling context.

FWIW, I've always been a big fan of persistence, especially when comms could (and will) fail. So SD & FAT16 are critical to my application. As I get more things running concurrently, error codes may have to get deferred by something like a "fatal exception handler". That is, waiting for the GO_IDLE_STATE to finish may not be an option. Increasing the quantity of information that persists could also force operations into the event loop.