triffid / LPC17xx-DFU-Bootloader

DFU Bootloader for LPC17xx family
GNU General Public License v2.0
53 stars 52 forks source link

SDCard_disk_initialize always fails #6

Open hg42 opened 6 years ago

hg42 commented 6 years ago

I have an early Smoothieboard v1.0 (the one with A498x drivers) so I am not totally sure if it has a different pinout.

This code always fails on my board:

    SDCard_init(P0_9, P0_8, P0_7, P0_6);
    if (SDCard_disk_initialize() == 0)
        check_sd_firmware();

SDCard_disk_initialize() finally goes here:

int SDCard_initialise_card() {
    // Set to 25kHz for initialisation, and clock card with cs = 1
    SPI_frequency(25000);
    GPIO_set(_cs);

    for(int i=0; i<16; i++) {
        SPI_write(0xFF);
    }

    // send CMD0, should return with all zeros except IDLE STATE set (bit 0)
    if(SDCard__cmd(SDCMD_GO_IDLE_STATE, 0) != R1_IDLE_STATE) {
        fprintf(stderr, "Could not put SD card in to SPI idle state\n");
        return cardtype = SDCARD_FAIL;
    }
...

and fails with "Could not put SD card in to SPI idle state".

But this works with the bootloader packaged with Smoothieware (it worked before and it works now). Which version is this? The Smoothieware documentation simply links to this github repo without any tag or branch, so I assume it should actually still work.

I also guessed it could be some older version, so I tried several versions like

I also tried to change some values that might have an influence on this code:

Googling for the error message indicates the pins could be wrong, but it works with the version packed with Smoothieware and I think the pins never changed...

Any guess what else could be going on?

hg42 commented 6 years ago

oh, and I replaced this code with a much longer delay:

    // give SD card time to wake up
    for (volatile int i = (1UL<<12); i; i--);

and I also do several retries.

May be I have not tried every combination of all the measures above. Any hint which one I should try?

triffid commented 6 years ago

This code was developed on a range of smoothieboards including the old v1.0 you mention and even older beta versions of the board which were never available for sale.

It is in use on thousands of boards across the world and is the primary way that smoothie users update their firmware - IOW you seem to be the only one with this issue. Most other users that have problems resembling this find that simply using a different SD card solves it.

The only real way to find out what's happening would be to put a 'scope on the pins and have a look, or dig into those functions and print out every byte that goes across the SPI - which is quite doable if you use (say) 2MBaud for debug serial and cut the SPI frequency down to 10kHz or less.

Best of luck if you choose to go digging! I'm curious what you might find.

hg42 commented 6 years ago

thanks for answering...

The main question is, what is different between the hex file in the smoothieware repo (in folder bootloader) and this repo. I could find out myself, but I cannot relate it to some version in this repo.

The hex file works as bootloader with all my cards for years.

This repo compiled and uploaded works in dfu mode and I can also add blinking lights, serial output etc. but it does not load firmware.bin because it does not detect any of my current cards (all of them work with the hex file).

From the file date it's obvious that the hex file is not equal to the current version. It could eventually be older than this repo, because I think the file date isn't the original one.

So please could you please be so kind to tell me which version of this repo it is? I think it should be possible to recreate it. Or should I ask on some smoothieware support channel? do they have a different source?

hg42 commented 6 years ago

what you say about printing bytes, that should be only 16 (or 24) 0xFF for the so called 'idle mode'. So nothing complicated I guess. I assume it's something very simple.

triffid commented 6 years ago

The main question is, what is different between the hex file in the smoothieware repo (in folder bootloader) and this repo. I could find out myself, but I cannot relate it to some version in this repo.

As far as I know, the firmware bin in the smoothie repo been built from this repo - likely grabbed from the build server at http://triffid-hunter.no-ip.info/DFU-Bootloader.html which shows an older commit date than https://github.com/Smoothieware/Smoothieware/commit/504e25e140b791642f7d79edcba001223d4cc8a0

I probably should give that a poke at some stage and make sure it's working, although there's been no significant code changes in the past few years.

You may need to check your toolchain, we have encountered some versions that produce very strange breakage with subtle but infuriating symptoms just like the one you describe.

I've had situations where simply inserting or removing an ostensibly innocuous printf could make or break apparently unrelated code blocks, and updating or downgrading toolchain version would often sort it out entirely.

If the binary from the build server works but the one you compile doesn't, the problem is almost certainly local to your system/toolchain and may require some interesting debugging to find a code change that makes it work for you and doesn't break it for anyone else.

hg42 commented 6 years ago

Thanks for the answer again...

I thought of the toolchain, too. Some optimization here and there and the timing changes. Given the fact, that timing is done with loops and tricks to ensure they are not optimized away by the compiler, it could still find a way to notice it's unused code and drop it or make it faster. Taking an old compiler is understandable but this will certainly create problems in the future (e.g. smoothieware is not compilable with a current compiler and I only wait for the point where the old compiler also fails, because it may not run any more).

Usually I follow the instructions how to compile the project, at least at first, to be sure it works. But for the bootloader there is no recommendation for the toolchain. Which one do you use? should I use the one downloaded by smoothieware install?

hg42 commented 6 years ago

I installed the hex file from LPC17xx-DFU-Bootloader Nightly Builds http://triffid-hunter.no-ip.info/DFU-Bootloader.html which works, so it's probably a compiler compatibility problem.

My compiler is gcc 6.3 (arm...), which should be quite a difference to 4.8...

But I don't see a problem in the main code...may be there is something in some of the lower level code. I had some similar experience lately while trying to compile a simple LED blinking program for this smoothieboard with platformio.

Curious, the cmd0 command only gets 0xFF as response, if I remove/insert the card while looking at the response it sometimes changes. I am going to think that some very low level thing doesn't work, may be reading a port or some initialization code.

As a next step I'll try to use gcc 4.9 from ubuntu xenial, but have to do updates currently, so installing is not an option for today, I will probably do tomorrow. If 4.9 fails, I will try to use the compiler installed in the smoothieware directory, which is known to work and I guess you are also using it for the bootloader.

Btw. I looked in the SDcard standard...what a mess...

I also found this: What is the correct command sequence for microSD card initialization in SPI? - Electrical Engineering Stack Exchange https://electronics.stackexchange.com/questions/77417/what-is-the-correct-command-sequence-for-microsd-card-initialization-in-spi

I think at least the basic init sequence is ok.

hg42 commented 6 years ago

ok, updates finished faster than expected. So I installed gcc-arm-none-eabi/xenial which is 4.9.3...

...and it WORKED...

I'll try other versions tomorrow.

I also downgraded binutils-arm-none-eabi/xenial libnewlib-arm-none-eabi/xenial libstdc++-arm-none-eabi-newlib/xenial

This wasn't forced by downgrading to gcc-arm-none-eabi/xenial, so I'll also try which of the packages makes the difference. It could be a library thing or some change in the compiler (which I guess).

hg42 commented 6 years ago

this combination works, so everything is current but the gcc is 4.9.3:

ii  binutils-arm-none-eabi         U    2.28-5+9+b3           GNU assembler, linker and binary utilities for ARM Cortex-A/R/M processors
ii  gcc-arm-none-eabi              U    15:4.9.3+svn231177-1  GCC cross compiler for ARM Cortex-A/R/M processors
ii  libnewlib-arm-none-eabi        U    2.4.0.20160527-3      C library and math library compiled for bare metal using Cortex A/R/M
ii  libstdc++-arm-none-eabi-newlib U    15:6.3.1+svn253039-1+10  GNU Standard C++ Library v3 for ARM Cortex-A/R/M processors (newlib)

gcc 5.4.1 from debian stable doesn't work:

ii  gcc-arm-none-eabi              U    15:5.4.1+svn241155-1  GCC cross compiler for ARM Cortex-A/R/M processors
triffid commented 6 years ago

Wow, great detective work!

So it is a toolchain issue after all.. Ideally we should have code that works with anything, but practically we need to find out what gcc-6.x doesn't like and fix it without breaking anything for 4.x.

Writing the 0xFFs with CS deasserted (high) is simply a method of 1) introducing a startup delay controlled by the SPI frequency and 2) flushing any crud that might be lying around in the SD card's buffers for some reason (as noted in your link, at least 74 clocks - we send 16*8=128).

Scoping the lines would give us a vastly clearer picture on what's happening with the SPI signals between the working and non-working versions.

Your link suggestion https://electronics.stackexchange.com/questions/77417/what-is-the-correct-command-sequence-for-microsd-card-initialization-in-spi/238217#238217 is a great read - perhaps you could try putting some retry-delay sequences around the CMD0 and CMD8 and see if that improves things? Perhaps your newer GCC is generating code that simply does things too fast or something?

I don't have a smoothieboard handy to play with at the moment - if you do end up generating a pull request, please ensure it still works as expected with the older toolchain installed by smoothie's script as well as whatever new fancy compiler you want to play with :D

hg42 commented 6 years ago

I am currently at a point where I managed to get SOFT_SPI working (I added code to force this even with hardware spi pins). Curiously SOFT_SPI works but "HARD_SPI" still doesn't.

I also added some delays like this:

spi__delay(100*delay);
        while ((sspr->SR & SSP_SR_TNF) == 0);
spi__delay(100*delay);
        sspr->DR = data;
spi__delay(100*delay);
        while ((sspr->SR & SSP_SR_RNE) == 0);
spi__delay(100*delay);
        r = sspr->DR & 255;
spi__delay(100*delay);

and also added delays between several initialization lines in case the speed does matter there.

I early changed SOFT_SPI to GPIO_xxx functions. May be this makes a difference, but I don't know.

The behavior could point to a problem in initialization, perhaps the clock frequency is wrong? but it works with gcc 4.9.3...

Currently I have no clue what could be wrong. I don't feel like analyzing the (hardware) SPI initialization with all those numbers. This should probably use constants to make clearer what it does. In this state I cannot verify what it does, which modes etc.

Clock calculations result in these values:

SPI:Using SSP1
SPI: frequency 10000: CPSR=196, CR0=2816
SPI: frequency 100000: CPSR=250, CR0=0
hg42 commented 6 years ago

I currently cannot use a scope, but I used a logic pen to verify sck, cs, mosi, miso at least are doing something senseful. However, the timing isn't checked by that.

Btw. for 25kHz it's:

SPI: frequency 10000: CPSR=196, CR0=2816
SPI: frequency 25000: CPSR=232, CR0=768

Summary for gcc 6.3:

CFLAGS = $(FLAGS) -std=gnu99 -pipe -fno-builtin-printf -fno-builtin-fprintf -fno-builtin-vfprintf -fno-builtin-puts ASFLAGS = $(FLAGS) CXXFLAGS = $(FLAGS) -fno-rtti -fno-exceptions -std=gnu++0x

hg42 commented 6 years ago

ok, I tried:

which indicates the problem is entirely related to cmd0 processing or ... perhaps generally SDCARD_cmd ... ok, next test doing the rest in hardware spi ...

if I do hardware spi for the 16 x 0xFF sequence, and then software spi for cmd0 and the rest in hardware spi, cmd8 thinks it's "not an SD card?" because it's not in idle state after sending cmd8. Actually it fails if I do any cmd after the 0xFF sequence in hardware SPI.

So the failure seems to be in SDCard_cmd in general. Or it may be still SPI_write, given that the 0xFF sequence is not very critical. ok -- let's test -- in fact, it also works, if I simply replace the 0xFF sequence with a delay.

Now I'm running out of options, I'll better input some beer and pan cakes and a movie...tomorrow is another day...

triffid commented 6 years ago

Well the hardware SPI driver is as simple as things can get, essentially just:

        while ((sspr->SR & SSP_SR_TNF) == 0);
        sspr->DR = data;
        while ((sspr->SR & SSP_SR_RNE) == 0);
        r = sspr->DR & 255;

so I suppose the next step is to fill that file with printf, and/or examine the assembly generated by the non-working toolchain.

It may be possible to shoe-horn the MRI debugger that smoothie uses into the bootloader, if you change the linker script to allow it to use more flash. That might be useful, then you can point GDB at it and single-step through the whole SD/SPI init section.

Once we know what the actual root cause is, hopefully there'll be some obvious code change we can make so that it works with all toolchain versions

hg42 commented 6 years ago

I now have a version that works with gcc 6.3 ...but not ready yet...see at the end...

Based on my guess, that there must be some quirk in the initialization sequence, I imported some functionality from lpc17xx_ssp.c, while stripping it down to necessary features.

I added frequency calculation, init function, and readwrite from this file and made them switchable.

BUT...

Unfortunately, loading of firmware.bin is now much slower. But this is independent of enabling the new functions. Even with the old implementation (including a lot of hopefully disabled test harness) compiled with gcc 4.9.3 is still very slow. I obviously left something in the code I am not aware of.

So, next step will be to cherrypick essential changes to a clean branch to find out what happened

--> to be continued

triffid commented 6 years ago

Frequency is off? hmm, now that I look at it I think you're right.. Put SystemCoreClock/4 instead of 25M at https://github.com/triffid/LPC17xx-DFU-Bootloader/blob/develop/spi.c#L119 and comment out L61,77,94, and maybe cap the register values instead of the frequency - that section does look like it needs a total revamp.

git diff will help you find what you've changed ;)

hg42 commented 6 years ago

I first thought 25Mhz could be wrong. I think this depends on init. But the other init also creates a 25Mhz internal SSP clock.

What I mean is, if I want 4Mhz, I get around 4.2 etc. That's 5% too high. The dividers are diffferent from the lpc17xx variant, which always keeps the frequency below the target.

Thanks, I know how to use version control, it's my daily business. But a lot if changes are a lot of changes. In all those trst lines must be some that are still active. It's kind of "living" test code and at some point the brain stops keeping track of all that locations.

Another problem is mixed tab+spaces and tab size of 4...the usual open aource policy is to eliminate tabs. The only way to safely handle multiple projects and developers at once. Therefore many tools change tabs to spaces automatically...

hg42 commented 6 years ago

here some data from the current calculation (CRf is the real factor, which is (CR0>>8)&0xFF+1 so that real_f = sspclk/CPSR/CRf):

old clock:
spi: f=25000 sspclk=25000000 delay=1000 CPSR=232 CRf=4 -> 26939
spi: f=1000000 sspclk=25000000 delay=25 CPSR=24 CRf=1 -> 1041666
spi: f=4000000 sspclk=25000000 delay=6 CPSR=6 CRf=1 -> 4166666

and this is how it is calculated by the library function:

SSPR_Clock:
spi: f=25000 sspclk=25000000 delay=1000 CPSR=4 CRf=250 -> 25000
spi: f=1000000 sspclk=25000000 delay=25 CPSR=2 CRf=13 -> 961538
spi: f=4000000 sspclk=25000000 delay=6 CPSR=2 CRf=4 -> 3125000

Now you might say 5% is ok (better tahn using ca. 3MHz instead of 4MHz) but then you cold use a 5% bigger value for the maximum value. With the current algorithm I am not sure if the error could eventually be much higher...

Unfortunately I had not enough time today to go further. At least I found, that every method is slow now, SOFTSPI, the current code and the new code in different configurations.

hg42 commented 6 years ago

Ok...just had a look...that was easy.

I had added a short delay to setleds(), which isn't good if each sector is setting the leds (even if they don't change). Because they change only a few times for a firmware file I originally had the impression, this could not slow down the reading and flashing that much.

I also added a printf to show all configured registers:

works: 4.9.3 old init PINSEL0=0xA8050 PCLKSEL0=0x1C00C0 PCONP=0x80208408 CPSR=0xE8 CR0=0x307  CR1=0x2
FAILS: 6.3.1 old init PINSEL0=0xA8050 PCLKSEL0=0x1C00C0 PCONP=0x80208408 CPSR=0xE8 CR0=0x300  CR1=0x2
works: 4.9.3 SSP_Init PINSEL0=0xA8050 PCLKSEL0=0xC00C0  PCONP=0x80208408 CPSR=0x4  CR0=0xF907 CR1=0x2
works: 6.3.1 SSP_Init PINSEL0=0xA8050 PCLKSEL0=0xC00C0  PCONP=0x80208408 CPSR=0x4  CR0=0xF907 CR1=0x2

so, CR0=0x300 is the problem, the lower byte should be 0x07.

Now, this works

            sspr->CR0 &= ~0xFFFF;
            sspr->CR0 |= SSP_DATABIT_8 | SSP_FRAME_SPI | SSP_CPHA_FIRST | SSP_CPOL_HI;

but this doesn't:

            sspr->CR0 = SSP_DATABIT_8 | SSP_FRAME_SPI | SSP_CPHA_FIRST | SSP_CPOL_HI;

despite the fact, the expression on the right is 0x7 (I added CPHA and CPOL, but they are zero), the assignment results in CR0 being zero afterwards. The "|=" makes the difference, the "&=" isn't the solution but necessary to be correct at the end.

So, it seems to be some optimization for assignments, that doesn't hit assignments combined with or.

The code also works with 12.5MHz, but it doesn't result in faster flashing, may be my card is too slow: spi: f=12500000 sspclk=25000000 delay=2 CPSR=2 CRf=1 -> 12500000 4MHz is fast enough anyways.

Now while everything is working, I'll probably reorder things into function groups and/or files instead of many #if clauses to make everything more clear. I am not sure which parts you want to integrate into your code base.

I will first create a minimal PR that fixes the problem

hg42 commented 6 years ago

you may look into my repo, on branch hg42 for a lot of other code changes... if you are interested, I can create some more PRs for each feature (after cleanup and sorting). Otherwise I would keep it like this for now and just test it by using. I may improve it further or throw things away...

hg42 commented 6 years ago

but this doesn't: sspr->CR0 = ...

given that the other function also uses SSPx->CR0 = tmp, the assignement cannot be the problem in itself. So even if it works now, I do not think #7 is a final solution.

while debugging yesterday, I thought they might have introduced that tmp variable for exactly that reason, but I tried to use it for the former implementation and it didn't work either.

hg42 commented 6 years ago

Today, I had the same effect in the new functions. Not sure what's going on. At least the &=~ |= sequence solved this, too.

While testing this, I noticed when requesting a frequency of 4200000 (experimentally adding 5% for the new algorithm), I got 6250000 instead with the old algorithm: spi: f=4200000 sspclk=25000000 CPSR=4 CRf=1 -> 6250000 So it can go far beyond the target.

LineF commented 4 years ago

Just compiled the official develop branch from triffid with gcc 7.3.1. Still the same problem. Fix #7 helped - SD card is working now.

triffid commented 4 years ago

Guess I should merge that PR, now that it's affecting more than one person with more than one version of gcc

On Mon, 18 Nov 2019 at 05:48, Martin Patzel notifications@github.com wrote:

Just compiled the official develop branch from triffid with gcc 7.3.1. Still the same problem. Fix #7 https://github.com/triffid/LPC17xx-DFU-Bootloader/pull/7 helped

  • SD card is working now.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/triffid/LPC17xx-DFU-Bootloader/issues/6?email_source=notifications&email_token=AABICGVPO6JFBJMEY6SZ4L3QUG32FA5CNFSM4EJRNGX2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEEIXBKI#issuecomment-554791081, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABICGWFACRHUQE4CVXSF2DQUG32FANCNFSM4EJRNGXQ .