rtfm based stm32f030 debug build causes relcation truncated error

x37v commented 7 years ago

here is an example project: https://gitlab.com/xnor/stm32f0308-disco-rust

If I build it without --release I get

target/thumbv6m-none-eabi/debug   /deps/libstm32f030-5466fdead1a18a6d.rlib(stm32f030-5466fdead1a18a6d.0.o): In function `WWDG':
      stm32f030.cgu-0.rs:(.text+0x0): relocation truncated to fit: R_ARM_THM_JUMP11 against symbol `DEFAULT_HANDLER' defined in .text.DEFAULT_HANDLER section in /home/alex/projects/modular/threshpan/target/thumbv6m-none-eabi/debug/deps/libcortex_m_rt-881d17200def560b.rlib(cortex_m_rt-881d17200def560b.0.o)

japaric commented 7 years ago

I have seen this before though haven't nailed down the exact cause.

From what I have seen:

This only happens for the thumbv6m-none-eabi target
It only happens when optimization is not enabled and debuginfo is enabled
This not a problem with RTFM itself but with how interrupt handlers are registered (this is done by cortex-m-rt). As in you can cause this problem with RTFM.
IMO, this might be a LLVM bug since LLVM is in charge of both emitting debuginfo and object files; it seems that LLVM is generating object files that can't be linked.

Out of curiosity, what happens when you compile without --release but with LTO enabled (e.g. xargo rustc -- -C lto)? That should produce a single object file so linking the object file should be straightforward. I expect that LLVM might error in that scenario.

cc @therealprof, who may know more about this problem

therealprof commented 7 years ago

Hm, I thought I opened a bug report for this before or at least mentioned it somewhere...

The problem is not debug related, I can easily reproduce it with --release builds, too. It happens when binary code grows so large that the 11 bit offsets available to the branch instruction are not sufficient to reach the jump target anymore.

I've no idea how (and where) this could be addressed but monomorphisation, heavy inlining, and LTO (as well as lack of optimisation in debug builds) are the source of the issue here because they all lead to few but huge functions.

therealprof commented 7 years ago

Looking into the instruction set it seems that Cortex-M should happily support the larger version of the branch as well. Maybe it would suffice to tell that to the linker somehow...

x37v commented 7 years ago

@therealprof a bit disheartening that you get this problem with --release as well as I hope to use this for something that I'll share with others.. though, glad to see that there is hope!

@japaric xargo rustc -- -C lto gave the same error:

error: linking with arm-none-eabi-ld failed: exit code: 1 | = note: "arm-none-eabi-ld" "-L" "/home/alex/.xargo/lib/rustlib/thumbv6m-none-eabi/lib" "/home/alex/projects/modular/threshpan/target/thumbv6m-none-eabi/debug/deps/stm32f0308_disco_rust-60ecd4ad81e058b7.0.o" "-o" "/home/alex/projects/modular/threshpan/target/thumbv6m-none-eabi/debug/deps/stm32f0308_disco_rust-60ecd4ad81e058b7" "--gc-sections" "-L" "/home/alex/projects/modular/threshpan/target/thumbv6m-none-eabi/debug/deps" "-L" "/home/alex/projects/modular/threshpan/target/debug/deps" "-L" "/home/alex/projects/modular/threshpan/target/thumbv6m-none-eabi/debug/build/cortex-m-rt-a3623460a99781ee/out" "-L" "/home/alex/.xargo/lib/rustlib/thumbv6m-none-eabi/lib" "-Bstatic" "/home/alex/.xargo/lib/rustlib/thumbv6m-none-eabi/lib/libcompiler_builtins-ad42e860445b13d0.rlib" "-Tlink.x" "-Bdynamic" = note: /home/alex/projects/modular/threshpan/target/thumbv6m-none-eabi/debug/deps/stm32f0308_disco_rust-60ecd4ad81e058b7.0.o: In function WWDG': stm32f0308_disco_rust.cgu-0.rs:(.text+0x0): relocation truncated to fit: R_ARM_THM_JUMP11 againstDEFAULT_HANDLER'

BTW, if you want me to move this over to cortex-m-rt I can do that.. I suspected that I might be posting to the wrong specific location in the correct ecosystem..

therealprof commented 7 years ago

@japaric I figured out what the problem is... svd2rust

Seemingly the default CPU model for the armv6-m architecture is broken. I tried various options including using the -mcpu=cortex-m0 and -mcpu=cortex-m3 options on the generated assembly and the latter automatically changes branches where the target doesn't fit into the available 2 bytes into the 4 bytes form of the branch while the default model and -mcpu=cortex-m0 (which actually might be the default model) keeps it as-is causing the linker to barf.

However, if I explicitly change the short branch in the code emitted by svd2rust into the long form, it'll happily compile and link the binaries, cf.:

diff --git a/src/svd.rs b/src/svd.rs
index 149a3ed..8bfc3f2 100644
--- a/src/svd.rs
+++ b/src/svd.rs
@@ -11,7 +11,7 @@ pub mod interrupt {
         "
                 .thumb_func
                 DH_TRAMPOLINE:
-                    b DEFAULT_HANDLER
+                    bl DEFAULT_HANDLER
                 "
     );
     #[cfg(feature = "rt")]

@x37v Can you with the above change on your code?

x37v commented 7 years ago

@therealprof yes, that did solve my problem! THANKS SO MUCH! I was actually also able to build without --release once i set the optimization level to 1.. tried that in the mean time, but this works unoptimized!

pftbest commented 7 years ago

This can't be right, using branch with link will clobber the link register, so the interrupt handlers will fail to return. To make it work with bl, we need to add a proper function.

Also, the documentation here says that b.w instruction can be used in thumb 2 mode to jump to ±16MB. But LLVM says it's only available on thumbv7 for some reason.

therealprof commented 7 years ago

@pftbest Whoops, you're right. I picked the wrong mnemonic, b.w is what I wanted to say. And you're also right that it can't be used due to the compiler/assembler complaining. :(

therealprof commented 7 years ago

@pftbest Okay, I checked around and there's nothing that would the assembly accept that function, BUT: why do we do that manual jump in the first place? DH_TRAMPOLINE doesn't do anything useful so we might as well just get rid of that...

diff --git a/src/svd.rs b/src/svd.rs
index 149a3ed..df566cf 100644
--- a/src/svd.rs
+++ b/src/svd.rs
@@ -8,15 +8,7 @@ pub mod interrupt {
     use bare_metal::Nr;
     #[cfg(feature = "rt")]
     global_asm!(
-        "
-                .thumb_func
-                DH_TRAMPOLINE:
-                    b DEFAULT_HANDLER
-                "
-    );
-    #[cfg(feature = "rt")]
-    global_asm!(
-        "\n.weak WWDG\nWWDG = DH_TRAMPOLINE\n.weak PVD\nPVD = DH_TRAMPOLINE\n.weak RTC\nRTC = DH_TRAMPOLINE\n.weak FLASH\nFLASH = DH_TRAMPOLINE\n.weak RCC_CRS\nRCC_CRS = DH_TRAMPOLINE\n.weak EXTI0_1\nEXTI0_1 = DH_TRAMPOLINE\n.weak EXTI2_3\nEXTI2_3 = DH_TRAMPOLINE\n.weak EXTI4_15\nEXTI4_15 = DH_TRAMPOLINE\n.weak TSC\nTSC = DH_TRAMPOLINE\n.weak DMA_CH1\nDMA_CH1 = DH_TRAMPOLINE\n.weak DMA_CH2_3\nDMA_CH2_3 = DH_TRAMPOLINE\n.weak DMA_CH4_5_6_7\nDMA_CH4_5_6_7 = DH_TRAMPOLINE\n.weak ADC_COMP\nADC_COMP = DH_TRAMPOLINE\n.weak TIM1_BRK_UP_TRG_COM\nTIM1_BRK_UP_TRG_COM = DH_TRAMPOLINE\n.weak TIM1_CC\nTIM1_CC = DH_TRAMPOLINE\n.weak TIM2\nTIM2 = DH_TRAMPOLINE\n.weak TIM3\nTIM3 = DH_TRAMPOLINE\n.weak TIM14\nTIM14 = DH_TRAMPOLINE\n.weak TIM16\nTIM16 = DH_TRAMPOLINE\n.weak TIM17\nTIM17 = DH_TRAMPOLINE\n.weak I2C1\nI2C1 = DH_TRAMPOLINE\n.weak SPI1\nSPI1 = DH_TRAMPOLINE\n.weak SPI2\nSPI2 = DH_TRAMPOLINE\n.weak USART1\nUSART1 = DH_TRAMPOLINE\n.weak USART2\nUSART2 = DH_TRAMPOLINE\n.weak CEC_CAN\nCEC_CAN = DH_TRAMPOLINE\n.weak USB\nUSB = DH_TRAMPOLINE"
+        "\n.weak WWDG\nWWDG = DEFAULT_HANDLER\n.weak PVD\nPVD = DEFAULT_HANDLER\n.weak RTC\nRTC = DEFAULT_HANDLER\n.weak FLASH\nFLASH = DEFAULT_HANDLER\n.weak RCC_CRS\nRCC_CRS = DEFAULT_HANDLER\n.weak EXTI0_1\nEXTI0_1 = DEFAULT_HANDLER\n.weak EXTI2_3\nEXTI2_3 = DEFAULT_HANDLER\n.weak EXTI4_15\nEXTI4_15 = DEFAULT_HANDLER\n.weak TSC\nTSC = DEFAULT_HANDLER\n.weak DMA_CH1\nDMA_CH1 = DEFAULT_HANDLER\n.weak DMA_CH2_3\nDMA_CH2_3 = DEFAULT_HANDLER\n.weak DMA_CH4_5_6_7\nDMA_CH4_5_6_7 = DEFAULT_HANDLER\n.weak ADC_COMP\nADC_COMP = DEFAULT_HANDLER\n.weak TIM1_BRK_UP_TRG_COM\nTIM1_BRK_UP_TRG_COM = DEFAULT_HANDLER\n.weak TIM1_CC\nTIM1_CC = DEFAULT_HANDLER\n.weak TIM2\nTIM2 = DEFAULT_HANDLER\n.weak TIM3\nTIM3 = DEFAULT_HANDLER\n.weak TIM14\nTIM14 = DEFAULT_HANDLER\n.weak TIM16\nTIM16 = DEFAULT_HANDLER\n.weak TIM17\nTIM17 = DEFAULT_HANDLER\n.weak I2C1\nI2C1 = DEFAULT_HANDLER\n.weak SPI1\nSPI1 = DEFAULT_HANDLER\n.weak SPI2\nSPI2 = DEFAULT_HANDLER\n.weak USART1\nUSART1 = DEFAULT_HANDLER\n.weak USART2\nUSART2 = DEFAULT_HANDLER\n.weak CEC_CAN\nCEC_CAN = DEFAULT_HANDLER\n.weak USB\nUSB = DEFAULT_HANDLER"
     );
     #[cfg(feature = "rt")]
     extern "C" {

That removes one unnecessary indirection from code that actually does compile and fixes this particular problem since we're jumping to the correct function right from our exception/interrupt table where we don't have any address limitations...

It'll likely not fix the problem I had before wrt. functions becoming too big to be jumped to (which I've addressed in the code) and should be addressed by the compiler, but this seems like a win-win to.

NB: I have no hardware here so I can't very it but it sure looks good to me.

pftbest commented 7 years ago

No, we can't remove this trampoline, because it will silently break non-lto builds. Weak references can only point to symbols defined in the same object file, but default handler is defined in another crate, so it will end up in different object file. This bug was reported here: https://github.com/japaric/cortex-m-rtfm/issues/39

pftbest commented 7 years ago

I think the only working solution here is to make DH_TRAMPOLINE a proper rust function. This will make executable slightly bigger, unfortunately, but it shouldn't affect the performance, because default_handler is only used for error handling.

It may affect a stack trace when debugging, not sure if it counts as a breaking change. I don't have a board atm so I can't test it.

therealprof commented 7 years ago

Hm, non-lto builds... Those still exist? ;)

I'll have to look a bit closer at this in a non-lto context. I'm still not exactly sure why the trampoline needs to exist at all, my preference would be to fix the visibility of the symbols. As I said before this will most likely not fix the compiler issue at hand (refusing wo accept the b.w for armv6m) so it's very likely that we will run into the same problem sooner or later again... Not sure how to properly report this though.

x37v commented 7 years ago

interesting, @pftbest, bl DEFAULT_HANDLER does build for me, debugging is now more full featured with dev builds and I am able to get ADC interrupts at least.. maybe I'm confused about where the discussion has gone.

x37v commented 7 years ago

@therealprof and @pftbest I could try to get an stm32f0 based discovery board to you if you want some hardware to test on.. Though I can also run tests on my hardware if you'd like.. about to be gone for a long weekend later tonight though.

therealprof commented 7 years ago

@pftbest is right that BL clobbers the link register so technically we can not return to wherever the link register was legitimately set. However I'm not sure that this is relevant because we're talking about the default handler here which usually just halts execution by firing off an breakpoint instruction.

therealprof commented 7 years ago

@x37v No need, I have a ton of STM32 stuff here and I especially like the F0 series which is probably why @japaric notified me in the first place. ;)

pftbest commented 7 years ago

@x37v, the issue here is that processor relies on EXC_RETURN value being present in LR register to return from the interrupt handler, but bl instruction will erase it, so it will never return.

This does not break the provided default_handler, since it goes into infinite loop and never returns, but the user may override it using default_handler! macros, and try to return from it.

x37v commented 7 years ago

@pftbest I'm still a bit confused... my understanding is that interrupt handlers get executed after an interrupt arrives and execution jumps out of your main loop [in the rtfm case a loop waiting for interrupts] execute some code and then jump back. Are you saying that the default handler, before being overridden, normally goes into an infinite loop and never returns to the main loop?.. or is this simply an effect of the bl instruction?

therealprof commented 7 years ago

@x37v The default handler is only used if the system fires an exception or an interrupt and you haven't provided your own exception or interrupt handler. You can override the default_handler, too if you want to do anything specific in this case however the default implementation is more or less the only sane implementation one can have in this situation: Set a breakpoint and do nothing more.

x37v commented 7 years ago

@therealprof AHH, that makes sense. So, beyond the potential override, is it problematic as is, with no way to return?

therealprof commented 7 years ago

@x37v At that point the MCU is pretty much in a dead end, so other than saying goodbye I don't think there's much you can do to re-enter the program in orderly fashion other than a reset... Even if you have the link register; who say's it points to a place where you can actually reenter?

pftbest commented 7 years ago

@therealprof, why is MCU in a dead end?

Nothing serious would happen if we just return from some unhandled GPIO interrupt. There is a way to get the interrupt number that is currently being serviced, so a reasonable implementation may check that we are not in hard fault or some other bad state, and otherwise just log a spurrious interrupt and return.

therealprof commented 7 years ago

@pftbest Why would you enable an interrupt you're not willing to handle? And if your willing to handle it, why not have a specific handler for that? Using the default handler has a number of drawbacks; sure with enough effort you might be able to figure out why ended in there but all the exceptions you're not willing to deal with also end up in there, i.e. the really bad stuff from which a useful recovery is typically not possible.

There's a reason that in 99.99% of all cases the default handler is used to

halt the CPU for post-mortem analysis
output some diagnostic data
reset the MCU

or any combination thereof.

therealprof commented 7 years ago

@pftbest You're right. The easiest way to make that work seems to be a proper Rust function; I tried all kinds of tricks with assembly but the simplest solution is the obvious one:

    extern "C" {
        fn DEFAULT_HANDLER();
    }

    #[allow(non_snake_case)]
    #[naked]
    #[no_mangle]
    pub unsafe fn DH_TRAMPOLINE() {
        DEFAULT_HANDLER();
    }

The binary code grows by 4 bytes.

It also adds the additional benefit of properly naming the function, but here's the kicker; it also uses the bl instruction:

│ -08000480 <ADC_COMP>:
│ +08000480 <DH_TRAMPOLINE>:
│ - 8000480:    e059            b.n     8000536 <BUS_FAULT>
│ + 8000480:    f000 f85b       bl      800053a <BUS_FAULT>
│ + 8000484:    4770            bx      lr

🤔

perlindgren commented 7 years ago

Hi

A comment +/- related.

A sensible thing to do in the default handler is to unwind the stack, that makes it easier to trace the cause of the error. (I made some tests earlier and it works…) The question is just what to do with the “trace info”. I sent it over the ITM, but as there is no flow control for the ITM, the debugger buffer (on the nucleo/st-link) may overflow, so you may loose characters. One can also inspect the stack directly from gdb, perhaps there is some scripting possible to facilitate this (Japaric might know…)

Best, Per

On 05 Sep 2017, at 00:58, therealprof notifications@github.com<mailto:notifications@github.com> wrote:

@pftbesthttps://github.com/pftbest You're right. The easiest way to make that work seems to be a proper Rust function; I tried all kinds of tricks with assembly but the simplest solution is the obvious one:

extern "C" {
    fn DEFAULT_HANDLER();
}

#[allow(non_snake_case)]
#[naked]
#[no_mangle]
pub unsafe fn DH_TRAMPOLINE() {
    DEFAULT_HANDLER();
}

The binary code grows by 4 bytes.

It also adds the additional benefit of properly naming the function, but here's the kicker; it also uses the bl instruction:

│ -08000480 : │ +08000480 : │ - 8000480: e059 b.n 8000536 │ + 8000480: f000 f85b bl 800053a │ + 8000484: 4770 bx lr

🤔

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/japaric/cortex-m-rtfm/issues/42#issuecomment-327036730, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AD5naDUQ9VcMi_3hjpGJvYp8OhziHoYjks5sfICigaJpZM4PIR-X.

{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/japaric/cortex-m-rtfm","title":"japaric/cortex-m-rtfm","subtitle":"GitHub repository","main_image_url":"https://cloud.githubusercontent.com/assets/143418/17495839/a5054eac-5d88-11e6-95fc-7290892c7bb5.png","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/japaric/cortex-m-rtfm"}},"updates":{"snippets":[{"icon":"PERSON","message":"@therealprof in #42: @pftbest You're right. The easiest way to make that work seems to be a proper Rust function; I tried all kinds of tricks with assembly but the simplest solution is the obvious one:\r\n\r\n extern \"C\" {\r\n fn DEFAULT_HANDLER();\r\n }\r\n\r\n #[allow(non_snake_case)]\r\n #[naked]\r\n #[no_mangle]\r\n pub unsafe fn DH_TRAMPOLINE() {\r\n DEFAULT_HANDLER();\r\n }\r\n\r\n\r\nThe binary code grows by 4 bytes.\r\n\r\nIt also adds the additional benefit of properly naming the function, but here's the kicker; it also uses the bl instruction:\r\n\r\n│ -08000480 \u003cADC_COMP\u003e:\r\n│ +08000480 \u003cDH_TRAMPOLINE\u003e:\r\n│ - 8000480: e059 b.n 8000536 \u003cBUS_FAULT\u003e\r\n│ + 8000480: f000 f85b bl 800053a \u003cBUS_FAULT\u003e\r\n│ + 8000484: 4770 bx lr\r\n\r\n\r\n🤔"}],"action":{"name":"View Issue","url":"https://github.com/japaric/cortex-m-rtfm/issues/42#issuecomment-327036730"}}}

pftbest commented 7 years ago

@therealprof I think you forgot to remove the #[naked] attribute, that's why the generated code is incorrect. Naked functions can only have inline assembly inside, not the actual code.

Maybe we can do better, by having 2 functions behind a #[cfg(target, one for thumbv7+ that does b.w and one for thumbv6 that does a normal function call.

therealprof commented 7 years ago

@pftbest Hm, right again... this is becoming uncanny. ;)

However now we have the same function twice with two different labels:

0800031c <DH_TRAMPOLINE>:
 800031c:       f3ef 8008       mrs     r0, MSP
 8000320:       e7ff            b.n     8000322 <_ZN11cortex_m_rt15default_handler17hc8869ed4a288b164E>

08000322 <_ZN11cortex_m_rt15default_handler17hc8869ed4a288b164E>:
 8000322:       be00            bkpt    0x0000
 8000324:       e7fe            b.n     8000324 <_ZN11cortex_m_rt15default_handler17hc8869ed4a288b164E+0x2>

08000326 <BUS_FAULT>:
 8000326:       f3ef 8008       mrs     r0, MSP
 800032a:       e7fa            b.n     8000322 <_ZN11cortex_m_rt15default_handler17hc8869ed4a288b164E>

Duh, well.

Maybe we can do better, by having 2 functions behind a #[cfg(target, one for thumbv7+ that does b.w and one for thumbv6 that does a normal function call.

There's no reason for that. Both actually support the very same b.w instruction. It's just the compiler being wrong here and claiming that it wouldn't.

pftbest commented 7 years ago

I believe LLVM is correct in this case, quote from the docs:

ARMv6-M supports the Thumb instruction set, including a small number of 32-bit instructions introduced to the architecture as part of the Thumb-2 technology in ARMv6T2. ARMv6-M supports the 16-bit Thumb instructions from ARMv7-M, in addition to the 32-bit BL, DMB, DSB, ISB, MRS and MSR instructions.

b.w is a 32bit instruction and it's not on the list, so looks like it's not supported. (also it's not mentioned here)

However now we have the same function twice with two different labels:

Yes, DEFAULT_HANDLER gets inlined, that is unfortunate, but it may be fixed by this patches when they will be merged.

therealprof commented 7 years ago

b.w is a 32bit instruction and it's not on the list, so looks like it's not supported. (also it's not mentioned here)

Hm, I can't find the reference at the moment but some site said that b.w would be supported for Cortex-M0 as well but I guess you're right (again!).

Samonitari commented 7 years ago

Sorry for jumping in the discussion!

Actually the link @pftbest posted earlier clears this up perfecrly: See Table 12. B _label_ 's range is+- 16MB in case of 32-bit Thumb2, with the optional .B, or +-2KB with Thumb1 variant. Cortex-M0(+) only have BL, DMB, DSB, ISB, MRS, MSR from Thumb2, all other instructions have the Thumb1 variant, including B. Basically ~all 16bit T1 instruction has a corresponding T2 sibling, some with subtle differences like this.

therealprof commented 7 years ago

@Samonitari Right, however this wouldn't be the first time that the official documentation turns out to be incorrect. 😉

Really the only difference it makes is whether to report a bug to LLVM or not.

pftbest commented 7 years ago

@Samonitari Yes, thumb1 has b instrunction, but we need b.w which is 32bit T2 instruction. There is no way to encode such instruction on Cortex-M0. So there is no bug in LLVM.

japaric commented 7 years ago

@pftbest's idea, namely:

Maybe we can do better, by having 2 functions behind a #[cfg(target, one for thumbv7+ that does b.w and one for thumbv6 that does a normal function call.

Sounds good to me. I'd be happy to merge a PR implementing that.

jonas-schievink commented 7 years ago

I'm already working on that :)

jonas-schievink commented 7 years ago

This turned out to be pretty complicated as the mentioned #[cfg] would have to be put into every crate generated by svd2rust, so they all need a build.rs setting some armv6m cfg option since there's no other way to distinguish between v6 and v7.

jonas-schievink commented 7 years ago

I guess this is still fine, you just have to opt-in to get armv6 support...

jonas-schievink commented 7 years ago

Note that the issue isn't truly fixed until the stm32f030 crate is regenerated with an up-to-date svd2rust.

x37v commented 7 years ago

I figure this is worth a patch version update?

x37v commented 7 years ago

@jonas-schievink

the crate: https://gitlab.com/xnor/stm32f030/commit/2deff1fe6844da030c859c8f2b372a018f7d1ad4

example project: https://gitlab.com/xnor/stm32f0308-disco-rust/commit/cc0271624ad60bbf428f99a1f662da59394706b9

Builds and debugs in dev! :+1: Thanks All!

parched commented 6 years ago

Just stumbled onto this so I might of missed something, but about about just always using

ldr r0, =DEFAULT_HANDLER
bx  r0

then you have unlimited range.

pftbest commented 6 years ago

@parched but you will loose the value in r0 register. and you may want to know the value for debugging purposes.

rtic-rs / rtic

rtfm based stm32f030 debug build causes relcation truncated error #42