Open chris1seto opened 5 years ago
Hi Chris, could you attach your motor config? In particular, I would be looking for a switching frequency set so high that it breaks the RTOS timing. It happened to me, and it's the main reason the watchdog has been reworked. More than 30 kHz is dangerous territory.
EDIT: Disregard, bad debugging info
Nevermind, disregard the above comment. This happens with stock settings on a brand new flash of the firmware when configured for 4.12
I flashed one of my palta boards with hw_410 here and I can't reproduce this issue.
Yes, I tried a full erase.
My hardware is both a Flipsky mini vesc and a torque vesc from esk8
Steps to repro:
git reset --hard; git pull origin master
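Uncomment in conf_general.h:
#define HW_SOURCE "hw_410.c" // Also for 4.11 and 4.12
#define HW_HEADER "hw_410.h" // Also for 4.11 and 4.12
and comment out the hw_60 lines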
Full erase with STLink,
make upload
After this the board never boots up to the point where VCP works, as it is always rebooting.
Also, nothing is connected externally. The xtal point is interesting, but given that the board can work with USB when the timeout is disabled, it must be ok (the xtal is required for USB).
On Tue, Apr 9, 2019, 2:54 PM Marcos Ariel Chaparro notifications@github.com wrote:
- What do you mean by a brand new flash? Did you command a full chip erase from an stlink to ensure old configurations are erased?
- Are you using any app with the firmware?
- Are you using an encoder or other cpu load?
- Is your crystal okay? firmware now double checks the timing with an independent watchdog clock.
Steps I'm doing:
qstlink2 --cli -e
Full flash memory erase
Do you know if there are other users with the same issue? Thanks
To my comment above add flashing the bootloader before step 7.
So, did not flash the bootloader, but it shouldn't make any difference, right? Looking through the code, it doesn't touch the wdg, (other than to abuse it to reset the board, lol). Anyone with a torque or flipsky esc who can test? This is looking like a hw issue, and it must be with the xtal.
Here's some more debug info. If I accidentally leave HW60 in as the selected config, USB works! So, what's the difference between 60 and 410 that affects this?
Edit: Also, I confirmed that both boards do have an xtal loaded, but I'm guessing everything must be ok on this front, because if the clock settings or xtal were incorrect, USB wouldn't work at all.
A significant difference is that hw6 defaults to FOC mode and hw4 defaults to bldc mode...
Maybe adding to hw410.h this could narrow this down:
// Default setting overrides
#ifndef MCCONF_DEFAULT_MOTOR_TYPE
#define MCCONF_DEFAULT_MOTOR_TYPE MOTOR_TYPE_FOC
#endif
Yup! That fixes it. So now...
So there is an issue starting BLDC mode with the timeout
More debugging info: This is absolutely related to the switching freq. I configured my motor, everything worked great, so I set 29.5K as my FOC switching freq (everything still worked great) and then I rebooted. After the reboot, the vesc now does the boot loop. I bet the reason it fails with bldc selected is because the switching freq is very high (35K) by default
Yes, I think you are right.
So it's not a problem with the watchdog; the watchdog led you to discover that CPU usage hits 100% with your default configuration and that scheduler timing is failing.
In my palta hardware I added this limit a while ago to prevent exactly that
#define HW_LIM_FOC_CTRL_LOOP_FREQ 10000.0, 30000.0 //at around 38kHz the RTOS starts crashing (26us FOC ISR)
https://github.com/vedderb/bldc/blob/master/hwconf/hw_palta.h#L268
IMO a line like that should be added to all hardware versions.
I don't use BLDC mode, but a similar limit should be implemented for that mode.
#define MCCONF_M_BLDC_F_SW_MAX 35000 // Maximum switching frequency in bldc mode
It's either decrease the frequency or optimize the code to make it run faster. (I'd decrease the frequency.)
The frequency limit depends on the CPU load. Looks like BLDC mode (or something else) is getting more cpu intensive and now the cpu can't keep up.
Now that we have a likely solution (or at least an explanation) I think we need @vedderb
Thanks for reporting!
And more debugging info... This goes beyond just the switching freq. If I get a good auto detection in FOC with hall/general, and then reboot, everything is fine. If I take those settings and back them up to a file, and then reload the file the VESC will boot loop. Even if I simply backup stock settings after a fresh erase/flash and restore them, the same thing happens.
And even more debugging info, If I do a fresh flash, load settings, not touch the motor config, but set the CAN baud to 1M and save, the vesc will bootloop on reboot
When you are near the CPU limit, any configuration change can make it better or worse. An SPI encoder will require more CPU usage, and so would a higher CAN packet decoding frequency.
Max frequency should be dialed down now, and then see how we are going to continue. Profiling and optimizing code is an endless endeavor once you hit your resources limit, I'd rather limit freq than making the code less clear.
@nitrousnrg Oops, I didn't see your previous message until now. That said, my configuration isn't really anything interesting. It's a totally stock config other than CAN being 1M, and FOC with a slightly higher switching freq in sensored mode. Seems a little unreasonable that this should be at the very limits of the hardware/RTOS?
Memory resources are plentiful, but you can easily max out the CPU if you run the core control loop at high frequencies. That's why my first question here was whether you are running > 30 kHz.
I just received a support ticket of a customer telling me that the latest firmware doesn't work for him in BLDC mode, so I would think this has escalated to be a critical bug that needs patching asap before more users upgrade the firmware and brick devices.
@nitrousnrg Just a note, I encountered this running at 20 kHz (the default FOC switching freq) too. It does not appear to depend only on switching freq. I don't know the codebase well enough to speculate on what might be going on, but it seems very sensitive to any kind of configuration change.
Meh, customer installed a wrong resistor, totally unrelated. Too bad I emailed Benjamin about this.
I was following the conversation, but have not been home for a few days so I could not test anything myself. Emailing me is not a problem :-) When I come home I will catch up with the pull requests and issues.
If a commit from back then would break things for HW4 I suspect that I would have heard a lot more by now, so I was kind of hoping that you would resolve the issue.
@chris1seto is it ok to close this issue, or do you still have the problem? If you do, can you make sure that your compiler is working properly and that you did not disable optimizations?
Hi Benjamin,
That's my feeling too, is that you'd have heard more if this was really broken, but it seems like it really is (or at least, I'm not sure what could be wrong in my configuration). My compiler should be working correctly, I build other projects, and the optimization options should be set in the makefile, correct? I haven't changed the makefile or any part of the FW other than the general conf file (to target 410). I don't suppose anyone has an Esk8 Torque or flipsky mini vesc they could test on?
Do you have any potential steps to try to debug? I could send you a binary of stock FW to compare to one generated by your build system, but I suspect that if we have differing versions, the binary could change slightly.
EDIT: I am using gcc-arm-none-eabi-8-2018-q4-major
@chris1seto, did you get the chance to confirm it's not a hardware issue? Can we close this issue?
Hi @nitrousnrg ,
It's definitely not a hardware issue. There's something else going on here in the bldc software, but I think Ben may need to look at it. Without disabling the watchdog, I cannot get the code to run on any of my 4.10 vescs. With the watchdog disabled the code seems to run fine, even if the scheduler is saturated.
Could you attach your motor config xml AND app xml? I can try your binary as well if you want.
If the scheduler is saturated it should not run fine; the board should reset. That's the purpose of using a WDT.
With your files I can probe this deeper, thanks!
Hi @nitrousnrg See attached!! These are for a 6" garden variety hoverboard motor. focworkingmini.zip
Thanks Chris, please send me your compiled binary, because with the latest firmware taken from https://github.com/vedderb/bldc/blob/master/build_all/410_o_411_o_412/VESC_default.bin your configs don't brick a discovery board.
Hi @nitrousnrg, see attached. fw.zip
chris@itxdev:~/Vesc1/bldc$ arm-none-eabi-gcc -v
Using built-in specs.
COLLECT_GCC=arm-none-eabi-gcc
COLLECT_LTO_WRAPPER=/home/chris/opt/gcc-arm-none-eabi-8-2018-q4-major/bin/../lib/gcc/arm-none-eabi/8.2.1/lto-wrapper
Target: arm-none-eabi
Configured with: /tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/src/gcc/configure --target=arm-none-eabi --prefix=/tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/install-native --libexecdir=/tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/install-native/lib --infodir=/tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/install-native/share/doc/gcc-arm-none-eabi/info --mandir=/tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/install-native/share/doc/gcc-arm-none-eabi/man --htmldir=/tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/install-native/share/doc/gcc-arm-none-eabi/html --pdfdir=/tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/install-native/share/doc/gcc-arm-none-eabi/pdf --enable-languages=c,c++ --enable-plugins --disable-decimal-float --disable-libffi --disable-libgomp --disable-libmudflap --disable-libquadmath --disable-libssp --disable-libstdcxx-pch --disable-nls --disable-shared --disable-threads --disable-tls --with-gnu-as --with-gnu-ld --with-newlib --with-headers=yes --with-python-dir=share/gcc-arm-none-eabi --with-sysroot=/tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/install-native/arm-none-eabi --build=x86_64-linux-gnu --host=x86_64-linux-gnu --with-gmp=/tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/build-native/host-libs/usr --with-mpfr=/tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/build-native/host-libs/usr --with-mpc=/tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/build-native/host-libs/usr --with-isl=/tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/build-native/host-libs/usr --with-libelf=/tmp/jenkins/jenkins-GCC-8-build_toolchain_docker-519_20181216_1544945247/build-native/host-libs/usr --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --with-pkgversion='GNU Tools for Arm Embedded Processors 8-2018-q4-major' --with-multilib-list=rmprofile
Thread model: single
gcc version 8.2.1 20181213 (release) [gcc-8-branch revision 267074] (GNU Tools for Arm Embedded Processors 8-2018-q4-major)
chris@itxdev:~/Vesc1/bldc$ git show -s --format=%H
fb9442889ac1f4c6c3f1a6666f32a8a88a4a55e0
chris@itxdev:~/Vesc1/bldc$ git diff
diff --git a/conf_general.h b/conf_general.h
index 61eed55..9f20ec4 100644
--- a/conf_general.h
+++ b/conf_general.h
@@ -61,14 +61,14 @@
 //#define HW_SOURCE "hw_49.c"
 //#define HW_HEADER "hw_49.h"

-//#define HW_SOURCE "hw_410.c" // Also for 4.11 and 4.12
-//#define HW_HEADER "hw_410.h" // Also for 4.11 and 4.12
+#define HW_SOURCE "hw_410.c" // Also for 4.11 and 4.12
+#define HW_HEADER "hw_410.h" // Also for 4.11 and 4.12

 // Benjamins first HW60 PCB with PB5 and PB6 swapped
 //#define HW60_VEDDER_FIRST_PCB

-#define HW_SOURCE "hw_60.c"
-#define HW_HEADER "hw_60.h"
+//#define HW_SOURCE "hw_60.c"
+//#define HW_HEADER "hw_60.h"

 //#define HW_SOURCE "hw_r2.c"
 //#define HW_HEADER "hw_r2.h"
Chris, your attached binary doesn't work in a discovery board, while mainstream binaries do work. Looks like a building issue.
Using built-in specs.
COLLECT_GCC=arm-none-eabi-gcc
COLLECT_LTO_WRAPPER=/usr/bin/../lib/gcc/arm-none-eabi/7.3.1/lto-wrapper
Target: arm-none-eabi
Configured with: /build/gcc-arm-none-eabi-2DWmz3/gcc-arm-none-eabi-7-2018q2/src/gcc/configure --target=arm-none-eabi --prefix=/build/gcc-arm-none-eabi-2DWmz3/gcc-arm-none-eabi-7-2018q2/install-native --libexecdir=/build/gcc-arm-none-eabi-2DWmz3/gcc-arm-none-eabi-7-2018q2/install-native/lib --infodir=/build/gcc-arm-none-eabi-2DWmz3/gcc-arm-none-eabi-7-2018q2/install-native/share/doc/gcc-arm-none-eabi/info --mandir=/build/gcc-arm-none-eabi-2DWmz3/gcc-arm-none-eabi-7-2018q2/install-native/share/doc/gcc-arm-none-eabi/man --htmldir=/build/gcc-arm-none-eabi-2DWmz3/gcc-arm-none-eabi-7-2018q2/install-native/share/doc/gcc-arm-none-eabi/html --pdfdir=/build/gcc-arm-none-eabi-2DWmz3/gcc-arm-none-eabi-7-2018q2/install-native/share/doc/gcc-arm-none-eabi/pdf --enable-languages=c,c++ --enable-plugins --disable-decimal-float --disable-libffi --disable-libgomp --disable-libmudflap --disable-libquadmath --disable-libssp --disable-libstdcxx-pch --disable-nls --disable-shared --disable-threads --disable-tls --with-gnu-as --with-gnu-ld --with-newlib --with-headers=yes --with-python-dir=share/gcc-arm-none-eabi --with-sysroot=/build/gcc-arm-none-eabi-2DWmz3/gcc-arm-none-eabi-7-2018q2/install-native/arm-none-eabi --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --with-pkgversion='GNU Tools for Arm Embedded Processors 7-2018-q3-update' --with-multilib-list=rmprofile
Thread model: single
gcc version 7.3.1 20180622 (release) [ARM/embedded-7-branch revision 261907] (GNU Tools for Arm Embedded Processors 7-2018-q3-update)
My compiler version doesn't mention anything about jenkins and docker stuff
Where did you get your compiler package from? I got mine via the official tarball from here: https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/gnu-rm/downloads (Linux x64)
Perhaps this is too much to ask, but would you mind downloading the tarball and using the prebuilt binaries within it to build the source?
I agree that this certainly points to a build issue, and thus may not be a bug at this point, but I'm wondering what could be wrong here... I use this compiler for my full-time day job as an STM32/Arm Cortex M3/M4F developer, so I would think that I'd notice if there was something wrong with my other projects. I'm more curious about what's going on than anything...
Thanks!!
I followed the instructions here: https://vesc-project.com/node/310
sudo add-apt-repository ppa:team-gcc-arm-embedded/ppa
sudo apt update
sudo apt install gcc-arm-embedded
You can also check if the mainstream binary I used bricks your board.
I'll go ahead and try this tomorrow. I guess if I can build a successful binary using those directions we can go ahead and close the bug report. I am extremely curious as to why the tarball release generates a binary that fails in this way though. Perhaps some kind of difference in optimization?
I'm baffled as well, but at the same time, I'm not. The purpose of me pushing a motor simulator into the VESC codebase is exactly this: to be able to automate tests on real hardware. If one day we bump the compiler version we could hit a problem like this, and the test tools will catch it for us. On your PC it could be an environment variable issue, or maybe the IDE you're using. I'd try an Ubuntu virtual machine to be sure. Keep us posted!
I haven't had time to test this, but also I don't want to just keep this open since it's pretty clear this is some kind of bizarre build system issue. I guess we can go ahead and close it. Man, I'd really love to know where the difference is though. I'm not even sure how to debug this because I bet different versions of gcc will emit slightly different code, although I'm sure for 99.9999% of differences, it will be inconsequential. But my point is, I'm not sure how you could even diff the disassembly to pinpoint it.
I had a look, and the GCC version you are using is 8 whereas I have been using 7. That should be no problem, but I can give it a try with the same version you are using and see if I encounter the same problem. Will report back in a few days after testing.
Thanks Benjamin! That would be excellent!
I happen to also have a 4.10 Flipsky around so I tested the latest firmware on it.
(chEvtWaitAnyTimeout(ALL_EVENTS, MS2ST(10)) == 0) {
I have tried reducing 10ms to 1ms or 100us but still get the board reset. If I change it to just continue, it behaves fine. Do you see that too?
FWIW I can also reproduce this on a 4.12 VESC. I was able to bisect it to the same commit. I'm using GCC 9.2.1 from Fedora's repositories. I also tried @Guillaume227 's suggestion of always continue, however that was an incomplete fix - it gets farther, but USB never comes up.
I just rebuilt the code with gcc-arm-none-eabi-7-2018-q2-update and now it works perfectly. So it is, in fact, the gcc version that matters.
Had the same issue and can confirm: current master works when compiled with gcc-arm-none-eabi-7-2018-q2, but will boot loop when compiled with gcc-arm-none-eabi-9-2019-q4.
Starting in 17f97763c0f32ad38001629850d2a606f3679f70, when this firmware is configured for hardware 4.12, the board simply reboots in a loop on bootup.