jetty840 / Sailfish-MightyBoardFirmware

Sailfish, faster than a Marlin

Build ends with linker warning producing not working binary :( #170

Closed TraxXavier closed 8 years ago

TraxXavier commented 8 years ago

When building the source with some added features I get a strange linker warning

build\mighty_one-2560\MightyBoard\shared\UART.o:(.ctors+0x0): warning: internal error: out of range error
build\mighty_one-2560\MightyBoard\shared\VikiInterface.o:(.ctors+0x0): warning: internal error: out of range error

When that happens the resulting binary is unusable.

When I disable some original features, for example by adding '-HAS_RGB_LED', my features build fine. So it's some compiler/linker issue. Any ideas how to fix that?

I build using WinAVR 4.6.2 with avr-libc 1.7.2.

I forked https://github.com/jetty840/Sailfish-MightyBoardFirmware/commit/128cee9377c6f001f7da17e4742a1faf00cedc3e for my modifications as that is the source for the last public binary release.

Should I try a newer compiler version? If so, which one? And if that causes build issues, as hinted in the docs, how do I fix them?

Cheers Trax

dnewman-polar3d commented 8 years ago

You have a linker issue with some class's function table crossing a boundary. The only way I know to fix it is to monkey around with the link order of the .o files. Basically it's a fight between you and the avr-gcc toolchain. And if you use any gcc toolchain other than 4.6.2 or 4.6.3, then you may find yourself diagnosing issues in the stepper interrupt caused by changes in the compiler's optimizer and register allocation. We went through that a few years ago. The Marlin team fought with it two summers ago when they went from Arduino 1.x to 1.6 and, with it, from gcc 4.3.2 to 4.8.3.

TraxXavier commented 8 years ago

"We went through that a few years ago."

And did you fix it? If so, what revision should I use?

dnewman-polar3d commented 8 years ago

The 4.6.2 or 4.6.3 avr-gcc toolchain. And that won't fix your linker issue. Rather, I was stating that if you are not using 4.6.2 or 4.6.3, then you have additional issues to worry about above and beyond the virtual function table alignment issues.

TraxXavier commented 8 years ago

Ok, I understand. Hmm... Is there any straightforward procedure to resolve the virtual function table alignment issues? I mean something like: add this or that piece of code over and over again at position x or y, and that will solve it in most cases?

dnewman-polar3d commented 8 years ago

Not that I'm aware of. Again, we got around this twice by re-arranging the link order of some object files. But at the end of the day, you just don't want to add any more classes. Things got particularly bad when we had to abstract the LCD interface a little to add other types of displays -- displays with different communication protocols. That was another class with virtual functions and hence vtoc's.
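
A minimal sketch of what "re-arranging the link order" could look like in a Python-based build script such as SCons, assuming the object files are held in an ordinary Python list before being handed to the link step. The helper name `reorder_objects` and the file list are hypothetical, not taken from the Sailfish build files, and which order (if any) avoids the warning still has to be found by trial.

```python
def reorder_objects(objects, move_first):
    """Return a new list with the entries named in move_first at the front.

    The relative order of all other entries is preserved. Names in
    move_first that do not occur in objects are ignored.
    """
    front = [name for name in move_first if name in objects]
    rest = [name for name in objects if name not in front]
    return front + rest


# Hypothetical object list; the real one comes from the build script.
objects = ["Main.o", "UART.o", "Menu.o", "VikiInterface.o"]
print(reorder_objects(objects, ["VikiInterface.o", "UART.o"]))
```

The function only changes the order in which the files are passed to the linker; it does not change what gets linked.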

TraxXavier commented 8 years ago

Ok, I see. So you are saying the issue is caused by the toolchain not being able to handle the number of classes currently in use. Hence, for my particular case, where I only want to support one type of LCD, I could undo the LCD abstraction to get some wiggle room for my own stuff?

As for changing the link order, I guess I have to modify one of the SCons-related files. Which one, and how?

dnewman-polar3d commented 8 years ago

No, I'm not saying that. You have two issues:

  1. You are having a problem with VTOCs.
  2. You appear to be using an unsupported (by Sailfish) avr-gcc toolchain.

They are independent issues. You may resolve the VTOC issue with the current toolchain you are using. However, you may have other issues caused by using an unsupported toolchain. Namely, issues with the stepper interrupt performing poorly. You solve that issue by using avr-gcc 4.6.2 or 4.6.3 which is what the stepper interrupt in Sailfish is optimized for. Your VTOC issue you resolve through black magic. (Or removing some of the classes using virtual functions. The real killer is all the menus/screens which MakerBot originally implemented using abstract classes and hence virtual functions.)

TraxXavier commented 8 years ago

So you mean WinAVR 4.6.2 is not supported? Isn't that just a Windows port of the Linux version? It has files like avr-gcc.exe, etc.

How about redoing the entire menu without virtual functions?

dnewman-polar3d commented 8 years ago

You're confusing the gcc toolchain with WinAVR. Under the hood, WinAVR uses some avr-gcc toolchain. I myself do not use Windows, and I do not know how the WinAVR version number maps to the avr-gcc toolchain version number. If WinAVR 4.6.2 uses avr-gcc 4.6.2, then you're okay on that count. If it uses an avr-gcc version other than 4.6.2 or 4.6.3, then that's a problem.

TraxXavier commented 8 years ago

...>avr-gcc -v
Using built-in specs.
[...]
Thread model: single
gcc version 4.6.2 (GCC)

Well it looks to be the right one :)
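
The same check can be scripted. Below is a small sketch that pulls the version number out of text like the `avr-gcc -v` output above and compares it with the two versions named in this thread. The function names and the sample string are illustrative assumptions, not part of the Sailfish build.

```python
import re

SUPPORTED = {"4.6.2", "4.6.3"}  # the versions named in this thread


def gcc_version(verbose_output):
    """Extract 'X.Y.Z' from the text printed by `avr-gcc -v`, or None."""
    match = re.search(r"gcc version (\d+\.\d+\.\d+)", verbose_output)
    return match.group(1) if match else None


def is_supported(verbose_output):
    """True if the reported gcc version is one of the SUPPORTED ones."""
    return gcc_version(verbose_output) in SUPPORTED


# Sample text modelled on the output quoted above.
sample = "Using built-in specs.\nThread model: single\ngcc version 4.6.2 (GCC)\n"
print(gcc_version(sample), is_supported(sample))
```

Note that gcc writes its `-v` report to standard error, so a script that runs the compiler itself has to capture that stream rather than standard output.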

But there is still the VTOC issue... Is the problem related specifically to the number of virtual functions, to the number of classes, or even to objects in general? I mean also structs, enums and namespaces.

dnewman-polar3d commented 8 years ago

All of the above. The linker's solution to the packing problem is causing a vtoc to be split over some boundary which it shouldn't be split. It's a push-me-pull-you problem. You can move code around and maybe it will go away or maybe it won't. You'll have to dive deep into how gcc and ld handle vtoc's to figure out if it's the total number of vtocs or just one particular vtoc or what. You might be able to use avr-nm on the .elf file and see. I don't know; not one of the areas I know much about.
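
As a starting point for the avr-nm suggestion, here is a hedged sketch that filters `nm -S` style output (address, size, type, name) for vtable symbols. It relies on GCC's C++ name mangling, where vtable symbols begin with `_ZTV`. The sample listing, including its addresses and symbol names, is made up for illustration and is not real output from a Sailfish build.

```python
def vtable_symbols(nm_output):
    """Return sorted (address, size, name) tuples for vtable symbols.

    Expects lines of the form: <hex address> <hex size> <type> <name>.
    Lines that do not have all four columns are skipped.
    """
    found = []
    for line in nm_output.splitlines():
        parts = line.split(None, 3)
        if len(parts) != 4:
            continue
        address, size, _symbol_type, name = parts
        if name.startswith("_ZTV"):  # mangled prefix of "vtable for ..."
            found.append((int(address, 16), int(size, 16), name))
    return sorted(found)


# Made-up listing in the style of `avr-nm -S firmware.elf`.
sample = """\
00001a2c 00000012 T _ZN4UART4readEv
00800200 00000018 D _ZTV13VikiInterface
00800218 00000014 D _ZTV4Menu
"""
for address, size, name in vtable_symbols(sample):
    print(hex(address), size, name)
```

Sorting by address makes it easier to see which tables sit next to each other and how much space they take in total.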

TraxXavier commented 8 years ago

"The Marlin team fought with it two summers ago when they went from Arduino 1.x to 1.6 and the change from 4.3.2 to 4.8.3."

Did they fix that, I mean other than by going back to 4.6.2 or 4.6.3?

TraxXavier commented 8 years ago

Ok, I got the solution from a smart guy in the fingers-welt.de forum. Apparently all that is needed to fix the linker error is to add the parameter "-mrelax" to the avr-gcc calls, like this (in SConscript.mightyboard):

env.Append(BUILDERS={'Elf':Builder(action=avr_tools_path+"/avr-gcc -mrelax -mmcu="+mcu+" -Os -Wl,--gc-sections -Wl,-Map,"+map_name+" -o $TARGET $SOURCES -lm")})

And that is it: no more "warning: internal error: out of range error", and the binaries also appear to work. Awesome!!!

The actual explanation: https://gcc.gnu.org/onlinedocs/gcc/AVR-Options.html (3.18.5.1 EIND and Devices with More Than 128 Ki Bytes of Flash)
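
A rough arithmetic sketch of why the 128 KiB figure in that section matters, assuming the target is the ATmega2560 (256 KiB of flash), as the build directory name suggests. A 16-bit code pointer holds a word address, so it can only name locations in the first 128 KiB directly. For targets beyond that, the linker has to generate stubs, and the linked GCC section states that relaxation must be turned on for this to work in all situations. The function name `needs_stub` and the example addresses are illustrative only.

```python
FLASH_BYTES_ATMEGA2560 = 256 * 1024   # flash size of the ATmega2560
POINTER_BITS = 16                     # width of a stored code pointer
WORD_BYTES = 2                        # AVR code addresses count 16-bit words

# Lowest byte address that a 16-bit word address can no longer name.
DIRECT_REACH = (1 << POINTER_BITS) * WORD_BYTES


def needs_stub(byte_address):
    """True if a function at this flash byte address is out of direct reach."""
    return byte_address >= DIRECT_REACH


print(DIRECT_REACH // 1024, "KiB directly reachable of",
      FLASH_BYTES_ATMEGA2560 // 1024, "KiB")
print(needs_stub(0x1F000), needs_stub(0x21000))
```

This would fit the observation that the warning only shows up once enough features are enabled: a larger image pushes more code past the 128 KiB mark.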

dnewman-polar3d commented 8 years ago

On 25/08/2016 10:09 AM, TraxXavier wrote:

Ok, I got the solution from a smart guy in the fingers-welt.de forum. Apparently all that is needed to fix the linker error is to add the parameter "-mrelax" to the avr-gcc calls, like this (in SConscript.mightyboard):

env.Append(BUILDERS={'Elf':Builder(action=avr_tools_path+"/avr-gcc -mrelax -mmcu="+mcu+" -Os -Wl,--gc-sections -Wl,-Map,"+map_name+" -o $TARGET $SOURCES -lm")})

And that is it: no more "warning: internal error: out of range error", and the binaries also appear to work. Awesome!!!

Well, do a lot of testing because we had a problem one time when we tried -mrelax. The problem may have been something else as we were wrestling with more than one issue at the time IIRC. But we shied away from -mrelax, perhaps for no good reason.

Dan

TraxXavier commented 8 years ago

Just to satisfy my curiosity, could you please elaborate on what issues arise with the stepper interrupt when using an unsupported toolchain?

dcnewman commented 8 years ago

On 28/08/2016 1:03 AM, TraxXavier wrote:

Just to satisfy my curiosity, could you please elaborate on what issues arise with the stepper interrupt when using an unsupported toolchain?

Do a lot of prints, listen carefully to the stepper motors, and look carefully at the print quality. It makes visible differences when the compiler optimizations change in the stepper code. For instance, you may start hearing little clunks now and then: that's a missed stepper interrupt.

Dan

TraxXavier commented 8 years ago

Can this be detected algorithmically, i.e. the printer telling the user that it encountered an error? IMHO human perception is too flawed ;)

dnewman-polar3d commented 8 years ago

On 28/08/2016 10:15 AM, Trax Xavier wrote:

Can this be detected algorithmically, i.e. the printer telling the user that it encountered an error? IMHO human perception is too flawed ;)

Nope. It's all about jitter reduction. So you can externally measure the jitter by, for example, watching the STEP output for a given axis. And then watch for too much jitter as well as late/missed interrupt deliveries (which will appear as a sudden gap or possibly two bunched together STEPs). But then you have to take into account that there's acceleration/decel in the steps. And at really high step rates, there can be intentional double or quad stepping.

'tis why those of us who have done motion control for a while use our ears and eyes to listen for anomalies. (I'll also pull out a spectrum analyzer at times to see what might be making obnoxious buzzes, like the 13 kHz buzz on DRV8825s which drives me crazy. It's caused by their too-high-for-3DP minimum rise time in the current handling within the chip. Current can go too high, and so you get some fun 13 kHz chopping happening as the driver then drops the current. Really annoying for those of us who can still hear that high.)
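
For the external measurement described above, here is a hedged sketch of how captured STEP edge times might be screened offline. It compares each step interval with the median of its neighbours, so a gradual acceleration ramp is tolerated while a sudden gap or a bunched pair stands out. The function name, the window size, and the 1.8 ratio are arbitrary assumptions, and intentional double or quad stepping would still show up as false positives.

```python
from statistics import median


def flag_step_anomalies(timestamps_us, window=4, ratio=1.8):
    """Return (index, kind) for STEP intervals far from their neighbours.

    timestamps_us: edge times of one axis's STEP signal, in microseconds.
    Each interval is compared with the median of up to `window` intervals
    on either side of it.
    """
    intervals = [b - a for a, b in zip(timestamps_us, timestamps_us[1:])]
    anomalies = []
    for i, interval in enumerate(intervals):
        neighbours = (intervals[max(0, i - window):i]
                      + intervals[i + 1:i + 1 + window])
        if not neighbours:
            continue
        local = median(neighbours)
        if interval > local * ratio:
            anomalies.append((i, "gap"))
        elif interval < local / ratio:
            anomalies.append((i, "bunched"))
    return anomalies


# Synthetic capture: steady 100 us steps with one late step.
times = [0, 100, 200, 300, 400, 500, 700, 800, 900, 1000]
print(flag_step_anomalies(times))
```

The timestamps would have to come from an external instrument such as a logic analyzer; nothing here runs on the printer itself.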