All Qt apps are silently built inoperable when compiled with clang using LTO ("signal not found")

v-fox commented 4 years ago


Bugzilla Link	46469
Version	11.0
OS	Linux
CC	@dwblaikie,@DougGregor,@dommldomml,@MaskRay,@LebedevRI,@zygoloid

Extended Description

I've noticed that all packages that use Qt and that I tried to build with clang using LTO were unable to draw their GUI at all or got stuck after creating a window. Apparently, it's a known old… "feature" for which I failed to find a report here.

https://www.reddit.com/r/cpp_questions/comments/82jpz5/qt5_signal_broken_by_lto/
https://bugreports.qt.io/browse/QTBUG-43556
https://bugreports.qt.io/browse/QTBUG-61710
https://github.com/InBetweenNames/gentooLTO/issues/444

So, Qt guys refuse to change anything, say that their code is correct, and allege that clang guys say the same. However, building broken binaries without erroring out is definitely not OK. If Qt code is truly correct, it would be nice for clang to avoid misoptimising it or, if clang code is correct, to detect such usage and skip optimising it. Or, again, at least to error out. What if an entire distribution were built with clang with LTO by default?

llvmbot commented 3 years ago

Not sure if Fangrui saw these updates, so I responded on the commit message for the most recent change pointing him here.

a5fec13e-39c8-49b2-b3dd-e7d891b4dfc7 commented 3 years ago

Hi Teresa, Fangrui, you made my day. Looks like we can switch to Clang/LTO for production use RSN.

Is there a chance to backport these fixes into 11.0.x ?

Thanks

Dominik

llvmbot commented 3 years ago

Woops, hit save accidentally.

The seg fault turns into correct output after this change, also by Fangrui:

687b83ceabafe81970cd4639e7f0c89036402081
Author: Fangrui Song <i@MaskRay.me>
Date: Sat Dec 5 23:13:28 2020

[X86FastISel] Fix MO_GOTPCREL GlobalValue reference in static relocation model

This fixes the bug referenced by 5582a7987662a92eda5d883b88fc4586e755acf5 which was exposed by 961f31d8ad14c66829991522d73e14b5a96ff6d4.

With this change, movq src@GOTPCREL, %rcx => movq src@GOTPCREL(%rip), %rcx

llvm/lib/Target/X86/X86FastISel.cpp
llvm/test/CodeGen/X86/fast-isel-mem.ll

What isn't clear to me is whether it is understood why this fix not only fixes the issue exposed by 961f31d8ad14c66829991522d73e14b5a96ff6d4 (intended to be NFC), but apparently also some earlier bug, which in this case was causing incorrect output.

llvmbot commented 3 years ago

It looks like this might have already been fixed by Fangrui, who is already cc'ed on this bug. Hoping Fangrui can confirm how his change fixed the problem.

I used the reproduction instructions in Comments 12 and 13, and happened to notice that it was working in a recent client and broken in an older one. I did some bisection and found that the incorrect output changed to an immediate seg fault with 961f31d8ad14c66829991522d73e14b5a96ff6d4 ([TargetMachine] Don't imply dso_local on global variable declarations in Reloc::Static model).

dwblaikie commented 3 years ago

Hmm, it looks like Clang/LLVM should be able to respect -fPIC specified at compile time when doing LTO. Clang/LLVM creates an llvm::Module that retains the pic information:

    $ diff <(clang a.c -fPIC -c -emit-llvm -S -o -) <(clang a.c -c -emit-llvm -S -o -)
    7c7
    < define void @f1() #0 {
    ---
    > define dso_local void @f1() #0 {
    13,14c13,14
    < !llvm.module.flags = !{!0, !1}
    < !llvm.ident = !{!2}
    ---
    > !llvm.module.flags = !{!0}
    > !llvm.ident = !{!1}
    17,18c17
    < !1 = !{i32 7, !"PIC Level", i32 2}
    < !2 = !{!"clang version 12.0.0 (git@github.com:llvm/llvm-project.git dcefeeae9896c31a593c9eb64b31fd2f7f53b696)"}
    ---
    > !1 = !{!"clang version 12.0.0 (git@github.com:llvm/llvm-project.git dcefeeae9896c31a593c9eb64b31fd2f7f53b696)"}

But I really don't know a great deal about -fPIC myself, so wouldn't know how to write a small proof of concept to demonstrate how -fPIC might be correctly or incorrectly handled, and in what way Qt might be depending on it (& in what way LLVM might be implementing it, perhaps partially/in a way that makes sense in most other cases but not for Qt).

a5fec13e-39c8-49b2-b3dd-e7d891b4dfc7 commented 3 years ago

This is a comment by a Qt engineer I found in one of the Qt bugs referenced above:

Not our bug. Your combination of compiler and linker produced copy relocations.

Please ensure that your application was compiled with -fPIC. Since you're doing link-time optimisation, that flag must appear in the linker command-line too.

So this supports my theory in my last comment.

a5fec13e-39c8-49b2-b3dd-e7d891b4dfc7 commented 3 years ago

To break the silence:

IIRC, Qt mandates that all source files are compiled with -fPIC. It even checks this by querying the compile parameters.

To me it looks like the LTO-generated objects are not PIC - is this correct?
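A compile-time check like the one mentioned above can only rely on macros the compiler predefines for the compile step: GCC and Clang define __PIC__ under -fpic/-fPIC, and both __PIC__ and __PIE__ under -fpie/-fPIE. Below is a minimal sketch of such a check; the function names are hypothetical, this is not Qt's actual header code, and rejecting PIE is an assumption based on the advice to use -fPIC. With -flto the machine code is only generated at link time, so a check like this cannot tell whether the final code is PIC, which fits the Qt engineer's point that -fPIC must also be on the linker command line.

```cpp
#include <cassert>
#include <string>

// Reports how this translation unit was compiled, using macros that GCC and
// Clang predefine: __PIC__ for -fpic/-fPIC, and both __PIC__ and __PIE__ for
// -fpie/-fPIE.
std::string codegen_model() {
#if defined(__PIE__)
    return "pie";      // position-independent executable code
#elif defined(__PIC__)
    return "pic";      // position-independent code suitable for shared objects
#else
    return "non-pic";  // position-dependent code
#endif
}

// Hypothetical check in the spirit of "compile with -fPIC": accept PIC code,
// reject PIE and position-dependent code (rejecting PIE is an assumption).
// It only sees the compile-step flags; under -flto the final code generation
// happens later, at link time, and this check cannot observe it.
bool is_pic_but_not_pie() {
    return codegen_model() == "pic";
}
```

Building the same file once with -fPIC and once with -fPIE should flip the result of is_pic_but_not_pie().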

a5fec13e-39c8-49b2-b3dd-e7d891b4dfc7 commented 3 years ago

Smaller reproduction

a5fec13e-39c8-49b2-b3dd-e7d891b4dfc7 commented 3 years ago

As I wrote in a previous comment, the sole functionality which is used inside the Qt shared lib is what I copied into rest.cpp. But when compiling rest.cpp into a shared object, even with gcc, the issue is not reproduced.

Not sure I follow - you're saying, so far as you know, the 'rest.cpp' could be built into a shared library and then linked against the lto.cpp and that runs, but doesn't reproduce the bug, but you're not sure why?

That might be worth figuring out - bisecting/narrowing down what you need to pull out of the Qt shared library until rest.cpp is actually standalone/everything needed to reproduce?

This is the way in which lto.cpp was created. Then I pulled together what was left in Qt and created rest.cpp. Ripping apart libQt5Widgets.so is extremely tedious as this would require intercepting Qt's build process.

When you talk about the disassembled code, I know how to use "-S", but I assume that this won't help in an LTO context.

There is a way to dump assembly and other intermediates (like LLVM IR) out of ThinLTO. Maybe something like -Wl,-save-temps? ( https://reviews.llvm.org/D45217 ) - oh, looks like it might just be -save-temps, without the -Wl.

There must be a fundamental difference in how lto.cpp is compiled with/without -flto. I can extract the assembly, but I am not sure whether I'll be able to tell the difference. Maybe I'll remove all output and just return 0 or 1 from main(), as this would minimize the amount of code.

Yep, reducing it down to an exit 0/1 would probably help - io (especially C++ io) can complicate things a great deal.

I have attached a new test case which only returns 0/1. The symptoms remain the same: LTO'd code returns "1" whereas non-LTO'd code returns "0".

I have attached the .bc file (of course with LTO) as well as the two assemblies with (lto_lto.s) and without (lto.s) LTO.

I have verified that .bc -> link and .bc -> llc -> .s -> link show the same symptom as compiling directly from C++.

I am afraid, I am now at the end of what I can provide, unless you have proposals for more experiments I can do.

dwblaikie commented 3 years ago

As I wrote in a previous comment, the sole functionality which is used inside the Qt shared lib is what I copied into rest.cpp. But when compiling rest.cpp into a shared object, even with gcc, the issue is not reproduced.

Not sure I follow - you're saying, so far as you know, the 'rest.cpp' could be built into a shared library and then linked against the lto.cpp and that runs, but doesn't reproduce the bug, but you're not sure why?

That might be worth figuring out - bisecting/narrowing down what you need to pull out of the Qt shared library until rest.cpp is actually standalone/everything needed to reproduce?

This is the way in which lto.cpp was created. Then I pulled together what was left in Qt and created rest.cpp. Ripping apart libQt5Widgets.so is extremely tedious as this would require intercepting Qt's build process.

When you talk about the disassembled code, I know how to use "-S", but I assume that this won't help in an LTO context.

There is a way to dump assembly and other intermediates (like LLVM IR) out of ThinLTO. Maybe something like -Wl,-save-temps? ( https://reviews.llvm.org/D45217 ) - oh, looks like it might just be -save-temps, without the -Wl.

There must be a fundamental difference in how lto.cpp is compiled with/without -flto. I can extract the assembly, but I am not sure whether I'll be able to tell the difference. Maybe I'll remove all output and just return 0 or 1 from main(), as this would minimize the amount of code.

Yep, reducing it down to an exit 0/1 would probably help - io (especially C++ io) can complicate things a great deal.

I now tried producing the two .s files, the LTO one I compiled via llc from the .bc file.

As it must be about the "clicked" ptmf address, the difference between the two assemblies is

    movq _ZN15QAbstractButton7clickedEb@GOTPCREL(%rip), %rcx

in the non-LTO case vs.

    movq $_ZN15QAbstractButton7clickedEb, -32(%rbp)

in the LTO case.

I guess there are other, less interesting assembly differences? (the -32(%rbp) is quite different to %rcx - so figure there must be other assembly differences that read or write to those two locations)

but, yes, that sounds plausibly importantly different

I have no idea whether these represent the same thing, and I am by no means an expert in x86 assembly. The naive change to movq _ZN15QAbstractButton7clickedEb@GOTPCREL(%rip), -32(%rbp) does not work syntactically: lto_lto.s:151: Error: too many memory references for `movq'
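Some context for the two instructions above: on x86-64 Linux, GCC and Clang follow the Itanium C++ ABI, in which a pointer to a non-virtual member function is two words, the function's address and a this-adjustment. The LTO store of $_ZN15QAbstractButton7clickedEb to -32(%rbp) therefore appears to write the address half of &QAbstractButton::clicked using the symbol's link-time address directly, where the non-LTO code loads it from the GOT. The sketch below assumes that ABI and uses a hypothetical Button class to inspect the raw words of such a pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumption: Itanium C++ ABI (GCC/Clang on x86-64 Linux). A pointer to a
// non-virtual member function is then two words: the function's address and
// an adjustment applied to `this` before the call.
struct ItaniumMemberFnPtr {
    std::uintptr_t ptr;  // function address (for non-virtual members)
    std::ptrdiff_t adj;  // this-pointer adjustment
};

// Hypothetical stand-in for QAbstractButton; the bodies differ so that no
// identical-code folding can merge the two functions.
struct Button {
    void clicked(bool checked) { state = checked; }
    void toggle() { state = !state; }
    bool state = false;
};

// Copies the raw bits of a pointer to member function into the two-word layout.
template <typename MemberFn>
ItaniumMemberFnPtr raw_bits(MemberFn fn) {
    static_assert(sizeof(MemberFn) == sizeof(ItaniumMemberFnPtr),
                  "expects the two-word Itanium ABI layout");
    ItaniumMemberFnPtr bits;
    std::memcpy(&bits, &fn, sizeof bits);
    return bits;
}
```

Comparing two such pointers compares these words, so two pointers naming the same function compare unequal if the modules that created them disagree on its address.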

a5fec13e-39c8-49b2-b3dd-e7d891b4dfc7 commented 3 years ago

As I wrote in a previous comment, the sole functionality which is used inside the Qt shared lib is what I copied into rest.cpp. But when compiling rest.cpp into a shared object, even with gcc, the issue is not reproduced.

Not sure I follow - you're saying, so far as you know, the 'rest.cpp' could be built into a shared library and then linked against the lto.cpp and that runs, but doesn't reproduce the bug, but you're not sure why?

That might be worth figuring out - bisecting/narrowing down what you need to pull out of the Qt shared library until rest.cpp is actually standalone/everything needed to reproduce?

This is the way in which lto.cpp was created. Then I pulled together what was left in Qt and created rest.cpp. Ripping apart libQt5Widgets.so is extremely tedious as this would require intercepting Qt's build process.

When you talk about the disassembled code, I know how to use "-S", but I assume that this won't help in an LTO context.

There is a way to dump assembly and other intermediates (like LLVM IR) out of ThinLTO. Maybe something like -Wl,-save-temps? ( https://reviews.llvm.org/D45217 ) - oh, looks like it might just be -save-temps, without the -Wl.

There must be a fundamental difference in how lto.cpp is compiled with/without -flto. I can extract the assembly, but I am not sure whether I'll be able to tell the difference. Maybe I'll remove all output and just return 0 or 1 from main(), as this would minimize the amount of code.

I now tried producing the two .s files, the LTO one I compiled via llc from the .bc file.

As it must be about the "clicked" ptmf address, the difference between the two assemblies is

    movq _ZN15QAbstractButton7clickedEb@GOTPCREL(%rip), %rcx

in the non-LTO case vs.

    movq $_ZN15QAbstractButton7clickedEb, -32(%rbp)

in the LTO case.

I have no idea whether these represent the same thing, and I am by no means an expert in x86 assembly. The naive change to movq _ZN15QAbstractButton7clickedEb@GOTPCREL(%rip), -32(%rbp) does not work syntactically: lto_lto.s:151: Error: too many memory references for `movq'

dwblaikie commented 3 years ago

As I wrote in a previous comment, the sole functionality which is used inside the Qt shared lib is what I copied into rest.cpp. But when compiling rest.cpp into a shared object, even with gcc, the issue is not reproduced.

Not sure I follow - you're saying, so far as you know, the 'rest.cpp' could be built into a shared library and then linked against the lto.cpp and that runs, but doesn't reproduce the bug, but you're not sure why?

That might be worth figuring out - bisecting/narrowing down what you need to pull out of the Qt shared library until rest.cpp is actually standalone/everything needed to reproduce?

When you talk about the disassembled code, I know how to use "-S", but I assume that this won't help in an LTO context.

There is a way to dump assembly and other intermediates (like LLVM IR) out of ThinLTO. Maybe something like -Wl,-save-temps? ( https://reviews.llvm.org/D45217 ) - oh, looks like it might just be -save-temps, without the -Wl.

a5fec13e-39c8-49b2-b3dd-e7d891b4dfc7 commented 3 years ago

Steps to reproduce: download libQt5Widgets.so from here: https://drive.google.com/file/d/1pQ4Ukc6Ynr25POQrcnm7Py3Kt_zJEgV-/view?usp=sharing

    clang++ -o out.lto -fuse-ld=lld -flto=thin -fPIC lto.cpp -L . -lQt5Widgets
    clang++ -o out -fuse-ld=lld -fPIC lto.cpp -L . -lQt5Widgets

./out gives:

    addr: 0x7ffe6436b1f0
    signal_index 0x7ffe6436b1f0 2

whereas ./out.lto gives:

    addr: 0x7fff40cc43f0
    signal_index 0x7fff40cc43f0 -1
    signal_index 0x7fff40cc43f0 -1
    signal_index 0x7fff40cc43f0 -1
    not found

If you want to mimic libQt5Widgets:

    g++ -o libTest.so -shared -fPIC rest.cpp
    clang++ -o out.libtest -fuse-ld=lld -fPIC lto.cpp -L . -lTest
    clang++ -o out.libtest.lto -flto=thin -fuse-ld=lld -fPIC lto.cpp -L . -lTest

These two binaries print the same output (modulo the addresses):

    addr: 0x7ffc1679f190 addr2: 0x7ffc1679f0a0 1 QSM 1 1 signal_index 0x7ffc1679f190 2

(rest.cpp gives some more output) So when linked to libTest.so, compiling with or without LTO does not play a role, whereas it does when linking to libQt5Widgets.so. Using "-fsemantic-interposition" in clang++ does not make a difference.

Is this explanation enough?

LebedevRI commented 3 years ago

As I wrote in a previous comment, the sole functionality which is used inside the Qt shared lib is what I copied into rest.cpp. But when compiling rest.cpp into a shared object, even with gcc, the issue is not reproduced.

Can you state the reproduction steps, given that lto.tgz?

When you talk about the disassembled code, I know how to use "-S", but I assume that this won't help in an LTO context.

How shall we continue ?

a5fec13e-39c8-49b2-b3dd-e7d891b4dfc7 commented 3 years ago

As I wrote in a previous comment, the sole functionality which is used inside the Qt shared lib is what I copied into rest.cpp. But when compiling rest.cpp into a shared object, even with gcc, the issue is not reproduced.

When you talk about the disassembled code, I know how to use "-S", but I assume that this won't help in an LTO context.

How shall we continue ?

dwblaikie commented 3 years ago

Hmm, sorry to hear about the semantic interposition stuff.

I think, at least for myself, the test case is still a bit too big/opaque for me - I don't know what features the Qt library is depending on/LLVM's lto binary is failing to provide.

My only guesses/suggestions from here would be to either ask the Qt folks, who have already said they know the problem/that it's LLVM's problem, whether they could provide a small standalone example of the behavior they're relying on. Alternatively, I guess maybe try extracting the assembly files from the LTO and non-LTO builds of a non-LTO user of the library (to keep as much of it as separate as possible), if that's possible, and then show the smallest change to the assembly that trips the change in behavior. At least then we might understand what's changing in LTO in a significant way. I guess another direction might be to dump the LTO and non-LTO code and try various ways to get them closer together to pinpoint the critical difference.

a5fec13e-39c8-49b2-b3dd-e7d891b4dfc7 commented 3 years ago

@David: Did I understand correctly that I should leave the GCC-compiled stuff as-is and compile only the clang/LTO part with "-fsemantic-interposition"?

So what I tried is: clang++ -v -g -o out.lto -fsemantic-interposition -fuse-ld=lld -flto=thin -fPIC lto.cpp -L . -lQt5Widgets

But the effect is the same: if compiled without LTO, the ptmf address is found, whereas it isn't when compiled with LTO.

I am eager to do more experiments, but I am out of ideas here. I would really love to have this issue fixed, because I cannot introduce Clang/LTO for my project and so do not get the nice performance boost from it. I also think that there are a lot of Qt users out there who would benefit from a fix for this issue.

I hope I have done everything needed to make the problem reproducible for the LLVM/Clang community. If you need more data/experiments/..., just tell me and I will provide everything you request.

a5fec13e-39c8-49b2-b3dd-e7d891b4dfc7 commented 3 years ago

I tried llvm/clang 11.0rc3 with -fsemantic-interposition, but it does not change anything.

a5fec13e-39c8-49b2-b3dd-e7d891b4dfc7 commented 4 years ago

Hmmm. clang 10 does not seem to support -fsemantic-interposition.

Setting -fno-semantic-interposition for the g++ compilation of my small reproduction did not introduce the breakage.

Only clang 11 supports -fsemantic-interposition. I'll try to build it.

dwblaikie commented 4 years ago

OK, I only have a vague idea of this sort of thing, but I think it /might/ be -fsemantic-interposition.

Could you try building with -fsemantic-interposition & see if that helps? (some discussion here: https://reviews.llvm.org/D72829 )

And/or try with GCC with -fno-semantic-interposition and see if it reproduces some of/the same breakage.

a5fec13e-39c8-49b2-b3dd-e7d891b4dfc7 commented 4 years ago

Is the reproduction I provided sufficient, or do I need to dig deeper? I am a bit out of ideas on how to replace the provided .so file with something that can be compiled as I explained in my previous comments. As soon as I try to compile the missing pieces myself, the problem goes away. I have reduced the test case a little further, but not substantially. Do you want me to provide the even more reduced test source?

I desperately need a fix for my project, as I cannot switch on LTO because of this issue, and I would love to see the performance benefits of LTO.

a5fec13e-39c8-49b2-b3dd-e7d891b4dfc7 commented 4 years ago

The issue must come from the "QAbstractButton::clicked" pointer to member function. Basically, the function "QAbstractButton::qt_static_metacall", which is called from "connectImpl1", checks whether the given ptmf is the same as the expected one (see rest.cpp). This comparison fails in the LTO case. I do not understand why the address should differ in the LTO case - and it does not if I compile the function myself.
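The comparison described above can be sketched in one file. This is a simplified, hypothetical model of the IndexOfMethod branch of a moc-style static metacall (class and function names are illustrative, not Qt's real code): args[0] points at the int result, args[1] at the caller's pointer to member function, and the signal is only recognized if that pointer compares equal to the class's own &AbstractButton::clicked.

```cpp
#include <cassert>

// Hypothetical stand-in for a Qt class with a signal; not Qt's real code.
class AbstractButton {
public:
    void clicked(bool checked) { lastChecked = checked; }   // the "signal"
    void setChecked(bool checked) { isChecked = checked; }  // same signature, different function
private:
    bool lastChecked = false;
    bool isChecked = false;
};

enum class Call { IndexOfMethod, Other };

// Simplified shape of a moc-style static metacall: args[0] points at the int
// result, args[1] points at the caller's pointer to member function.
void static_metacall(Call call, void** args) {
    if (call != Call::IndexOfMethod) {
        return;
    }
    using ClickedPtr = void (AbstractButton::*)(bool);
    int* result = static_cast<int*>(args[0]);
    // The whole lookup hinges on this equality of member-function pointers.
    if (*static_cast<ClickedPtr*>(args[1]) == &AbstractButton::clicked) {
        *result = 2;  // hypothetical local index of the clicked signal
    }
}

// Caller side: start from -1 ("not found") and let the class fill in an index.
int signal_index_of(void (AbstractButton::*candidate)(bool)) {
    int signalIndex = -1;
    void* args[] = {&signalIndex, &candidate};
    static_metacall(Call::IndexOfMethod, args);
    return signalIndex;
}
```

In a single program both sides agree on the address of clicked, so the lookup succeeds. The report above is the case where the LTO-built executable and libQt5Widgets.so apparently end up with different addresses for the same function, so the equality fails and the index stays -1.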

a5fec13e-39c8-49b2-b3dd-e7d891b4dfc7 commented 4 years ago

Reproduction sources

a5fec13e-39c8-49b2-b3dd-e7d891b4dfc7 commented 4 years ago

I have reduced the issue to a single C++ file, though it is not 100% straightforward.

When compiling this C++ file with/without LTO, it shows exactly the wrong behavior which has been observed by the reporter.

However, it only shows the problem when linked against the Qt shared lib. When I try to add the missing pieces in a small reproduction, the problem does not occur.

I'll attach the following: lto.cpp and lto.h, the files which trigger the erroneous behavior, and rest.cpp, the remaining pieces from Qt which are needed to build an executable.

libQt5Widgets.so is too big to attach; it can be downloaded from here: https://drive.google.com/file/d/1pQ4Ukc6Ynr25POQrcnm7Py3Kt_zJEgV-/view?usp=sharing

Steps to reproduce:

    clang++ -g -o out -fuse-ld=lld -fPIC lto.cpp -L . -lQt5Widgets && LD_LIBRARY_PATH=. ./out

prints:

    signal_index -1
    signal_index 2

whereas

    clang++ -g -o out.lto -fuse-ld=lld -flto=thin -fPIC lto.cpp -L . -lQt5Widgets && LD_LIBRARY_PATH=. ./out.lto

prints:

    signal_index -1
    signal_index -1
    signal_index -1
    signal_index -1
    not found
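The shape of the two outputs can be explained by how a signal is searched for: the lookup walks from the sender's most derived class toward the root class, asking each class in turn, and reports "not found" when none recognizes the pointer. The sketch below is a heavily simplified, hypothetical model of that walk; the class chain, the two token objects standing in for two different addresses of clicked, and the index 2 are all assumptions for illustration. With a four-class chain such as QPushButton, QAbstractButton, QWidget, QObject, a recognized pointer yields one -1 followed by 2, and an unrecognized one yields four -1 values, which would match the outputs above if lto.cpp connects a QPushButton (also an assumption).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Heavily simplified, hypothetical model of a meta-object chain.
struct MetaObject {
    std::string className;
    const MetaObject* superClass;                // nullptr for the root class
    int (*indexOfMethod)(const void* signalId);  // -1 if not recognized
};

struct LookupResult {
    int signalIndex = -1;
    std::vector<std::string> asked;  // classes queried, most derived first
};

// Walk from the most derived class toward the root until one class
// recognizes the signal; signalIndex stays -1 ("not found") otherwise.
LookupResult find_signal(const MetaObject* mo, const void* signalId) {
    LookupResult r;
    for (; mo != nullptr && r.signalIndex < 0; mo = mo->superClass) {
        r.asked.push_back(mo->className);
        r.signalIndex = mo->indexOfMethod(signalId);
    }
    return r;
}

// Two distinct objects standing in for two different addresses of the same
// signal function (assumption: as seen by the library and by the executable).
char clickedAsSeenByLibrary;
char clickedAsSeenByExecutable;

int button_index_of_method(const void* id) {
    return id == &clickedAsSeenByLibrary ? 2 : -1;
}
int no_signals(const void*) { return -1; }

const MetaObject kQObject{"QObject", nullptr, no_signals};
const MetaObject kQWidget{"QWidget", &kQObject, no_signals};
const MetaObject kQAbstractButton{"QAbstractButton", &kQWidget, button_index_of_method};
const MetaObject kQPushButton{"QPushButton", &kQAbstractButton, no_signals};
```

Printing signalIndex after each step of the loop would reproduce the "-1, 2" versus "-1, -1, -1, -1, not found" pattern.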

dwblaikie commented 4 years ago

Are you familiar with/able to write C++ code? If so, you might be able to reduce the test case down, even if you're not especially familiar with what's correct or not.

Otherwise maybe you can find some folks you support/who depend on your work who might have more familiarity with this sort of thing to help out.

Nope, sorry. And if C/Python/Lua code is mostly self-explanatory, the more I see object-oriented stuff, the more I hate it.

That sort of language probably isn't necessary here. (it's totally fine if you prefer/choose not to work with it - but can leave out the emotive language)

What language would that be? Are you sure you've read it right? Those are unprovoked insinuations, and I take offence at them.

Just the use of the word "hate" with regards to programming languages/idioms (& specifically the one this project is written in). Doesn't seem to me like it adds a lot to the conversation and maybe makes it more difficult.

But I'm not sure why you need my examples for it. I've linked several test applications and several real applications that 100% reproduce it on launch. And you can take any Qt app and break it by compiling with LTO.

Because isolating the particular problem takes time I don't have available to/choose to spend investigating this issue. Perhaps you or someone else does have that time/willingness/priority.

Having an isolated (the smallest, standalone example - no library (standard or otherwise), probably just one or two source files, etc) example of the functionality is, at least to me, the first step needed in discussing whether the functionality is acceptable to implement in LTO, perhaps debate whether there are other approaches that could provide the same results, etc.
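As a hedged illustration of the kind of standalone test case being asked for: the property the Qt code appears to rely on is that a function's address compares equal no matter which module takes it. The sketch keeps everything in one file so it runs anywhere; the comments mark which part would hypothetically go into a shared library and which into the LTO-built executable. The file split and any build flags are untested assumptions, and nothing here is known to reproduce the bug (rest.cpp built as a plain shared object reportedly did not).

```cpp
#include <cassert>

// ---- Would live in a shared library (hypothetical libdemo.so). ----
void demo_signal() {}

// Library-side check: does the caller's pointer denote the function that
// this module knows as demo_signal?
bool demo_recognizes(void (*candidate)()) {
    return candidate == &demo_signal;
}

// ---- Would live in the executable (hypothetically built with -flto). ----
// The executable takes the address of the library function itself and hands
// it back; C++ requires both sides to agree on that address.
bool executable_round_trip() {
    return demo_recognizes(&demo_signal);
}
```

A real reproducer would split this into two files, build the first as a shared library, and compare executable_round_trip() with and without LTO.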

Linked discussions with Qt authors implied that they know exact cause, so they could do so. You may quickly get an answer that's to your satisfaction from someone with your level of knowledge on the matter instead of a messenger.

Yeah, maybe - but the threads you pointed to seemed like people asked for examples and they didn't want to provide them, seemed to assume it was "obvious" what the issue was, where it wasn't obvious to me at least. The tone/content there didn't lead me to believe it would be an easy/quick/simple conversation.

Does that happen with lld only, or with ld.gold too?

Both, and I haven't found any linker or clang flags to alleviate that.

If your build system supports some way to provide custom flags to certain compilations, you might be able to opt-out of lto for certain files. Though I don't know how pervasive the use of Qt is & thus how maintainable it would be to maintain those opt-out flags. I guess if Qt use is restricted to a certain wrapper/UI library, maybe just don't compile that library with LTO - but you can compile the business logic/implementation of the programs of interest with LTO.

I'm using SUSE's Open Build Service, which uses gcc with LTO by default now. Some packages can opt in to clang, and flags can be changed. But the fact is that clang cannot be a simple drop-in replacement for gcc, and it cannot be the distro-wide default compiler due to this. The main reason to selectively use clang right now is the ThinLTO option for big applications that take forever to compile otherwise. But with the prevalence of LLVM-specific use cases (such as OpenCL and Vulkan implementations, and JIT-compilation stuff such as what the Mesa and RPCS3 projects do), it would be logical to shift to it. But no distro would allow KDE, its primary desktop environment, and all Qt GUI apps (which is like half or more of all F/OSS GUI apps and pretty much all cross-platform GUI apps) to just break.

I'm not sure that you quite realize the scale of the problem: Qt is the main toolkit of all cross-platform Linux & Windows & Mac GUI apps. The reason why it's not yet a massive OS-breaking disaster is that clang has not yet been adopted as the primary compiler, and many automated build systems skimp on the hardware resources that globally enabling LTO would need.

I think maybe the piece missing here about prioritization is that most of LLVM's contributors (including myself) are paid by various companies to implement/support functionality for their needs. We do our best to make the software generally usable and having multiple invested parties helps keep the overall thing fairly general purpose - but not many of us (not me) have "make this the default compiler on a linux distro" as a priority or goal.

So while I'm sure it's a major issue for that sort of adoption, as you say, that doesn't make it a priority for me (& many other LLVM developers) compared to other uses/features of the project.

I have no idea about development and maintenance of LLVM but I assumed that such large and core project would have at least some kind of a freedesktop.org-like structure with committee of volunteered and company-assigned representatives to manage roadmap, writing guidelines, coordinating development of common features and maintaining infrastructure.

Not much like that, no. There's the LLVM Foundation board, which mostly handles use of donated funds from large corporations - to pool those resources to run our annual developer conferences and do various outreach - sometimes figure out large infrastructure questions like paid servers/hosting, etc.

Other than that, we mostly all just work on whatever features we need - try to hold each other accountable to some common infrastructure quality. ("hey, don't add another one of those, it's time to refactor that so they all fit in better", etc). It's by no means a perfect system, but seems to work for us.

But then, again, there would, probably, be no forks, like AMD's ROCm and Intel's NEO, if it were that coordinated.

Forks will happen, especially with internal priorities that diverge significantly from the overall open source goals - vendors testing new silicon/hardware directions will fork privately to implement that support before the silicon is released, then maybe it's too expensive to upstream the work after the fact, or they have other priorities, etc.

(I suppose one feature that tends to have a rather binary effect on forking - discouraging it to a degree, but making it fairly hard to converge again if someone wants to - is the project's API is fairly volatile, so if you want to share in the benefits of new upstream feature development, you've got to stay pretty close to it or it'll be a pain to try to catch up)

Isn't macOS built fully with LLVM ?

The MacOS system compiler is based on LLVM, yeah.

That would likely also break VLC on it.

If built with LTO, I suppose? VLC appears to make MacOS distributions - wonder if they build them with LTO, and if so, how?

And all BSD distros that use it too.

Yeah, there's some FreeBSD folks in the LLVM community - not sure if they're using LTO by default or have tripped over this issue. They might have more incentive to try to investigate the issue.

PS: this Bugzilla broke with database errors when I wanted to post this originally.

Bother :/ there is some discussion of moving to GitHub issues so we aren't trying to host our own solution here. Hopefully in the not too distant future.

That's nice. Although, own GitLab instances or hosting at gitlab.com instance seem even superior to GitHub nowadays. freedesktop.org has actually recently migrated to one. Looks best and may help with getting rid of reliance on mailing lists too and separated documentation, to have it all self-contained, nicely tagged and cross-linked.

Perhaps we'll get there, though there seems to be some benefit to being on github where lots of other projects are - visibility/familiarity for newcomers, etc.

v-fox commented 4 years ago

Are you familiar with/able to write C++ code? If so, you might be able to reduce the test case down, even if you're not especially familiar with what's correct or not.

Otherwise maybe you can find some folks you support/who depend on your work who might have more familiarity with this sort of thing to help out.

Nope, sorry. And if C/Python/Lua code is mostly self-explanatory, the more I see object-oriented stuff, the more I hate it.

That sort of language probably isn't necessary here. (it's totally fine if you prefer/choose not to work with it - but can leave out the emotive language)

What language would that be? Are you sure you've read it right? Those are unprovoked insinuations, and I take offence at them.

But I'm not sure why you need my examples for it. I've linked several test applications and several real applications that 100% reproduce it on launch. And you can take any Qt app and break it by compiling with LTO.

Because isolating the particular problem takes time I don't have available to/choose to spend investigating this issue. Perhaps you or someone else does have that time/willingness/priority.

Having an isolated (the smallest, standalone example - no library (standard or otherwise), probably just one or two source files, etc) example of the functionality is, at least to me, the first step needed in discussing whether the functionality is acceptable to implement in LTO, perhaps debate whether there are other approaches that could provide the same results, etc.

Linked discussions with Qt authors implied that they know exact cause, so they could do so. You may quickly get an answer that's to your satisfaction from someone with your level of knowledge on the matter instead of a messenger.

Does that happen with lld only, or with ld.gold too?

Both, and I haven't found any linker or clang flags to alleviate that.

If your build system supports some way to provide custom flags to certain compilations, you might be able to opt-out of lto for certain files. Though I don't know how pervasive the use of Qt is & thus how maintainable it would be to maintain those opt-out flags. I guess if Qt use is restricted to a certain wrapper/UI library, maybe just don't compile that library with LTO - but you can compile the business logic/implementation of the programs of interest with LTO.

I'm using SUSE's Open Build Service, which uses gcc with LTO by default now. Some packages can opt in to clang, and flags can be changed. But the fact is that clang cannot be a simple drop-in replacement for gcc, and it cannot be the distro-wide default compiler due to this. The main reason to selectively use clang right now is the ThinLTO option for big applications that take forever to compile otherwise. But with the prevalence of LLVM-specific use cases (such as OpenCL and Vulkan implementations, and JIT-compilation stuff such as what the Mesa and RPCS3 projects do), it would be logical to shift to it. But no distro would allow KDE, its primary desktop environment, and all Qt GUI apps (which is like half or more of all F/OSS GUI apps and pretty much all cross-platform GUI apps) to just break.

I'm not sure that you quite realize the scale of the problem: Qt is the main toolkit of all cross-platform Linux & Windows & Mac GUI apps. The reason why it's not yet a massive OS-breaking disaster is that clang has not yet been adopted as the primary compiler, and many automated build systems skimp on the hardware resources that globally enabling LTO would need.

I think maybe the piece missing here about prioritization is that most of LLVM's contributors (including myself) are paid by various companies to implement/support functionality for their needs. We do our best to make the software generally usable, and having multiple invested parties helps keep the overall thing fairly general-purpose, but not many of us (not me) have "make this the default compiler on a Linux distro" as a priority or goal.

So while I'm sure it's a major issue for that sort of adoption, as you say, that doesn't make it a priority for me (& many other LLVM developers) compared to other uses/features of the project.

I have no idea about the development and maintenance of LLVM, but I assumed that such a large, core project would have at least some kind of freedesktop.org-like structure, with a committee of volunteer and company-assigned representatives to manage the roadmap, write guidelines, coordinate development of common features and maintain infrastructure. But then again, there would probably be no forks like AMD's ROCm and Intel's NEO if it were that coordinated.

Isn't macOS built entirely with LLVM? That would likely break VLC there too, and on all the BSD distros that use it.

PS: this Bugzilla broke with database errors when I wanted to post this originally.

Bother :/ there is some discussion of moving to GitHub issues so we aren't trying to host our own solution here. Hopefully in the not too distant future.

That's nice. Although a self-hosted GitLab instance, or hosting on gitlab.com, seems even better than GitHub nowadays; freedesktop.org has recently migrated to one. It looks best and may also help get rid of the reliance on mailing lists and separate documentation, keeping it all self-contained, nicely tagged and cross-linked.

dwblaikie commented 4 years ago

Are you familiar with/able to write C++ code? If so, you might be able to reduce the test case down, even if you're not especially familiar with what's correct or not.

Otherwise maybe you can find some folks you support/who depend on your work who might have more familiarity with this sort of thing to help out.

Nope, sorry. And while C/Python/Lua code is mostly self-explanatory, the more object-oriented stuff I see, the more I hate it.

That sort of language probably isn't necessary here. (It's totally fine if you prefer/choose not to work with it, but you can leave out the emotive language.)

But I'm not sure why you need my examples for it. I've linked several test applications and several real applications that 100% reproduce it on launch. And you can take any Qt app and break it by compiling with LTO.

Because isolating the particular problem takes time that I don't have, or choose not to spend, investigating this issue. Perhaps you or someone else does have that time/willingness/priority.

Having an isolated (the smallest, standalone example - no library (standard or otherwise), probably just one or two source files, etc) example of the functionality is, at least to me, the first step needed in discussing whether the functionality is acceptable to implement in LTO, perhaps debate whether there are other approaches that could provide the same results, etc.

Does that happen with lld only, or with ld.gold too?

Both, and I haven't found any linker or clang flags to alleviate that.

If your build system supports some way to provide custom flags to certain compilations, you might be able to opt out of LTO for certain files. Though I don't know how pervasive the use of Qt is, and thus how practical it would be to maintain those opt-out flags. I guess if Qt use is restricted to a certain wrapper/UI library, maybe just don't compile that library with LTO, but you can compile the business logic/implementation of the programs of interest with LTO.

I'm using SUSE's Open Build Service, which uses gcc with LTO by default now. Some packages can opt in to clang, and flags can be changed. But the fact is that clang cannot be a simple drop-in replacement for gcc, and it cannot be the distro-wide default compiler because of this. The main reason to selectively use clang right now is the ThinLTO option for big applications that take forever to compile otherwise. But with the prevalence of LLVM-specific use cases (such as OpenCL and Vulkan implementations, and JIT-compilation stuff such as what the Mesa and RPCS3 projects do), it would be logical to shift to it. But no distro would allow KDE, its primary desktop environment, and all Qt GUI apps (which is like half or more of all F/OSS GUI apps and pretty much all cross-platform GUI apps) to just break.

I'm not sure that you quite realize the scale of the problem: Qt is the main toolkit of all cross-platform Linux & Windows & Mac GUI apps. The reason it's not yet a massive OS-breaking disaster is that clang hasn't yet been adopted as a primary compiler, and many automated build systems skimp on the hardware resources needed to enable LTO globally.

I think maybe the piece missing here about prioritization is that most of LLVM's contributors (including myself) are paid by various companies to implement/support functionality for their needs. We do our best to make the software generally usable, and having multiple invested parties helps keep the overall thing fairly general-purpose, but not many of us (not me) have "make this the default compiler on a Linux distro" as a priority or goal.

So while I'm sure it's a major issue for that sort of adoption, as you say, that doesn't make it a priority for me (& many other LLVM developers) compared to other uses/features of the project.

PS: this Bugzilla broke with database errors when I wanted to post this originally.

Bother :/ there is some discussion of moving to GitHub issues so we aren't trying to host our own solution here. Hopefully in the not too distant future.

v-fox commented 4 years ago

Does that happen with lld only, or with ld.gold too?

Seems to be both.

… So, someone with a vested interest in both LTO and Qt might need to do a bit more legwork to connect the dots between these two.

(a reduced test case would be great - maybe if you ask the Qt folks what specific features they're depending on they could whip up a small/reduced example of the functionality they want/need, for instance, etc)

Indeed, but when such strong opinions about the "correctness" of the "feature" are involved, it's better done by someone who knows what they are talking about. My expertise ends at "big thing comes out no good, buttons no go !".

Are you familiar with/able to write C++ code? If so, you might be able to reduce the test case down, even if you're not especially familiar with what's correct or not.

Otherwise maybe you can find some folks you support/who depend on your work who might have more familiarity with this sort of thing to help out.

Nope, sorry. And while C/Python/Lua code is mostly self-explanatory, the more object-oriented stuff I see, the more I hate it.

But I'm not sure why you need my examples for it. I've linked several test applications and several real applications that 100% reproduce it on launch. And you can take any Qt app and break it by compiling with LTO.

I previously confused Jami/Ring with Linphone (https://gitlab.linphone.org/BC/public/linphone-desktop). I've had difficulties compiling both recently, but the latter uses Qt for its default GUI.

Sorry, didn't quite follow this ^ bit.

I linked Jami/Ring previously, but I meant to link Linphone. Ring previously had a KDE/Qt UI, but it doesn't build since the redesign. Linphone has always had Qt as its only interface. I've had prolonged trouble compiling and launching both, but only Linphone is affected by the LTO problem now.

Does that happen with lld only, or with ld.gold too?

Both, and I haven't found any linker or clang flags to alleviate that.

If your build system supports some way to provide custom flags to certain compilations, you might be able to opt out of LTO for certain files. Though I don't know how pervasive the use of Qt is, and thus how practical it would be to maintain those opt-out flags. I guess if Qt use is restricted to a certain wrapper/UI library, maybe just don't compile that library with LTO, but you can compile the business logic/implementation of the programs of interest with LTO.

I'm using SUSE's Open Build Service, which uses gcc with LTO by default now. Some packages can opt in to clang, and flags can be changed. But the fact is that clang cannot be a simple drop-in replacement for gcc, and it cannot be the distro-wide default compiler because of this. The main reason to selectively use clang right now is the ThinLTO option for big applications that take forever to compile otherwise. But with the prevalence of LLVM-specific use cases (such as OpenCL and Vulkan implementations, and JIT-compilation stuff such as what the Mesa and RPCS3 projects do), it would be logical to shift to it. But no distro would allow KDE, its primary desktop environment, and all Qt GUI apps (which is like half or more of all F/OSS GUI apps and pretty much all cross-platform GUI apps) to just break.

I'm not sure that you quite realize the scale of the problem: Qt is the main toolkit of all cross-platform Linux & Windows & Mac GUI apps. The reason it's not yet a massive OS-breaking disaster is that clang hasn't yet been adopted as a primary compiler, and many automated build systems skimp on the hardware resources needed to enable LTO globally.

PS: this Bugzilla broke with database errors when I wanted to post this originally.

dwblaikie commented 4 years ago

… So, someone with a vested interest in both LTO and Qt might need to do a bit more legwork to connect the dots between these two.

(a reduced test case would be great - maybe if you ask the Qt folks what specific features they're depending on they could whip up a small/reduced example of the functionality they want/need, for instance, etc)

Indeed, but when such strong opinions about the "correctness" of the "feature" are involved, it's better done by someone who knows what they are talking about. My expertise ends at "big thing comes out no good, buttons no go !".

Are you familiar with/able to write C++ code? If so, you might be able to reduce the test case down, even if you're not especially familiar with what's correct or not.

Otherwise maybe you can find some folks you support/who depend on your work who might have more familiarity with this sort of thing to help out.

I previously confused Jami/Ring with Linphone (https://gitlab.linphone.org/BC/public/linphone-desktop). I've had difficulties compiling both recently, but the latter uses Qt for its default GUI.

Sorry, didn't quite follow this ^ bit.

Does that happen with lld only, or with ld.gold too?

Both, and I haven't found any linker or clang flags to alleviate that.

If your build system supports some way to provide custom flags to certain compilations, you might be able to opt out of LTO for certain files. Though I don't know how pervasive the use of Qt is, and thus how practical it would be to maintain those opt-out flags. I guess if Qt use is restricted to a certain wrapper/UI library, maybe just don't compile that library with LTO, but you can compile the business logic/implementation of the programs of interest with LTO.

v-fox commented 4 years ago

… So, someone with a vested interest in both LTO and Qt might need to do a bit more legwork to connect the dots between these two.

(a reduced test case would be great - maybe if you ask the Qt folks what specific features they're depending on they could whip up a small/reduced example of the functionality they want/need, for instance, etc)

Indeed, but when such strong opinions about the "correctness" of the "feature" are involved, it's better done by someone who knows what they are talking about. My expertise ends at "big thing comes out no good, buttons no go !".

I previously confused Jami/Ring with Linphone (https://gitlab.linphone.org/BC/public/linphone-desktop). I've had difficulties compiling both recently, but the latter uses Qt for its default GUI.

Does that happen with lld only, or with ld.gold too?

Both, and I haven't found any linker or clang flags to alleviate that.

LebedevRI commented 4 years ago

Does that happen with lld only, or with ld.gold too?

dwblaikie commented 4 years ago

Is there a standalone reproducer that isn't "LTO-link to QT"?

How would I know? I only deal with broken packages. There are short test apps in the links above, but they link to Qt. The Qt devs seem to know exactly what's causing it and how to reproduce it, because they claim that it's a "feature" that is "better than the old way"; maybe there are even mailing-list discussions between clang & Qt about it somewhere.

I'm sure you can easily reproduce a broken build with your understanding. For example, https://github.com/dolphin-emu/dolphin, https://github.com/RPCS3/rpcs3, https://github.com/yuzu-emu/yuzu-mainline and https://git.jami.net/savoirfairelinux/ring-project are definitely affected. They are also big potential beneficiaries of LTO, and a pain to compile without clang's ThinLTO.

Essentially, it looks like the issue is stuck in the middle: people on the LLVM side (so far/from what I can see, but maybe other folks will chime in on this bug or elsewhere) aren't sufficiently interested in Qt+LTO to do a bunch of work to investigate the failure and fix it (most of us here are paid to work on this for various specific use cases, rather than being hobbyists who might be interested in doing some more legwork to track down bugs anywhere), knowing it might be an issue on the Qt side (& it almost/sort of sounds like they admit that they are relying on something that isn't guaranteed?).

And the Qt folks aren't sufficiently interested in this particular combination of LTO+Clang when there are other compilers/build modes they're happy enough to have supported without this extra one.

So, someone with a vested interest in both LTO and Qt might need to do a bit more legwork to connect the dots between these two.

(a reduced test case would be great - maybe if you ask the Qt folks what specific features they're depending on they could whip up a small/reduced example of the functionality they want/need, for instance, etc)

v-fox commented 4 years ago

Is there a standalone reproducer that isn't "LTO-link to QT"?

How would I know? I only deal with broken packages. There are short test apps in the links above, but they link to Qt. The Qt devs seem to know exactly what's causing it and how to reproduce it, because they claim that it's a "feature" that is "better than the old way"; maybe there are even mailing-list discussions between clang & Qt about it somewhere.

I'm sure you can easily reproduce a broken build with your understanding. For example, https://github.com/dolphin-emu/dolphin, https://github.com/RPCS3/rpcs3, https://github.com/yuzu-emu/yuzu-mainline and https://git.jami.net/savoirfairelinux/ring-project are definitely affected. They are also big potential beneficiaries of LTO, and a pain to compile without clang's ThinLTO.

dwblaikie commented 4 years ago

Is there a standalone reproducer that isn't "LTO-link to QT"?

+1

An explanation of the specific behavior that Qt relies on and that LLVM LTO does not provide would be a good start. Those threads seem to assume a lot of knowledge about Qt's implementation that I (& probably others on the LLVM project) do not have, so I can't really say who's to blame, how it should be fixed (if it can be), etc.
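A minimal standalone sketch of the kind of behavior that may be involved, assuming (this thread does not establish it) that the pointer-to-member-function overload of `QObject::connect()` identifies a signal by comparing the address the caller passes with the address seen by the moc-generated code inside the library. This is not Qt code: `Sender`, `indexOfSignal` and `connectSignal` are hypothetical names, and the lookup is deliberately simplified. Within one binary the comparison always matches; the failure reported here would require the application and the library to disagree about a signal's address, which a single source file cannot reproduce.

```cpp
#include <iostream>

// Hypothetical class standing in for a QObject subclass with two "signals".
// This is NOT Qt code; it only models an address-based signal lookup.
class Sender {
public:
    // Distinct bodies, so the two functions cannot be folded into one.
    void clicked() { state_ = 1; }
    void finished() { state_ = 2; }

    using Signal = void (Sender::*)();

    // Hypothetical stand-in for the lookup that moc-generated code is assumed
    // to perform: find a signal's index by comparing the caller's
    // pointer-to-member-function with the addresses this code sees.
    static int indexOfSignal(Signal candidate) {
        if (candidate == &Sender::clicked) return 0;
        if (candidate == &Sender::finished) return 1;
        return -1;  // no match: the "signal not found" case
    }

private:
    int state_ = 0;
};

// Hypothetical, simplified connect(): reports an error if the lookup fails.
bool connectSignal(Sender::Signal signal) {
    const int index = Sender::indexOfSignal(signal);
    if (index < 0) {
        std::cerr << "connectSignal: signal not found\n";
        return false;
    }
    std::cout << "connected to signal #" << index << '\n';
    return true;
}
```

In a single build, `connectSignal(&Sender::finished)` finds index 1 and prints `connected to signal #1`. The point of the sketch is that the lookup is only correct if `&Sender::finished` has one address program-wide; if the executable and the library each used a different address for it, every comparison would fail and the lookup would return -1.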

LebedevRI commented 4 years ago

Is there a standalone reproducer that isn't "LTO-link to QT"?

inclyc commented 1 year ago

Confirmed that this issue was fixed.

llvmbot commented 1 year ago

@llvm/issue-subscribers-backend-x86
