Closed (ktf closed this issue 4 months ago)
Does anyone have a hint of what might be happening? This is preventing ALICE from switching to C++20.
That's probably a bug in our forward declaration state restore logic. @Axel-Naumann used to be taking care of these issues, generally.
For the record, this happens on both Linux and macOS.
Notice also that there are multiple classes called "MCLabel" in different namespaces.
Does it help if you add the includes for o2::zdc::BCData and other entities that this macro uses?
Yes, indeed for this particular case it helps. It also helps for some of the actual things which are failing in our builds (but not all of them).
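To make the setup discussed above concrete, here is a minimal sketch of two same-named classes in different namespaces that are first forward declared and later completed. The namespaces `det_a` and `det_b` and all members are hypothetical; only the name MCLabel comes from this thread. It does not reproduce the crash, it only shows why a forward declaration alone is enough for some uses while others need the full definition that an include provides.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical namespaces and members; only the class name MCLabel comes from the thread.
namespace det_a { class MCLabel; }  // forward declaration: the type is incomplete here
namespace det_b { class MCLabel; }  // a different class sharing the same unqualified name

// Pointers to an incomplete type are fine, so this compiles with only the forward declaration.
inline bool isSet(const det_a::MCLabel* label) { return label != nullptr; }

// Member access needs the complete definition. In a real code base this is
// what including the corresponding header would provide.
namespace det_a { class MCLabel { public: int trackID = 1; }; }
namespace det_b { class MCLabel { public: std::string source = "b"; }; }

// The two classes are unrelated types despite the shared name.
static_assert(!std::is_same<det_a::MCLabel, det_b::MCLabel>::value, "distinct types expected");

inline int trackOf(const det_a::MCLabel& label) {
  assert(isSet(&label));  // a reference always yields a non-null address
  return label.trackID;
}

inline std::size_t sourceLength(const det_b::MCLabel& label) { return label.source.size(); }
```

With explicit includes the interpreter sees the complete types up front, so it presumably has less reason to resolve forward declarations lazily, which would be consistent with the includes helping in this case.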
How can we proceed with this?
@ktf ideally we need a standalone reproducer that does not depend on the ALICE environment. Next we should verify if the problem still exists in current master after the upgrade to LLVM 16. In principle, this could also happen independently, if it is feasible to build an ALICE environment with a version of ROOT master ... The reason for this step is that we know about certain problems in the C++20 support with LLVM 13; if it's fixed in LLVM 16, we can potentially hunt down the fix and backport it - no guarantees though, we also already know about certain changes that cannot be backported due to heavily different code bases.
Is there a way I can easily create a standalone reproducer? E.g. something which dumps the internal state at load and a self contained sourcefile?
IMHO it would be much easier if you could bootstrap the ALICE environment and have a look at the problem directly or, if you are at CERN, we can sit together and single step with a debug build.
I actually see different stacktraces, but what seems to be common is that they all crash in EnterTemplatedContext. Also notice that for some of the cases it does not always crash, possibly indicating some memory corruption.
I do not have any means for now to move to master.
Is there a way I can easily create a standalone reproducer? E.g. something which dumps the internal state at load and a self contained sourcefile?
No, this is a manual process.
IMHO it would be much easier if you could bootstrap the ALICE environment and have a look at the problem directly or, if you are at CERN, we can sit together and single step with a debug build.
This is exactly what I do not want to do; debugging LLVM / Clang / Cling internals inside of an experiment framework is a really time-consuming exercise and should be the very last resort. Any effort that can get us around that time sink will get a solution faster and in a much straighter line.
I actually see different stacktraces, but what seems to be common is that they all crash in EnterTemplatedContext. Also notice that for some of the cases it does not always crash, possibly indicating some memory corruption.
I do not have any means for now to move to master.
That's not quite ideal, even not for testing purposes? As I said, we know that LLVM 13 has deficiencies related to C++20 and it's not even clear if a fix could be backported...
No, this is a manual process.
This would probably be a very nice GSoC (or similar) project, actually.
This is exactly what I do not want to do; debugging LLVM / Clang / Cling internals inside of an experiment framework is a really time-consuming exercise and should be the very last resort. Any effort that can get us around that time sink will get a solution faster and in a much straighter line.
Depends on the experiment, I guess. ;-) Our stack builds quite nicely (including precompiled binaries for supported architectures).
That's not quite ideal, even not for testing purposes?
Well, let's see... https://github.com/alisw/alidist/pull/5304 is what is needed. Obviously "move production to master" is not an option.
As I said, we know that LLVM 13 has deficiencies related to C++20 and it's not even clear if a fix could be backported.
To be clear, the whole stack has zero C++20 code in it (we simply turn on -std=c++20 everywhere). I think this is a genuine memory corruption when dealing with forward declarations (when C++20 support is enabled).
No, this is a manual process.
This would probably be a very nice GSoC (or similar) project, actually.
Hm maybe, we would need to trace all pcms + headers loaded into Cling and bundle them up when dumping...
This is exactly what I do not want to do; debugging LLVM / Clang / Cling internals inside of an experiment framework is a really time-consuming exercise and should be the very last resort. Any effort that can get us around that time sink will get a solution faster and in a much straighter line.
Depends on the experiment, I guess. ;-) Our stack builds quite nicely (including precompiled binaries for supported architectures).
Yes, fair point. Though it's not only about building; debugging any complex software product can become quite hairy (and debugging LLVM and Clang is already complicated on its own).
That's not quite ideal, even not for testing purposes?
Well, let's see... alisw/alidist#5304 is what is needed. Obviously "move production to master" is not an option.
Sure, that's clear. So far I've mostly worked with CMS, and they have special builds that they can produce; my understanding is that ATLAS has something similar, so working out this option might be very useful for all sorts of investigations!
As I said, we know that LLVM 13 has deficiencies related to C++20 and it's not even clear if a fix could be backported.
To be clear, the whole stack has zero C++20 code in it (we simply turn on -std=c++20 everywhere). I think this is a genuine memory corruption when dealing with forward declarations (when C++20 support is enabled).
I agree. The question is who is causing that, i.e. is ROOT producing legitimate code that Clang crashes on?
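Since the crash depends only on the language standard being switched on, it can help to confirm which standard a given translation unit (or interpreter session) is really using. The following is a small sketch based on the standard `__cplusplus` macro; the helper names `standardName` and `activeStandard` are made up for illustration.

```cpp
#include <string>

// Maps a __cplusplus value to a readable name. The thresholds are the
// values published by the respective C++ standards.
inline std::string standardName(long value) {
  if (value >= 202002L) return "C++20 or later";
  if (value >= 201703L) return "C++17";
  if (value >= 201402L) return "C++14";
  if (value >= 201103L) return "C++11";
  return "pre-C++11";
}

// Reports the standard this translation unit was compiled with.
inline std::string activeStandard() { return standardName(__cplusplus); }
```

Calling `activeStandard()` from both a C++17 and a C++20 build is a quick way to check that the two configurations being compared really differ in the way one expects.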
@hahnjo master dies with:
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
XROOTD_UTILS_LIBRARIES
linked by target "NetxNG" in directory /sw/SOURCES/ROOT/master/a1f54ed694/net/netxng
any idea of what is missing? What is XROOTD_UTILS_LIBRARIES?
@ktf this may be https://github.com/root-project/root/pull/14180 - which version of XRootD do you have?
v5.6.0 ... let me retry with v5.6.4
@ktf any updates on this, did it work? Alternatively is there a debug version installed that I can run on lxplus or such? While I said that I would like to avoid debugging LLVM / Clang / Cling from within the experiment framework, maybe I can get enough information to "guess" a standalone reproducer. Particularly interesting would be what code is passed to cling::IncrementalParser::Compile (assuming it's indeed a crash inside Clang)...
Find here the dump by instrumenting cling::IncrementalParser::Compile.
For the record, it seems to still fail, even with the master of ROOT:
The relevant stacktraces seem to be:
#6 0x00007f63165d6483 in clang::Sema::getTemplateDepth(clang::Scope*) const () from /sw/slc7_x86-64/ROOT/master-local3/lib/libCling.so
#7 0x00007f6316011655 in clang::Sema::EnterTemplatedContext(clang::Scope*, clang::DeclContext*) () from /sw/slc7_x86-64/ROOT/master-local3/lib/libCling.so
#8 0x00007f63160e0101 in clang::Sema::ActOnReenterTemplateScope(clang::Decl*, llvm::function_ref<clang::Scope* ()>) () from /sw/slc7_x86-64/ROOT/master-local3/lib/libCling.so
#9 0x00007f6315ce22c9 in clang::Parser::ReenterTemplateScopes(clang::Parser::MultiParseScope&, clang::Decl*) () from /sw/slc7_x86-64/ROOT/master-local3/lib/libCling.so
#10 0x00007f6315d0772c in clang::Parser::ParseLexedMethodDef(clang::Parser::LexedMethod&) () from /sw/slc7_x86-64/ROOT/master-local3/lib/libCling.so
#11 0x00007f6315d075dd in clang::Parser::ParseLexedMethodDefs(clang::Parser::ParsingClass&) () from /sw/slc7_x86-64/ROOT/master-local3/lib/libCling.so
#12 0x00007f6315c4c8cb in clang::Parser::ParseCXXMemberSpecification(clang::SourceLocation, clang::SourceLocation, clang::ParsedAttributes&, unsigned int, clang::Decl*) () from /sw/slc7_x86-64/ROOT/master-local3/lib/libCling.so
and
#6 0x00007f042c7fab68 in clang::Sema::hasAcceptableDefinition(clang::NamedDecl*, clang::NamedDecl**, clang::Sema::AcceptableKind, bool) () from /sw/slc7_x86-64/ROOT/master-local3/lib/libCling.so
#7 0x00007f042c7fbf91 in clang::Sema::RequireCompleteTypeImpl(clang::SourceLocation, clang::QualType, clang::Sema::CompleteTypeKind, clang::Sema::TypeDiagnoser*) () from /sw/slc7_x86-64/ROOT/master-local3/lib/libCling.so
#8 0x00007f042c57721b in IsUserDefinedConversion(clang::Sema&, clang::Expr*, clang::QualType, clang::UserDefinedConversionSequence&, clang::OverloadCandidateSet&, clang::Sema::AllowedExplicit, bool) [clone .constprop.0] () from /sw/slc7_x86-64/ROOT/master-local3/lib/libCling.so
#9 0x00007f042c577dc0 in TryUserDefinedConversion(clang::Sema&, clang::Expr*, clang::QualType, bool, clang::Sema::AllowedExplicit, bool, bool, bool, bool) [clone .constprop.0] () from /sw/slc7_x86-64/ROOT/master-local3/lib/libCling.so
#10 0x00007f042c578644 in TryImplicitConversion(clang::Sema&, clang::Expr*, clang::QualType, bool, clang::Sema::AllowedExplicit, bool, bool, bool, bool) () from /sw/slc7_x86-64/ROOT/master-local3/lib/libCling.so
#11 0x00007f042c578938 in TryReferenceInit(clang::Sema&, clang::Expr*, clang::QualType, clang::SourceLocation, bool, bool) () from /sw/slc7_x86-64/ROOT/master-local3/lib/libCling.so
#12 0x00007f042c57a3c6 in TryCopyInitialization(clang::Sema&, clang::Expr*, clang::QualType, bool, bool, bool, bool) () from /sw/slc7_x86-64/ROOT/master-local3/lib/libCling.so
#13 0x00007f042c5741a0 in clang::Sema::AddOverloadCandidate(clang::FunctionDecl*, clang::DeclAccessPair, llvm::ArrayRef<clang::Expr*>, clang::OverloadCandidateSet&, bool, bool, bool, bool, clang::CallExpr::ADLCallKind, llvm::MutableArrayRef<clang::ImplicitConversionSequence>, clang::OverloadCandidateParamOrder) () from /sw/slc7_x86-64/ROOT/master-local3/lib/libCling.so
#14 0x00007f042c57599e in clang::Sema::AddNonMemberOperatorCandidates(clang::UnresolvedSetImpl const&, llvm::ArrayRef<clang::Expr*>, clang::OverloadCandidateSet&, clang::TemplateArgumentListInfo*) () from /sw/slc7_x86-64/ROOT/master-local3/lib/libCling.so
#15 0x00007f042c584a40 in clang::Sema::LookupOverloadedBinOp(clang::OverloadCandidateSet&, clang::OverloadedOperatorKind, clang::UnresolvedSetImpl const&, llvm::ArrayRef<clang::Expr*>, bool) () from /sw/slc7_x86-64/ROOT/master-local3/lib/libCling.so
#16 0x00007f042c590463 in clang::Sema::CreateOverloadedBinOp(clang::SourceLocation, clang::BinaryOperatorKind, clang::UnresolvedSetImpl const&, clang::Expr*, clang::Expr*, bool, bool, clang::FunctionDecl*) () from /sw/slc7_x86-64/ROOT/master-local3/lib/libCling.so
Find here the dump by instrumenting cling::IncrementalParser::Compile.
Thanks for that; I had a quick look already yesterday afternoon, but couldn't spot anything obviously wrong. From the ParseCXXMemberSpecification debug printouts, it appears the class it's tripping on is Rotation2D, but there I cannot spot much wrong either...
For the record, it seems to still fail, even with the master of ROOT:
Okay, too bad (I cannot access the log because of permission problems, but probably no new relevant information there).
Right; I guess this means we do have to get some more information from a debugger. Is there a debug version that I can load up on lxplus or would this only be possible if the two of us sit together?
it appears the class it's tripping on is Rotation2D, but there I cannot spot much wrong either
How can you tell? That said I tried adding explicitly #include "MathUtils/Cartesian.h" without much luck.
Okay, too bad (I cannot access the log because of permission problems, but probably no new relevant information there).
Yes, I imagined that and copied the relevant stacktraces.
Right; I guess this means we do have to get some more information from a debugger. Is there a debug version that I can load up on lxplus or would this only be possible if the two of us sit together?
We can try next week. In the meantime I will try to do a debug build on CVMFS.
it appears the class it's tripping on is Rotation2D, but there I cannot spot much wrong either
How can you tell? That said I tried adding explicitly #include "MathUtils/Cartesian.h" without much luck.
The last ParseCXXMemberSpecification mentions MathUtils/Cartesian.h:61, but maybe I'm jumping the investigations here... Let's see next week maybe.
Right... Anyways, including the header does not help.
@hahnjo, if you look at the diff from our llvm fork you will see a bunch of work done in the area of RAII objects. That's there to be able to store and restore the compiler state so that it can jump to loading a header file on demand and continue parsing. I'd bet my money that something is not stored/restored with the new version of llvm...
@hahnjo, if you look at the diff from our llvm fork you will see a bunch of work done in the area of RAII objects. That's there to be able to store and restore the compiler state so that it can jump to loading a header file on demand and continue parsing. I'd bet my money that something is not stored/restored with the new version of llvm...
Indeed, I found two fields that were not correctly reset: https://github.com/root-project/root/pull/15004 (actually already there with LLVM 13...)
@ktf it would be appreciated if you could test this on your side since we don't have a standalone reproducer. The commits should apply cleanly to ROOT 6.30 (only the first one is needed as a "fix")
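For readers unfamiliar with the store/restore pattern mentioned above, here is a minimal, self-contained sketch of the idea and of how a forgotten field leads to this class of bug. `ParserState` and its fields are hypothetical stand-ins, not Clang's actual members, and the guards are not the ones from the ROOT LLVM fork or from the linked pull request.

```cpp
#include <vector>

// Hypothetical stand-in for parser/semantic state; field names are illustrative only.
struct ParserState {
  int templateDepth = 0;
  std::vector<int> scopeStack;
  bool inDelayedParse = false;
};

// Complete guard: snapshots the whole state and restores it on scope exit.
class StateRestoreRAII {
public:
  explicit StateRestoreRAII(ParserState& s) : fState(s), fSaved(s) {}
  ~StateRestoreRAII() { fState = fSaved; }
  StateRestoreRAII(const StateRestoreRAII&) = delete;
  StateRestoreRAII& operator=(const StateRestoreRAII&) = delete;
private:
  ParserState& fState;
  ParserState fSaved;
};

// Incomplete guard: forgets inDelayedParse, so that field leaks out of the nested parse.
class IncompleteRestoreRAII {
public:
  explicit IncompleteRestoreRAII(ParserState& s)
      : fState(s), fDepth(s.templateDepth), fScopes(s.scopeStack) {}
  ~IncompleteRestoreRAII() {
    fState.templateDepth = fDepth;
    fState.scopeStack = fScopes;
  }
private:
  ParserState& fState;
  int fDepth;
  std::vector<int> fScopes;
};

// Simulates a nested parse (e.g. loading a header on demand) that clobbers the state.
template <typename Guard>
void loadHeaderOnDemand(ParserState& s) {
  Guard guard(s);
  s.templateDepth = 0;
  s.scopeStack.clear();
  s.inDelayedParse = true;
}
```

After `loadHeaderOnDemand<StateRestoreRAII>(state)` the outer state is unchanged, whereas the `IncompleteRestoreRAII` variant leaves `inDelayedParse` set. In a real parser, such a stale field would only matter on certain code paths, which would fit crashes that do not always occur.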
Nice catch, @hahnjo!
@ktf to be immune to such problems my recommendation has always been to move to C++ modules aware dictionaries on the experiments side. That means running rootcling --cxxmodule plus some more scaffolding but that's workable. If one starts from the root of the build rules it can be done incrementally, library for library.
@hahnjo thank you! I will try it asap. @vgvassilev yeah, I know, I will try to bump that in priority.
Hi @hahnjo, @dpiparo,
It appears this issue is closed, but wasn't yet added to a project. Please add upcoming versions that will include the fix, or 'not applicable' otherwise.
Sincerely, :robot:
@ktf did you have a chance to test the changes for ALICE? IIRC it's not backported to our v6-30-00-patches branch, but it should apply cleanly for you to test...
Sorry, I dropped the ball on this. I will try to have a look.
@ktf any updates on this?
Hello, can this issue perhaps be closed, @ktf?
I can confirm that the reproducer is now working correctly (while it still breaks with the old ROOT). Thank you for your support (and patience).
Hi @ktf, @dpiparo,
It appears this issue is closed, but wasn't yet added to a project. Please add upcoming versions that will include the fix, or 'not applicable' otherwise.
Sincerely, :robot:
Description
I am not sure what is actually happening, however the following macro:
crashes ROOT when compiling with C++20 enabled and loading the macro with:
The same macro, when using C++17 works. The produced stacktrace is:
Notice that every bit of that script seems to be necessary.
Reproducer
ROOT version
Both 6.28.04 and 6.30.01 have the same issue, with or without ALICE specific patches. The issue seems to be C++20 related.
Installation method
aliBuild
Operating system
macOS, Linux