The hardware I'm running on is:
Vendor: AMD (0x1002)
Device: AMD Radeon RX 590 Series (polaris10, LLVM 15.0.7, DRM 3.49, 6.1.7-200.fc37.x86_64) (0x67df)
The issue is that gldriverquery, a program presumably supplied by Valve, crashes on exit. Corrupted rendering was also seen; that symptom has been debugged here: https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/issues/1542
From what I can tell, the graphics corruption is believed to be unrelated: the reports state that Mesa was bisected and the problem was resolved on that side.
The SHA1 checksum of the file is:
d31a82db7ede30e1e32849b614aaf04d263dc642  .var/app/com.valvesoftware.Steam/.local/share/Steam/ubuntu12_64/gldriverquery
It is part of the Flatpak Steam release, version 1.0.0.75.
The problem was discovered when running the Flatpak version of Steam (and other programs) using Mesa within Flatpak org.freedesktop.Platform.GL.default version 22.3.3. That version was updated to use LLVM 15; the previous platform version, 22.3.2, was built with LLVM 14.
I assumed in my bisection that the problem was on the LLVM side, which I see is also suggested in the Valve ticket (https://github.com/ValveSoftware/steam-for-linux/issues/8853).
I bisected from llvmorg-14.0.6 with mesa-22.3.3, cleaning and rebuilding after each step and running the gldriverquery program to decide whether each step was good or bad.
LLVM was built with the following commands:
The offending commit e6f1f062457c928c18a88c612f39d9e168f65a85 was the first bad commit I found. I then narrowed it down to the specific changes that seem to need reverting to avoid the abort/segfault. These are the reversions I made:
From 092b255d296df4784890f9e66ddf0bb56b452e9b Mon Sep 17 00:00:00 2001
From: Jon Emil Jahren <jonemilj@gmail.com>
Date: Sun, 29 Jan 2023 02:42:48 +0100
Subject: [PATCH] Revert changes causing ctor/dtor issues

Partially revert e6f1f062457c928c18a88c612f39d9e168f65a85
---
 llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 15 ++++++++-------
 llvm/lib/IR/PassRegistry.cpp                   | 10 ++++++++--
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 195c0e6a836f..2bbd2bf762e0 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -61,6 +61,7 @@
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/KnownBits.h"
 #include "llvm/Support/MachineValueType.h"
+#include "llvm/Support/ManagedStatic.h"
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Support/Mutex.h"
 #include "llvm/Support/raw_ostream.h"
@@ -10832,19 +10833,19 @@ namespace {
 
 } // end anonymous namespace
 
+static ManagedStatic<std::set<EVT, EVT::compareRawBits>> EVTs;
+static ManagedStatic<EVTArray> SimpleVTArray;
+static ManagedStatic<sys::SmartMutex<true>> VTMutex;
+
 /// getValueTypeList - Return a pointer to the specified value type.
 ///
 const EVT *SDNode::getValueTypeList(EVT VT) {
-  static std::set<EVT, EVT::compareRawBits> EVTs;
-  static EVTArray SimpleVTArray;
-  static sys::SmartMutex<true> VTMutex;
-
   if (VT.isExtended()) {
-    sys::SmartScopedLock<true> Lock(VTMutex);
-    return &(*EVTs.insert(VT).first);
+    sys::SmartScopedLock<true> Lock(*VTMutex);
+    return &(*EVTs->insert(VT).first);
   }
   assert(VT.getSimpleVT() < MVT::VALUETYPE_SIZE && "Value type out of range!");
-  return &SimpleVTArray.VTs[VT.getSimpleVT().SimpleTy];
+  return &SimpleVTArray->VTs[VT.getSimpleVT().SimpleTy];
 }
 
 /// hasNUsesOfValue - Return true if there are exactly NUSES uses of the
diff --git a/llvm/lib/IR/PassRegistry.cpp b/llvm/lib/IR/PassRegistry.cpp
index 6c22fcd34769..94f607afec47 100644
--- a/llvm/lib/IR/PassRegistry.cpp
+++ b/llvm/lib/IR/PassRegistry.cpp
@@ -15,15 +15,21 @@
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/Pass.h"
 #include "llvm/PassInfo.h"
+#include "llvm/Support/ManagedStatic.h"
 #include <cassert>
 #include <memory>
 #include <utility>
 
 using namespace llvm;
 
+// FIXME: We use ManagedStatic to erase the pass registrar on shutdown.
+// Unfortunately, passes are registered with static ctors, and having
+// llvm_shutdown clear this map prevents successful resurrection after
+// llvm_shutdown is run. Ideally we should find a solution so that we don't
+// leak the map, AND can still resurrect after shutdown.
+static ManagedStatic<PassRegistry> PassRegistryObj;
 PassRegistry *PassRegistry::getPassRegistry() {
-  static PassRegistry PassRegistryObj;
-  return &PassRegistryObj;
+  return &*PassRegistryObj;
 }
 
 //===----------------------------------------------------------------------===//
--
2.39.1
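To make the lifetime difference that the patch toggles easier to see, here is a minimal standalone sketch. It does not use LLVM: `Tracked`, `ManagedTracked`, `eventLog`, `localStatic` and `lifetimeDemo` are hypothetical names, and `ManagedTracked` is only a simplified stand-in for the idea behind `ManagedStatic` (lazy construction, destruction only on an explicit shutdown call), not its real implementation. The function-local static, by contrast, is destroyed automatically during process exit.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event log; leaked on purpose so it stays valid during exit.
inline std::vector<std::string> &eventLog() {
  static std::vector<std::string> *Log = new std::vector<std::string>();
  return *Log;
}

// Records its own construction and destruction in the event log.
struct Tracked {
  std::string Name;
  explicit Tracked(std::string N) : Name(std::move(N)) {
    eventLog().push_back("ctor " + Name);
  }
  ~Tracked() { eventLog().push_back("dtor " + Name); }
};

// Style introduced by the bad commit: a function-local static.
// Constructed on first call, destroyed automatically at process exit.
inline Tracked &localStatic() {
  static Tracked T("local");
  return T;
}

// Simplified stand-in for the ManagedStatic idea (an assumption, not LLVM code):
// constructed lazily, destroyed only when shutdown() is called explicitly.
class ManagedTracked {
  Tracked *Ptr = nullptr;

public:
  Tracked &get() {
    if (!Ptr)
      Ptr = new Tracked("managed");
    return *Ptr;
  }
  void shutdown() {
    delete Ptr;
    Ptr = nullptr;
  }
};

static ManagedTracked Managed;

// Returns the events observed while exercising both styles.
inline std::vector<std::string> lifetimeDemo() {
  eventLog().clear();
  localStatic();      // constructs "local"; no destructor runs until exit
  Managed.get();      // constructs "managed"
  Managed.get();      // already constructed, nothing logged
  Managed.shutdown(); // explicit destruction, under the caller's control
  Managed.get();      // constructed again after shutdown
  Managed.shutdown();
  return eventLog();
}
```

Calling `lifetimeDemo()` once from `main` shows that the managed object is torn down at a point the caller chooses, while the function-local static is only torn down during exit, in an order the caller does not control directly.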
When the crashing program is run over and over, the outcome varies, which suggests some kind of undefined behaviour: the crash is either an abort (gldriverquery_abort.txt) or a segfault (gldriverquery_segfault.txt).
I tested the partial reversions on top of llvmorg-15.0.7, and at least for the gldriverquery program they work. I also tested the remaining reversions individually. With only the PassRegistry revert in place, it just failed a bit later with the following abort:
gldriverquery: /home/jon/projects/llvm-project/llvm/include/llvm/PassInfo.h:99: llvm::Pass* llvm::PassInfo::createPass() const: Assertion `NormalCtor && "Cannot call createPass on PassInfo without default ctor!"' failed.
Unfortunately, I don't know where to begin in creating a minimal reproducer for this, and for all I know Mesa may be partially responsible. However, I believe something odd is going on, as suggested by the FIXME comment that the bad commit removed.
So my hypothesis is that Mesa is not doing anything wrong, and that this is the ctor/dtor issue referred to by the FIXME, which has not been addressed properly. Perhaps someone more familiar with the code can look at the highlighted changes and get some value from this report despite the lack of a simpler way to reproduce it. For what it's worth, the removed workaround seems to go back a long way, to ee3570f0ff2e6fc47eae9c417503709d9031a722.
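The resurrection problem described by the FIXME can be sketched in isolation. This is only an illustration of that comment, not a reproduction of the gldriverquery crash, and none of it is LLVM code: `RegistrySketch`, `PassInfoSketch`, `getRegistry`, `shutdownSketch` and `StaticRegistrar` are hypothetical names. The assumption is that a pass registers itself once, from a static constructor, so a registry that is cleared on shutdown and then recreated comes back empty.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for per-pass registration data.
struct PassInfoSketch {
  std::string Name;
};

// Hypothetical stand-in for a pass registry: a name -> info map.
class RegistrySketch {
  std::map<std::string, PassInfoSketch> Passes;

public:
  void registerPass(const std::string &Name) {
    Passes[Name] = PassInfoSketch{Name};
  }
  // Returns nullptr when the pass is not registered.
  const PassInfoSketch *lookup(const std::string &Name) const {
    auto It = Passes.find(Name);
    return It == Passes.end() ? nullptr : &It->second;
  }
};

static std::unique_ptr<RegistrySketch> TheRegistry;

// Lazily (re)creates the registry, similar in spirit to a managed static.
inline RegistrySketch &getRegistry() {
  if (!TheRegistry)
    TheRegistry.reset(new RegistrySketch());
  return *TheRegistry;
}

// Explicit shutdown: destroys the registry and everything registered in it.
inline void shutdownSketch() { TheRegistry.reset(); }

// Registration from a static constructor: this runs once per process,
// so it is NOT repeated when the registry is recreated after shutdown.
struct StaticRegistrar {
  StaticRegistrar() { getRegistry().registerPass("example-pass"); }
};
static StaticRegistrar RegisterExamplePass;

// True if the pass is still found after a shutdown/recreate cycle.
inline bool lookupSurvivesShutdown() {
  shutdownSketch();
  return getRegistry().lookup("example-pass") != nullptr;
}
```

In this sketch the lookup succeeds before shutdown and returns a null pointer afterwards, which is the kind of state in which later code that expects a registered pass could fail.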
I debugged a crash I experienced myself, but it has also been reported in multiple places, including here: https://github.com/ValveSoftware/steam-for-linux/issues/8853