Closed jjellio closed 2 years ago
@jjellio I think I've seen this before. I believe the issue is that the pointer is constexpr, but the value it's initialized to is a symbol address that's dependent on linking or shared library loading (probably what is meant by 'dynamic initialization' in the err msg). So maybe just replacing constexpr with const would fix it?
No, that won't fix it.
I think, technically, it wants cout to be explicitly instantiated (similar to what you have to do if you want a const std::string
in your class).
I'm not sure what the right pattern is to do this. Belos' solvers want a default so they can optionally use a user provided Ostream from ParameterList. It could work to remove the outputStream_default_
and have the constructor(s) take cout directly. The other way may be to change the explicit instantiation code to add the initialization of outputStream_default_
- I believe this is what the error is after - but it's not clear to me that is legal for GPU code. E.g., a class variable that is from the STL. std:cout
is likely not const either, so this all seems a bit hacky
Oh I get it, so the constexpr was really being used to 'inline' the definition of a static member?
Also, I now agree that it shouldn't need to be const, since it's only used as the default for Teuchos::RCP<std::ostream> outputStream_;
(non-const).
In that case, I do think that replacing every usage of outputStream_default_
with &std::cout
is a reasonable fix. I don't think that would ever change in the future, and even if it needed to it wouldn't be that hard to change the 3 occurences per file. Also lets you remove this hacky looking thing for windows:
#if defined(_WIN32) && defined(__clang__)
static constexpr std::ostream * outputStream_default_ =
__builtin_constant_p(reinterpret_cast<const std::ostream*>(&std::cout));
#else
We've seen the same errors with ROCm 5.0 on Crusher, @lucbv.
I can push a patch that seems to work. I've removed the default, and replaced the construct lines with std::cout
(And changed the Teuchos::rcp
, to a rcpFromRef
).
Did you all work around it? If so, it'd be nice to know if you have a better solution.
We probably just want to temporarily remove the constexpr
while clang/llvm/amd fixes the bug reported here: https://reviews.llvm.org/D119615
My fix is to remove constexpr everywhere but I agree with @jjellio that using const will not work either so I just leave it as static.
The bug reported above makes constexpr
imply __constant__
when compiling with a device-clang compiler which is a problem on its own but probably not sufficient to solve this particular issue.
I tried moving them to const
, but I think the compiler error is that it doesn't want inline dynamic initialization of a class variable. So, even if you dropped to const, you still really need to add a line in the ETI generated code that explicitly sets the variable to be std::cout
(I think). I didn't test that, because hunting down the ETI generator is a pain and it isn't clear that would work if ETI is off.
The patch I've posted does seem to work, but I'll need help in digesting whether the Pull Request testing is working or not (if this is causing actual errors, we have to think about this more).
Bug Report
@trilinos/belos
Description
I've been trying a build of Trilinos that uses Clang 14.x with a gpu target.
This type of error is replicated in many of the solvers. e.g., https://github.com/trilinos/Trilinos/blob/0bd35b1d0298b79ab6e18619e61fa364c84a6267/packages/belos/src/BelosBlockGmresSolMgr.hpp#L340
I'm not sure what the best approach is to mitigate this. @brian-kelley any thoughts on a good replacement?
I don't think Sandia has a testbed with Clang 14.x to reproduce this on.