Closed ikalash closed 11 months ago
@ikalash - the last change to phalanx src code was in July. I would need more information about how the data layouts are being generated in Albany. There was a big set of changes pushed to intrepid2 yesterday. Maybe those have broken the data layout generation in Albany? Just a guess.
This is a new build on Blake. The error was already there on Oct 13. I don't think is related to Intrepid2. Irina, could it be an Albany issue?
@rppawlo I think the issue is that this code
https://github.com/trilinos/Trilinos/blob/develop/packages/phalanx/src/Phalanx_KokkosView_CreateView.hpp#L80-L83
is compiled also when the rank is 1, and that's why the compiler complains that we are accessing an array out of bounds when calling dl.dimension(1)
.
Can you send me instructions to reproduce on blake? Do the phalanx tests build and run?
I think you can use these modules and scripts (Irina, please check): https://github.com/sandialabs/Albany/blob/master/doc/dashboards/blake.sandia.gov/blake_gcc_modules.sh https://github.com/sandialabs/Albany/blob/master/doc/dashboards/blake.sandia.gov/do-cmake-trilinos-gcc-debug
I don't think we build Phalanx tests.
I'm not very familiar with constexpr
, but it might be possible to declare rank()
as a const expression in the MDALayout
:
constexpr PHX::Device::size_type
PHX::MDALayout<Tag0,Tag1,Tag2,Tag3,Tag4,Tag5,Tag6,Tag7>::rank() const
{ return Rank; }
and then use
if constexpr(dl.rank() == 1)
so that only the body of the correct condition gets compiled
@mperego - that was my first thought too. But I just used the albany build and all phalanx and panzer tests/examples build successfully and pass testing. I'd need to see the albany code and probably dump the runtime dag to see what is going on.
Those should be the correct scripts to use. Thanks for looking at it @rppawlo . I don’t know if ohalan tests pass. I don’t think they are enabled in the nightlies.
@rppawlo : any updates on this issue? Unfortunately it is still happening in our nightlies: https://sems-cdash-son.sandia.gov/cdash/viewBuildError.php?buildid=53488 .
@ikalash - I used the albany build scripts and all of the phalanx and panzer tests build and pass on blake. I suspect this is something in Albany or it could be a use case not covered under phalanx testing. I need to build albany to figure this out. I will try to do it next week. Haven't built albany in probably 5 years...
@rppawlo I have a very small reproducer, which shows some hint toward what Mauro suggests (i.e., the compiler can capture some type at compile time rather than deferring until run time). I put all the files on blake, in /home/projects/albany/forRoger
(src and bld folders). After you load all the modules according to the file linked by Mauro, you can run the do-cmake.sh
script in bld, and build.
Now, regarding the source code. Originally, I put all the stuff in the main file, and got the error. What striked me was that the compiler should not know that the layout has 1 dimension, since it gets it from the (runtime) FieldTag obj. But since the layout is created in the same scope where createView
is called, I wondered whether the compiler was smart enough to pass static info (the fact that the layout has rank 1) to the instantiated createView
call, trying to compile away stuff (even though it's odd for a DEBUG build). So I moved the creation of the layout in a second cpp file. In main.cpp
you can see
auto dl = Teuchos::rcp(new PHX::MDALayout<Dummy>(0)); // Will get the error!
//auto dl = get_layout (); // Will work!
PHX::Tag<ScalarT> tag("hello world!", dl);
If you swap the comment between the first two lines, voila', it builds!
So this suggests Mauro was on the right track. I'm not entierly sure why the compiler can use compile-time info from MDALayout when accessing t.dataLayout().dimension(X)
in createView
, but that's what I think it's happening.
The fact that a small reproducer like this can show the behavior tells me this is not a compiler error, but maybe a "compiler feature" (annoying as it may be). If Mauro is right, using if constexpr
when trying to access the layouts dimensions may fix it. I can try to do it myself tomorrow...
Let me know what you think.
Thanks for looking into this, @bartgol ! I have a marginally-related question. I just had a look on blake into why the Trilinos build is being reported as failed on blake with the warning-free configure options, and I think it's because there are warnings encountered in several Trilinos packages (seacas, ml, aztecoo, zoltan). I am attaching the output of the build. Do you agree that this is the problem? CDash is reporting something different but I think it's not right: https://sems-cdash-son.sandia.gov/cdash/viewBuildError.php?buildid=53585 . If so and if Trilinos supports a warning-free build, should we option a separate issue on the warnings?
Thanks @bartgol ! Thats great news! I'm a little worried that the if constexpr will work for all of phalanx. The rank() in the if statement in the view allocator is a runtime parameter in general. if constexpr have to be evaluated at compile time. It might work for the issue above where a compiletime parameter sneaks through, but in general I suspect it will not work, especially for DynRankViews (if I understand your discussion above). I'm close to having the refactor of the MDALayout for variadic templates complete. That could be a path forward as well.
@rppawlo Yes, based on what I observed and assuming we're right about the compiler attempting to compile away code, I think the variadic templates should remove this issue.
Let me know if I can help in any way.
The variadic templates is done. I'm running some tests now - building against panzer with various compilers. I'll put up the PR shortly and test on blake if everything passes.
@rppawlo thanks! I'm happy to test this once the PR is up to verify that Albany builds cleanly with the fixed.
@ikalash - I've got an albany build going on blake now. Will post results soon. I will not merge this right away - also want to test against drekar, charon and empire. It's a big change to a fundmental class, want to be very careful with it.
@rppawlo : ok sounds good. Thanks again!
Well, that did not fix the issue on blake. I will have to rewrite the viewCreate routine to split into separate functions based on rank. Will try to get to this next week. I will still push the layout changes after further testing.
Well, this is funny. The refactor meant to fix the issue on blake is now causing the gnu PR build to fail with the same error in phalanx tests. At least this will be faster to fix now that we don't have to deal with horrific filesystem on blake or pull Albany into the build.
Has this been resolved? It looks like there is a different array-bounds warning-as-error showing up on our blake build now so I figured I'd double check if it is related to this issue before I dig into it.
Thanks for checking @mcarlson801 . There are still phalanx-related errors in the blake warning free build unfortunately. It's possible they are different than before. Thanks for looking into it!
The phalanx issue was fixed on blake. There were other albany failures not associated with phalanx that still need to be addressed on that platform.
@ikalash - I did not see any more from phalanx when fixing the first issue. Can you post the new failure?
Here is the error now: https://sems-cdash-son.sandia.gov/cdash/viewBuildError.php?buildid=54316 . It's actually now about Sacado.
Should I close this issue and open a new one?
@ikalash I think the current error is actually a bug related to an Albany enthalpy evaluator. Just wanted to double check it wasn't related to this.
Yes, the problem seems to be coming from the enthalpy evaluator. I agree that it seems unrelated to this issue. Maybe you could open a new issue if you are looking at this, and we can close this one?
Our warning-free nightly build on Blake for Albany has the following error:
https://sems-cdash-son.sandia.gov/cdash/viewBuildError.php?buildid=53327
Could someone please have a look?
@trilinos/phalanx