trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/

MueLu broke NERSC and waterman Albany nightlies #5044

Closed ikalash closed 4 years ago

ikalash commented 5 years ago

Albany failed to build last night in our NERSC and waterman nightlies. Trilinos compiled just fine, but Albany returned the following error when built on top of Trilinos:

/.../build/TrilinosInstall/lib/libmuelu-interface.a(MueLu_ParameterListInterpreter.cpp.o): In function `virtual thunk to Xpetra::TpetraVector >::~TpetraVector()':

Please see:

https://my.cdash.org/viewBuildError.php?buildid=1643124

and

http://cdash.sandia.gov/CDash-2-3-0/viewBuildError.php?buildid=84229

for more details. Both builds are using the develop branch of Trilinos.

@trilinos/muelu

csiefer2 commented 5 years ago

I might have fixed that last night with #5050 that went in at around 11pm MT.

Are you OK with sitting another day to see if it worked?

ikalash commented 5 years ago

@csiefer2: great. That wouldn't have made it into last night's nightlies. Let's see what happens tonight - I'm OK with waiting a day, yes.

ikalash commented 5 years ago

FYI the waterman error changed last night. Here is the new one and it happens when trying to compile Trilinos:


/.../repos/Trilinos/packages/xpetra/src/Vector/Xpetra_TpetraVector_def.hpp(267): error: no default constructor exists for class "Xpetra::TpetraMultiVector"

http://cdash.sandia.gov/CDash-2-3-0/viewBuildError.php?buildid=84267

I'm guessing this is the same underlying problem.
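For readers unfamiliar with this class of compiler error: "no default constructor exists" typically appears when a derived class constructor does not name its base class in the initializer list, and the base class only offers constructors that take arguments. The sketch below uses hypothetical stand-in classes (`MultiVec`, `Vec`, and the helper `columnsOf` are illustrative names, not the actual Xpetra types), and it is only an assumption that the Xpetra failure has this shape; the thread does not show the offending code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a multi-vector base class. It declares only a
// two-argument constructor, so it has no default constructor.
class MultiVec {
public:
  MultiVec(std::size_t length, std::size_t numVecs)
      : length_(length), numVecs_(numVecs), data_(length * numVecs, 0.0) {
    assert(numVecs > 0 && "a multi-vector needs at least one column");
  }
  virtual ~MultiVec() = default;

  std::size_t length() const { return length_; }
  std::size_t numVectors() const { return numVecs_; }

private:
  std::size_t length_;
  std::size_t numVecs_;
  std::vector<double> data_;
};

// Hypothetical single-vector class derived from MultiVec.
class Vec : public MultiVec {
public:
  // Writing `explicit Vec(std::size_t length) {}` here would not compile:
  // the compiler would try to default-construct MultiVec and report that no
  // default constructor exists. Naming the base class in the initializer
  // list resolves the error.
  explicit Vec(std::size_t length) : MultiVec(length, 1) {}
};

// Small helper so the sketch has observable behavior.
inline std::size_t columnsOf(const MultiVec& mv) { return mv.numVectors(); }
```

Constructing `Vec v(5);` and calling `columnsOf(v)` exercises the corrected constructor; a constructor that is copied from another class without its base initializer is one common way this error gets introduced.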

csiefer2 commented 5 years ago

Thanks for the heads up.

csiefer2 commented 5 years ago

You got your build scripts for Trilinos on waterman somewhere?

ikalash commented 5 years ago

We created them (actually @jwatkins did). If you'd like to have a look, the script is here:

https://github.com/SNLComputation/Albany/blob/master/doc/dashboards/waterman.sandia.gov/ctest_nightly.cmake.frag

Module file is here:

https://github.com/SNLComputation/Albany/blob/master/doc/dashboards/waterman.sandia.gov/waterman_modules_cuda.sh

csiefer2 commented 5 years ago

Looks like some poor copy pasta on my part. #5066

csiefer2 commented 5 years ago

FYI, autotester is having issues so PRs are blocked.

ikalash commented 5 years ago

Unfortunately we have a lot of failed Trilinos builds now in our dashboard due to an issue in Xpetra, e.g.

http://cdash.sandia.gov/CDash-2-3-0/viewBuildError.php?buildid=84295

ikalash commented 5 years ago

I am one step behind you @csiefer2 I think - just saw your posting above. Guess the PR with the fix didn't go in yesterday?

ikalash commented 5 years ago

Looks like the PR went in but we're still seeing build failures for Albany: http://cdash.sandia.gov/CDash-2-3-0/index.php?project=Albany . You can click on any of the Trilinos builds that failed to build to see details.

csiefer2 commented 5 years ago

Fix went in yesterday afternoon. This might be something related but not the same.

Working on fixing #5061 first, then I will see if I can reproduce your issue.

ikalash commented 5 years ago

Should I open a new issue? The problem is still with Xpetra.

csiefer2 commented 5 years ago

Nah. This works fine for me.

csiefer2 commented 5 years ago

#5088 fixes all of the "forbids nested type" compile errors.

Your link errors are a different monster.

They appear to be a GO type mismatch.

csiefer2 commented 5 years ago

@jjellio fixed some other stuff #5096, which should hopefully fix the build issues.

There are still potential link errors as well.

ikalash commented 5 years ago

It looks like that PR went in already, unless I'm missing something. Our dashboard is still all red unfortunately: http://cdash.sandia.gov/CDash-2-3-0/index.php?project=Albany&date= . We really need this fixed ASAP. Would it help for me to provide Trilinos/Albany configure scripts for the MueLu team to be able to reproduce the problem?

jjellio commented 5 years ago

@ikalash my PR went through about 5 minutes ago. Just looking through the CDash errors for Trilinos, it looks like the same error as in #5096 - which should get corrected with my PR.

Will you have another build that runs sometime soon (today)? If not, we may end up waiting until tomorrow to see if it fixes things for Albany.

ikalash commented 5 years ago

@jjellio , ah OK I see. I can see if it fixed the issue on my machine, one of the nightly testing platforms, today. I'll keep you posted. We should know tomorrow the situation with all the other nightlies.

ikalash commented 5 years ago

I just pulled Trilinos, rebuilt it, then tried building Albany on top of it. Unfortunately there's still a missing include issue:

                 from /home/ikalash/nightlyCDash/repos/Albany/src/LCM/solvers/Schwarz_Alternating.cpp:12:
/home/ikalash/nightlyAlbanyTests/Results/Trilinos/build/install/include/Xpetra_MatrixFactory.hpp:55:10: fatal error: Xpetra_CrsMatrixWrap.hpp: No such file or directory
 #include "Xpetra_CrsMatrixWrap.hpp"

jhux2 commented 5 years ago

@ikalash This is a new issue, see #5098.

jhux2 commented 5 years ago

PR #5107 should fix this.

csiefer2 commented 5 years ago

@ikalash Looking better yet?

ikalash commented 5 years ago

@csiefer2: there are still a lot of failures (http://cdash.sandia.gov/CDash-2-3-0/index.php?project=Albany) but not all the builds are failing. I'm hoping the builds that failed (CEE, ride, waterman) failed because they started before the change went in. Let's give it one more day - I will check whether things are better tomorrow on the Albany dashboard.

ikalash commented 5 years ago

So unfortunately we're still having build issues with Albany on CEE, waterman, ride, and NERSC. It's a link error having to do with MueLu_ParameterListInterpreter.cpp. You can click on any of the red boxes under the Albany builds to see details: http://cdash.sandia.gov/CDash-2-3-0/index.php?project=Albany. I'm not sure why things are fine on my Fedora workstation but not on CEE.

We've had failures for more than a week now and we have a big PR that is just about ready to be merged in, but these failures are holding it up. Can someone from the muelu team please resolve the issue? I am happy to provide scripts for the CEE (or any of the other machines where the failures show up) so that you guys can reproduce the issue.

csiefer2 commented 5 years ago

I can make the waterman build of the MueLu tests fail using Albany's Trilinos configure. They seem to fail in similar ways. I'll see what I can do with that.

ikalash commented 5 years ago

Thanks @csiefer2! Please keep me posted.

csiefer2 commented 5 years ago

#5140 is the configure-time error we discussed.

csiefer2 commented 5 years ago

See #5141 for a related issue.

csiefer2 commented 5 years ago

The cdash server is still down, so all PRs are blocked :(

cgcgcg commented 4 years ago

@ikalash Can this be closed?

ikalash commented 4 years ago

Yes - sorry, I must have forgotten to close this earlier.