trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/
Other
1.21k stars 565 forks source link

Framework: No complex or float PR or 'master' merge builds needed to support ASC Sierra Code #10742

Open bartlettroscoe opened 2 years ago

bartlettroscoe commented 2 years ago

@trilinos/framework, @vbrunini, @jclause, @jhux2

Internal Issues:

Description

The ASC Sierra Code requires support for complex<double> and float with the configure arguments:

-D Trilinos_ENABLE_COMPLEX_DOUBLE=ON \
-D Trilinos_ENABLE_FLOAT=ON \

(see #10635 and #10720.) Yet there appear to be no PR or 'master' promotion builds that enable either of these.

The documentation at:

claims there are two PR builds that enable complex:

and two 'master' promotion builds that enable complex:

But those builds don't seem to be running from looking at recent PR testing iterations like https://github.com/trilinos/Trilinos/pull/10720#issuecomment-1179472450.

This is a problem because important customers like Sierra require build with -D Trilinos_ENABLE_COMPLEX_DOUBLE=ON (see #10635).

Also, there does not not appear to even be support for the GenConfig system for setting -DTrilinos_ENABLE_FLOAT=ON which is required by important customers like Sierra. This is (not) shown by:

$ find packages/framework/ -type f -exec grep -nH ENABLE_FLOAT {} \; | wc -l
0

But there are lots of references to ENABLE_COMPLEX_DOUBLE:

$ find packages/framework/ -type f -exec grep -nH ENABLE_COMPLEX_DOUBLE {} \; | wc -l
21
bartlettroscoe commented 2 years ago

I have created the internal issue:

to try to get this on the backlog.

One may argue that Trilinos proper can't support every configuration permutation that is used by customers. That is correct. However, that is not the case here. This is not a permutation problem (which is usually addressed by testing pairs of options.) What is happening here is that there are no automated builds that enable either of these two options!

In my opinion, what should be done is to take the fastest running Trilinos PR build and add:

-D Trilinos_ENABLE_COMPLEX=ON \
-D Trilinos_ENABLE_FLOAT=ON \

That will test all of the major supported scalar types in Trilinos complex<double>, complex<float>, and float. That would not catch the defect reported in #10635 but at least we would be building and running all of this code. As I will report shortly, Trilinos does not even build with those configure options currently.

bartlettroscoe commented 2 years ago

CC: @jennloe

@trilinos/framework, @ccober6, note that the lack of enabling float in any PR builds means that important work like the mixed precision preconditioner work like in #11222 is not being tested.

Just take one of the cheaper PR builds, like the clang or the gnu builds, and set Trilinos_ENABLE_FLOAT=ON. That would be super easy and would not impact the wall clock time of the PR builds at all (because the ats2 build which takes 6 hours is the bottleneck).

cgcgcg commented 2 years ago

I believe the issue with enabling float is that a lot of tests make implicit assumptions about the scalar type being double or complex<double> and set tolerances that do not depend on the type. I'll make some changes to one of MueLu's nightly builds to see how bad it is.

sebrowne commented 1 year ago

We do have complex enabled now in the CUDA11 PR/MM build (but not float, it's double). This likely helps, but doesn't cover the float/complex float types.

jennloe commented 1 year ago

@cgcgcg How did things go with enabling float in the MueLu nightly?

cgcgcg commented 1 year ago

https://testing.sandia.gov/cdash/index.php?project=Trilinos&parentid=11376924 but I'm missing the changes from PR #11244 to allow to test Stratimikos without Epetra.

jennloe commented 1 year ago

@cgcgcg That doesn't look too bad so far! Seems like the failing Tpetra test just needs to adjust the tolerance for float values.

sebrowne commented 1 year ago

I'm going to have the team target rhel7_sems-gnu-8.3.0-openmpi-1.10.1-serial_debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables and switch it to complex float (for now, add a nightly with complex float). We shall see how the results look, and then we can explore changing it in PR testing.

github-actions[bot] commented 6 months ago

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity. If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label. If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE. If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.

vbrunini commented 6 months ago

@sebrowne Can you confirm if this got done?

ccober6 commented 6 months ago

@vbrunini I believe it has. Recent build: https://trilinos-cdash.sandia.gov/index.php?project=Trilinos&date=2024-05-07 under gnu-8.3.0-openmpi-1.10.1_debug_shared_complex-float. Passed except for a couple tests in ifpack2 and tpetra.