Closed bartlettroscoe closed 1 year ago
CC: @mhoemmen
@trilinos/rol, from the build error, it looks like this might be pretty easy to fix. Do you want to fix this or should we just disable the building and running of this test in the ATDM Trilinos builds?
Please let me know what you want to do about this.
FYI: Once we get this build cleaned up, all of that ATDM Trilinos builds will have Tpetra_INST_INT_INT=OFF
going forward. (In fact, Tpetra will not even support instantiating both long long int
and int
global ordinals in the future.)
@bartlettroscoe This looks like a ROL bug, no? It appears that ROL was incorrectly using GlobalOrdinal as the template argument of Teuchos::Comm
.
No response in 12 days so we will disable the build and running of this test going forward in all ATDM Trilinos builds.
@bartlettroscoe I'm working on this and a few other bug fixes right now. Two PRs are coming, of which the second will address this issue.
@dridzal said:
I'm working on this and a few other bug fixes right now. Two PRs are coming, of which the second will address this issue.
That is fine. The PR posted later can re-enable this test.
Is there a summary of all disabled tests in special builds (ATDM and other) over the last few years? It would be good to start re-enabling some of them.
@dridzal asked:
Is there a summary of all disabled tests in special builds (ATDM and other) over the last few years? It would be good to start re-enabling some of them.
Also, all of the GitHub issues are kept open with the Disabled Tests
label set. So for ROL:
Not sure what more we can do. We can tolerate test runtime failures failing for years at a time but we can't tolerate build errors (that would require a lot more complex analysis methods analyzing CDash results).
As shown in the build Trilinos-atdm-cee-rhel6_clang-5.0.1_openmpi-1.10.2_serial_static_opt_no-global-int today, no more build errors and this test is no longer not run.
Adding the label "Disabled Tests" and leaving open as per policy.
@mhoemmen , what's the reasoning behind hard-typing Teuchos::Comm to int? Job security for year 2030? Joking aside, it seems like another user-specified type would be necessary for full flexibility.
@dridzal Neither Ross nor myself wrote those classes. One of Mike Heroux's students / postdocs wrote them circa 2004. Code has a way of sticking around.
Search Trilinos GitHub issues for an explanation of the likely intent. The MPI standard is evolving to support 64-bit message counts, but ranks will likely remain int
. This makes Comm's template parameter not useful.
but ranks will likely remain
int
@mhoemmen , thanks for the explanation; just wanted to make sure that this is the consensus, and that we won't have to change the interface anytime soon. My next PR will address the issue.
@dridzal wrote:
... just wanted to make sure that this is the consensus, ...
That's up to the MPI Committee. It's likely that message sizes won't be coupled to the maximum number of process ranks.
... and that we won't have to change the interface anytime soon.
Teuchos has no owner. It is common property of all its customers, including Trilinos, ROL, Dakota, and other applications and libraries. I don't own Teuchos::Comm
any more than any other developer.
Some Trilinos developers have proposed getting rid of Comm's template parameter. That has advantages, but would break backwards compatibility. ROL, as one of the many customers of Teuchos, can express its views about that.
I would be in favor of removing the template parameter, assuming we can assess and control the impact of this change on dependent code. ROL's use of Teuchos::Comm
is frequent yet basic and transparent, and so any potential changes would be quick. Other codes may not have this flexibility.
@dpkouri , it turns out that this issue is not in the test mentioned above, but in the source code. Both TeuchosBatchManager
and TpetraTeuchosBatchManager
use an Ordinal template parameter for Teuchos::Comm
. As it turns out, only int
is supported, so we have to use Teuchos::Comm<int>
. The question is if Ordinal is used as a template parameter for other objects; if it is not, we should probably consider removing it. However, I suspect that it is used beyond Teuchos::Comm
in these classes. Please advise.
@dpkouri , and as it turns out, there are several other instances related to BatchManager
. Just run the command
grep "Teuchos::Comm<" * -R | grep -v int
in the rol directory. This returns all use cases of Teuchos::Comm
that do not include int
on the line.
@mhoemmen @bartlettroscoe : if you run
grep "Teuchos::Comm<" * -R | grep -v int
(see my previous comment) in the Trilinos/packages directory, you'll see a bunch of uses of Teuchos::Comm
with non-int
template parameters. Having said that, it is possible or even likely that some of those parameters always end up an int
, but it may be worth checking with package developers. Note that the above command may only return a partial list.
@dridzal I'm pretty sure that all uses of Comm<T>
with T != int
are in Thyra or depend on Thyra. Tpetra and packages that depend on it exclusively use Comm<int>
. Thus, if you want to harmonize, you could start with Thyra and its dependencies.
This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity.
If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE
label.
If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE
.
If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.
begone autobot
This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity.
If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE
label.
If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE
.
If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.
This issue was closed due to inactivity for 395 days.
CC: @trilinos/rol, @rppawlo (Trilinos Nonlinear Solvers Product Lead), @bartlettroscoe, @fryeguy52