Closed bartlettroscoe closed 6 years ago
Is there a way to verify that the CDash site attempted to send out CDash error emails? I looked at the CDash log file and I could not find any records of any emails getting sent out.
I didn't see anything useful there either. Here are some other spots to check for clues:
(I don't have read access to these files)
I pinged John P and he replied with:
There were some stuck email messages. I restarted sendmail, and it looks like the mail is flowing. Please let me know if the problem comes back.
I then got a huge flood of emails.
@zackgalbreath,
Is there some way to get CDash (perhaps in our upgraded version) to write a single log entry line for each email, perhaps with just the address the email is sent to and the email summary line? I suspect that would tell us a lot.
Otherwise, I got a flood of back emails so I think this issue is resolved.
My question is, how will we catch issues like this sooner in the future?
Another problem is that from looking at the query:
the "ATDM" group has builds with test failures on 2/7/2018 and 2/8/2018 that should have triggered emails, but I never got those emails. For example, two tests with "MueLu_ParameterListInterpreter" in the name failed on 2/7/2018, shown at:
that do not appear to have triggered any CDash emails. Even with the flood of back emails coming through today after sendmail was restarted, I don't see any CDash emails matching "MueLu_ParameterListInterpreter". The most recent CDash email I can find with that test name in the body was from 11/1/2017, shown below.
-----Original Message-----
From: CDash [mailto:trilinos-regression@sandia.gov]
Sent: Wednesday, November 01, 2017 6:16 AM
To: Bartlett, Roscoe A rabartl@sandia.gov
Subject: FAILED (t=5): Trilinos/MueLu - Linux-gcc-4.9.3-CONTINUOUS_MPI_OPT_DEV_SHARED - Continuous
A submission to CDash for the project Trilinos has failing tests. You have been identified as one of the authors who have checked in changes that are part of this submission or you are listed in the default contact list.
Details on the submission can be found at https://testing.sandia.gov/cdash/buildSummary.php?buildid=3194412
Project: Trilinos
SubProject: MueLu
Site: sadl30906.srn.sandia.gov
Build Name: Linux-gcc-4.9.3-CONTINUOUS_MPI_OPT_DEV_SHARED
Build Time: 2017-11-01T09:39:57 UTC
Type: Continuous
Tests failing: 5
Tests failing
MueLu_UnitTestsEpetra_MPI_1 (https://testing.sandia.gov/cdash/testDetails.php?test=42413720&build=3194412)
MueLu_UnitTestsTpetra_MPI_1 (https://testing.sandia.gov/cdash/testDetails.php?test=42413740&build=3194412)
MueLu_ParameterListInterpreterTpetra_MPI_1 (https://testing.sandia.gov/cdash/testDetails.php?test=42413741&build=3194412)
MueLu_UnitTestsTpetra_MPI_4 (https://testing.sandia.gov/cdash/testDetails.php?test=42413742&build=3194412)
MueLu_Aggregation_MPI_4 (https://testing.sandia.gov/cdash/testDetails.php?test=42413769&build=3194412)
-CDash on testing.sandia.gov
I guess all we can do is check the "ATDM" group every day to see whether a new test failure is recorded and whether it sends out an email.
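That daily check could be partly scripted. Below is a minimal sketch, not anything CDash provides: it assumes the failing test names have already been collected from the CDash query and the CDash email bodies exported from a mail client as plain strings. The function name `find_unreported_failures` and both inputs are hypothetical.

```python
from typing import Iterable, List


def find_unreported_failures(failing_tests: Iterable[str],
                             email_bodies: Iterable[str]) -> List[str]:
    """Return failing test names that appear in none of the email bodies.

    Both inputs are assumed to be gathered by hand or by some other tool;
    this function only does the cross-check.
    """
    # Materialize the bodies so they can be scanned once per test name.
    bodies = list(email_bodies)
    return [name for name in failing_tests
            if not any(name in body for body in bodies)]
```

Any name it returns is a failure that CDash recorded but that no received email mentions, which is the situation described above for the "MueLu_ParameterListInterpreter" tests.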
FYI: I added issue:
to make sure that future versions of CDash that we use will log CDash emails (at least one short line per email). This will help to make sure that CDash is sending out the emails that it should.
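To make the request concrete, here is a rough sketch of what one short log line per email might look like. This is only an illustration in Python, not CDash code or a CDash API (CDash itself is written in PHP); the function name `format_email_log_line` and the line layout are assumptions.

```python
from datetime import datetime, timezone


def format_email_log_line(recipient: str, subject: str, sent_at: datetime) -> str:
    """Build one short log line for a single outgoing email (hypothetical format)."""
    # Collapse whitespace so a subject with embedded newlines stays on one line.
    one_line_subject = " ".join(subject.split())
    # Normalize to UTC; a naive datetime is treated as local time by astimezone().
    timestamp = sent_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{timestamp} email to={recipient} subject={one_line_subject!r}"
```

With lines like this in the CDash log, searching for a recipient address or a test name would show whether CDash attempted to send a given email, independent of whether sendmail later delivered it.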
I am now closing this issue.
CC: @trilinos/framework, @zackgalbreath
Description
It looks like the Trilinos CDash site is no longer sending out any CDash error emails. I have not gotten a CDash error email since 1/30/2018. The standard CI build shown at:
has been failing since 20:03 UTC 2/15/2018. Yet, I did not receive any CDash error emails for these failures.
This is also breaking all of the “Clean” builds as shown at:
This likely also explains why the PR testing is failing right now (but we can’t see why).
Is there a way to verify that the CDash site attempted to send out CDash error emails? I looked at the CDash log file and I could not find any records of any emails getting sent out.