Closed csiefer2 closed 6 years ago
@trilinos/framework
This should be done at the same time as moving to the TriBITS code that automatically knows about mapping of directories to packages.
@bartlettroscoe And the timeframe for that is?
And the timeframe for that is?
This is a question for the @trilinos/framework team.
@trilinos/framework This is starting to get really annoying. We've been looking at 8+ hour delays on the autotester so far this week. Can you bump up the priority of the TriBITS upgrades (or throw more resources at the autotester)?
Roughly how many PRs are we seeing per day?
Yesterday we had 7 new ones and the autotester is still working through the backlog from yesterday...
... and 3 old ones that left WIP
Is "we" all of Trilinos, or Tpetra, MueLu?
Trilinos as a whole. It would be simple to scrape the PR page to do a graph of PR's merged per day. That might be a good thing for the framework team to track (if you do need more resources, that'll be how you justify it).
As a data-point, a simple re-enable of some Anasazi tests for some ATDM Trilinos builds in PR #2621 which was created at 3:59 PM MT did not have the testing completed until 11:07 PM MT. That is not quite 8 hours but it is close to it.
I worry about the added time that an Intel 17 build will add to the auto PR tester. The Intel 17.0.1 build and tests takes almost 4 hours to complete as mentioned at https://github.com/trilinos/Trilinos/issues/2463#issuecomment-383946447 and that is on 16 cores of a very fast machine.
I'm still waiting for tests of something I submitted yesterday at COB to start. Eight hours was being conservative :P
I think a better statistic is the number of PR builds started in the last 24 hours shown at:
That shows 24 PR builds (one GCC 4.8.4 and one GCC 4.9.3) so that means 12 iterations of the auto PR tester started.
The Intel 17.0.1 build will be on Jenkins machines and will therefore complete with these existing builds.
The CUDA build will run on 'white' and will not compete with these builds so that should not be a problem.
I'm still waiting for tests of something I submitted yesterday at COB to start. Eight hours was being conservative :P
@csiefer2 is it not starting because of some reason other than a backup of the auto PR tester? See:
@bartlettroscoe Nope. Just backlog.
@csiefer2 I investigated why your job did not start. It seems a number of Jenkins jobs that run the autotester are disabled. The @trilinos/framework team is investigating.
Jenkins messed stuff up? How am I not surprised...
@trilinos/framework Do you guys have stats the mean/std deviation of the number of testable PR's per day?
How many sigmas before we can expect the autotester to be overwhelmed (if Jenkins is working)?
@csiefer2 Honestly, we have yet to hit the expected limit. Jim and I did some work profiling recently and it looks like we have some additional capacity (which is why I am working on the intel Pr build.)
As for what would be the limit, it depends on the rate and timing, so I can't really give you an answer other than that we will be monitoring this.
While it seems easy, determining if a given arbitrary file should trigger a rebuild is not trivial. In fact I found a few shell scripts designed to modify the checkout and in one case the code. I will talk with Aaron and Jim about the specific directory mentioned, but there is no policy or technical control to keep the scripts there from changing in nature.
I would favor eliminating some code areas/directories by policy, but not attempting to understand the implications of the given commit as that can be a significant undertaking. The directories chosen should have some marker (README or other note) indicating to the development team that changes at this level will not be PR tested.
At a glance I see a few things that are candidates to not trigger testing
sampleScripts docs
Stuff I'm less sure about
commonTools cmake/* - several of these do impact stuff, but you claim at least one does not. demos - I'm pretty sure we don't build this even if we should.
cmake/ctest/drivers only impact testing...
@prwolfe Could we allow individual packages to specify their own "non-code" directories? This has to be agglomerated somewhere, of course...
Not sure - let me do some digging.
@csiefer2 's build sat queued too long due to a mistake on my part. I will avoid similar mistakes in the future.
As @prwolfe noted, our current testing resources are sufficient for processing the pull requests we need to process. We took steps to decreases the chances of pull requests having to wait behind other pull requests, and will make an even more aggressive change by the end of tomorrow. This will create a small risk of over running our resources, but that risk is quite small and a suitable message will be added as a comment to the PR if we would run out of resources.
We will see how much this improves the delays we are seeing and then prioritize improvements in logic to decide how much of Trilinos needs to build accordingly.
This should be done at the same time as moving to the TriBITS code that automatically knows about mapping of directories to packages.
@bartlettroscoe how would the TriBITS capability work for builds like the CUDA one where we want to consider enabling only about 1/2 of Trilinos packages?
@bartlettroscoe how would the TriBITS capability work for builds like the CUDA one where we want to consider enabling only about 1/2 of Trilinos packages?
We would just put in forced disables of non ATDM packages. That is basically adding a new cache var
called something like ATDM_FORCE_DISABLE_DISABLED_PACKAGES
and a one-line change right at:
What that would do is if ROL was changed in the PR, for example, and you passed in -DTrilinos_ENABLE_ROL=ON
, then it would turn off ROL inside of the cmake configure of Trilinos and you would get zero tests defined. But the auto tester can already handle zero tests? It think it is that easy.
Let's talk at our regular meeting time on Monday about getting the current ATDM CUDA build on 'white' up as a PR build with just the 25 package that are currently being tested. That has been kept clean and it caught a defect in SEACAS that all of the SIERRA testing that was done was missed. If this CUDA build a (with tests) would have been in auto PR testing, then that SEACAS update would have never gotten into the 'develop' branch and we would have saved people bunch of time due to this update (see #2650).
This has been working for some time. I am closing this ticket.
Could we allow individual packages to specify their own "non-code" directories? This has to be agglomerated somewhere, of course...
@trilinos/framework Was this considered? Just curious.
Could we allow individual packages to specify their own "non-code" directories? This has to be agglomerated somewhere, of course...
@trilinos/framework Was this considered? Just curious.
@jhux2, I discussed this with @william76 a few months ago as part of #3133. Basically we could extend TriBITS to support this.
I'm thinking specifically of cmake/ctest/drivers, which contains stuff for driving nightly tests and not code that the autotester can actually execute.