Framework: Allow commits to non-code directories w/o autotesting

csiefer2 commented 6 years ago

I'm thinking specifically of cmake/ctest/drivers, which contains stuff for driving nightly tests and not code that the autotester can actually execute.

jhux2 commented 6 years ago

@trilinos/framework

bartlettroscoe commented 6 years ago

This should be done at the same time as moving to the TriBITS code that automatically knows about mapping of directories to packages.

csiefer2 commented 6 years ago

@bartlettroscoe And the timeframe for that is?

bartlettroscoe commented 6 years ago

And the timeframe for that is?

This is a question for the @trilinos/framework team.

csiefer2 commented 6 years ago

@trilinos/framework This is starting to get really annoying. We've been looking at 8+ hour delays on the autotester so far this week. Can you bump up the priority of the TriBITS upgrades (or throw more resources at the autotester)?

jmgate commented 6 years ago

Roughly how many PRs are we seeing per day?

csiefer2 commented 6 years ago

Yesterday we had 7 new ones and the autotester is still working through the backlog from yesterday...

csiefer2 commented 6 years ago

... and 3 old ones that left WIP

jmgate commented 6 years ago

Is "we" all of Trilinos, or Tpetra, MueLu?

csiefer2 commented 6 years ago

Trilinos as a whole. It would be simple to scrape the PR page to do a graph of PR's merged per day. That might be a good thing for the framework team to track (if you do need more resources, that'll be how you justify it).

bartlettroscoe commented 6 years ago

As a data-point, a simple re-enable of some Anasazi tests for some ATDM Trilinos builds in PR #2621 which was created at 3:59 PM MT did not have the testing completed until 11:07 PM MT. That is not quite 8 hours but it is close to it.

I worry about the added time that an Intel 17 build will add to the auto PR tester. The Intel 17.0.1 build and tests takes almost 4 hours to complete as mentioned at https://github.com/trilinos/Trilinos/issues/2463#issuecomment-383946447 and that is on 16 cores of a very fast machine.

csiefer2 commented 6 years ago

I'm still waiting for tests of something I submitted yesterday at COB to start. Eight hours was being conservative :P

bartlettroscoe commented 6 years ago

I think a better statistic is the number of PR builds started in the last 24 hours shown at:

https://testing-vm.sandia.gov/cdash/index.php?project=Trilinos&filtercombine=and&filtercombine=and&filtercombine=and&filtercombine=and&filtercount=3&showfilters=1&filtercombine=and&field1=groupname&compare1=61&value1=Pull%20Request&field2=buildstarttime&compare2=84&value2=now&field3=buildstarttime&compare3=83&value3=24%20hours%20ago

That shows 24 PR builds (one GCC 4.8.4 and one GCC 4.9.3) so that means 12 iterations of the auto PR tester started.

The Intel 17.0.1 build will be on Jenkins machines and will therefore complete with these existing builds.

The CUDA build will run on 'white' and will not compete with these builds so that should not be a problem.

bartlettroscoe commented 6 years ago

I'm still waiting for tests of something I submitted yesterday at COB to start. Eight hours was being conservative :P

@csiefer2 is it not starting because of some reason other than a backup of the auto PR tester? See:

csiefer2 commented 6 years ago

@bartlettroscoe Nope. Just backlog.

allevin commented 6 years ago

@csiefer2 I investigated why your job did not start. It seems a number of Jenkins jobs that run the autotester are disabled. The @trilinos/framework team is investigating.

csiefer2 commented 6 years ago

Jenkins messed stuff up? How am I not surprised...

csiefer2 commented 6 years ago

@trilinos/framework Do you guys have stats the mean/std deviation of the number of testable PR's per day?

How many sigmas before we can expect the autotester to be overwhelmed (if Jenkins is working)?

prwolfe commented 6 years ago

@csiefer2 Honestly, we have yet to hit the expected limit. Jim and I did some work profiling recently and it looks like we have some additional capacity (which is why I am working on the intel Pr build.)

As for what would be the limit, it depends on the rate and timing, so I can't really give you an answer other than that we will be monitoring this.

prwolfe commented 6 years ago

While it seems easy, determining if a given arbitrary file should trigger a rebuild is not trivial. In fact I found a few shell scripts designed to modify the checkout and in one case the code. I will talk with Aaron and Jim about the specific directory mentioned, but there is no policy or technical control to keep the scripts there from changing in nature.

I would favor eliminating some code areas/directories by policy, but not attempting to understand the implications of the given commit as that can be a significant undertaking. The directories chosen should have some marker (README or other note) indicating to the development team that changes at this level will not be PR tested.

At a glance I see a few things that are candidates to not trigger testing

sampleScripts docs

Stuff I'm less sure about

commonTools cmake/* - several of these do impact stuff, but you claim at least one does not. demos - I'm pretty sure we don't build this even if we should.

csiefer2 commented 6 years ago

cmake/ctest/drivers only impact testing...

csiefer2 commented 6 years ago

@prwolfe Could we allow individual packages to specify their own "non-code" directories? This has to be agglomerated somewhere, of course...

prwolfe commented 6 years ago

Not sure - let me do some digging.

jwillenbring commented 6 years ago

@csiefer2 's build sat queued too long due to a mistake on my part. I will avoid similar mistakes in the future.

As @prwolfe noted, our current testing resources are sufficient for processing the pull requests we need to process. We took steps to decreases the chances of pull requests having to wait behind other pull requests, and will make an even more aggressive change by the end of tomorrow. This will create a small risk of over running our resources, but that risk is quite small and a suitable message will be added as a comment to the PR if we would run out of resources.

We will see how much this improves the delays we are seeing and then prioritize improvements in logic to decide how much of Trilinos needs to build accordingly.

jwillenbring commented 6 years ago

This should be done at the same time as moving to the TriBITS code that automatically knows about mapping of directories to packages.

@bartlettroscoe how would the TriBITS capability work for builds like the CUDA one where we want to consider enabling only about 1/2 of Trilinos packages?

bartlettroscoe commented 6 years ago

@bartlettroscoe how would the TriBITS capability work for builds like the CUDA one where we want to consider enabling only about 1/2 of Trilinos packages?

We would just put in forced disables of non ATDM packages. That is basically adding a new cache var called something like ATDM_FORCE_DISABLE_DISABLED_PACKAGES and a one-line change right at:

https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/ATDMDisables.cmake#L81

What that would do is if ROL was changed in the PR, for example, and you passed in -DTrilinos_ENABLE_ROL=ON, then it would turn off ROL inside of the cmake configure of Trilinos and you would get zero tests defined. But the auto tester can already handle zero tests? It think it is that easy.

Let's talk at our regular meeting time on Monday about getting the current ATDM CUDA build on 'white' up as a PR build with just the 25 package that are currently being tested. That has been kept clean and it caught a defect in SEACAS that all of the SIERRA testing that was done was missed. If this CUDA build a (with tests) would have been in auto PR testing, then that SEACAS update would have never gotten into the 'develop' branch and we would have saved people bunch of time due to this update (see #2650).

jwillenbring commented 6 years ago

This has been working for some time. I am closing this ticket.

jhux2 commented 6 years ago

Could we allow individual packages to specify their own "non-code" directories? This has to be agglomerated somewhere, of course...

@trilinos/framework Was this considered? Just curious.

bartlettroscoe commented 6 years ago

Could we allow individual packages to specify their own "non-code" directories? This has to be agglomerated somewhere, of course...

@trilinos/framework Was this considered? Just curious.

@jhux2, I discussed this with @william76 a few months ago as part of #3133. Basically we could extend TriBITS to support this.

trilinos / Trilinos

Framework: Allow commits to non-code directories w/o autotesting #2594