lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

Automated regression testing #165

Open maddyscientist opened 9 years ago

maddyscientist commented 9 years ago

It's time we put together automated nightly regression testing for QUDA. Doing so is critical to the health of QUDA and the sanity of the developers. This issue is to track this. We should automate the building of QUDA across a few different configure options (single, MPI and QMP) and run all unit tests across a wide range of test parameters.

Does anyone have any suggestions on this? This could be done with some scripting, though I'm happy to hear other approaches.
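As a rough illustration of the scripting route, here is a minimal Python sketch of a nightly driver that walks over the build configurations and records a status for each. Everything in it is an assumption made for illustration: the `BUILD_CONFIGS` names mirror the single/MPI/QMP split above, the flag strings are placeholders rather than real QUDA configure options, and the `build` and `run_tests` callables are hypothetical hooks that a real script would implement with the actual build and test commands.

```python
# Hypothetical build configurations. The flag strings are placeholders,
# not real QUDA configure options.
BUILD_CONFIGS = {
    "single": [],
    "mpi": ["<mpi-flags>"],
    "qmp": ["<qmp-flags>"],
}


def nightly(configs, build, run_tests):
    """Build every configuration and run its tests, returning a status per name.

    `build(name, flags)` and `run_tests(name)` are caller-supplied callables
    that return True on success.
    """
    report = {}
    for name, flags in configs.items():
        if not build(name, flags):
            # Skip the tests entirely when the build itself is broken.
            report[name] = "BUILD FAILED"
        elif not run_tests(name):
            report[name] = "TESTS FAILED"
        else:
            report[name] = "OK"
    return report
```

Keeping the build and test steps as injected callables means the loop can be exercised without a GPU, and the same driver could later be called from a cron job or a CI server.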

maddyscientist commented 9 years ago

We should be able to parametrize which branches are targeted for this. By default, of course, master should be tested, but we may want to temporarily add development branches as they mature in preparation for merging into master.

Moreover, we'd want to test different GPU architecture targets, since bugs have recently been introduced that only show up on older GPU models that are not typically used for development.
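One way this parametrization could look, as a sketch only: a small command-line front end that defaults to master and expands the chosen branches and architectures into a list of targets. The option names `--branch` and `--arch` and the default architecture list (`sm_20`, `sm_35`) are assumptions for illustration, not an agreed interface.

```python
import argparse

# Illustrative defaults only; the real list of architectures is to be decided.
DEFAULT_BRANCHES = ["master"]
DEFAULT_ARCHS = ["sm_20", "sm_35"]


def parse_targets(argv):
    """Return a list of (branch, arch) pairs selected on the command line."""
    parser = argparse.ArgumentParser(description="Select nightly test targets")
    # Each option may be repeated to add another branch or architecture.
    parser.add_argument("--branch", action="append", dest="branches")
    parser.add_argument("--arch", action="append", dest="archs")
    args = parser.parse_args(argv)

    branches = args.branches or DEFAULT_BRANCHES
    archs = args.archs or DEFAULT_ARCHS
    return [(branch, arch) for branch in branches for arch in archs]
```

With no arguments this tests master on every default architecture; a development branch can be added temporarily by passing `--branch master --branch <name>` and dropped again after the merge.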

mathiaswagner commented 9 years ago

Yes!!! That is something that was in my head during my evening run yesterday. I guess it all comes down to doing http://en.wikipedia.org/wiki/Continuous_integration with physicists ;-)

We probably need to rewrite a lot of the tests and add more of them. What comes to my mind:

There are servers that can be set up to do this, and I know that Xcode comes with this, see: https://www.apple.com/support/osxserver/xcodeserver/ but there are definitely solutions for Linux. A Mac is probably not the right testing environment for this.

maddyscientist commented 9 years ago

Multi-GPU testing at its most basic level can be done on 2 GPUs (or even 1 GPU with loopback). It would certainly give a lot better than zero coverage.

In the short term, I could use my workstation at Caltech for initial automated testing. This has two Keplers in it, so it would give some level of coverage.

We absolutely should have a test suite that must be run, and must pass, before merging. To begin with, this can be a simple script that runs the unit tests and checks for PASS/FAIL. All tests need to use the Google Test API for this, which we should get done.
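A minimal sketch of such a script in Python, assuming each unit test is a standalone executable built with Google Test. It relies on two Google Test behaviours: a failing run exits with a non-zero status, and the summary contains a `[  FAILED  ]` line. The function names and the 600 second timeout are assumptions for illustration.

```python
import subprocess


def classify(returncode, output):
    """Classify one test run as PASS or FAIL from its exit code and output."""
    if returncode != 0 or "[  FAILED  ]" in output:
        return "FAIL"
    return "PASS"


def run_test(cmd, timeout=600):
    """Run one test command (a list of arguments) and return PASS or FAIL."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # A missing executable or a hung test counts as a failure.
        return "FAIL"
    return classify(proc.returncode, proc.stdout)


def run_suite(commands):
    """Run every command; return per-test results and an overall pass flag."""
    results = {" ".join(cmd): run_test(cmd) for cmd in commands}
    return results, all(status == "PASS" for status in results.values())
```

A pre-merge check would then call `run_suite` with the list of test commands and refuse the merge when the overall flag is false.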

mathiaswagner commented 9 years ago

I have not read it yet, but this might be interesting: http://www.robinbetz.com/icsews13secse-id18-p-16578-preprint.pdf (FIXED LINK) (Continuous Integration for AMBER)

stevengottlieb commented 9 years ago

I wholeheartedly agree that nightly testing would be great. It would cut down a lot of frustration on the part of people who are trying to run production jobs. I just spoke with Don Holmgren and he indicated that he would be happy to help with testing on the hardware available at Fermilab. We should also check with Chip; he might be willing to help with JLab hardware.

maddyscientist commented 9 years ago

I had considered asking Don and Chip about this, thanks for making that first step. I'll start a thread up with them about that.

mathiaswagner commented 9 years ago

Did anyone look into this again? I think we should try to get that started when we are at FNAL.

Jenkins looks promising, but I don't have a system to try it on. A virtual machine on my 2010 MacBook will not really manage to compile QUDA, and testing Tesla (as in Tesla architecture or SM 1.2) code is also not helpful.

Mathias

On Oct 24, 2014, at 15:59, mikeaclark wrote:

I had considered asking Don and Chip about this, thanks for making that first step. I'll start a thread up with them about that.


maddyscientist commented 9 years ago

I started to write some scripts for this, which helped identify a lot of bugs. I figure we can talk about this properly at FNAL and for now focus on getting the MILC fermion force bug fixed first.