mphowardlab / azplugins

A HOOMD-blue component for soft matter simulations.
BSD 3-Clause "New" or "Revised" License

Use GitHub Actions for testing #57

Closed · mphoward closed this 2 years ago

mphoward commented 2 years ago

This PR enables testing with GitHub Actions. It uses the same test containers and settings as HOOMD:

https://github.com/glotzerlab/hoomd-blue/blob/master/.github/workflows/test.yml

This configuration limits execution to a handful of cases (rather than a full matrix). Some limitations are that we only do double-precision builds, all builds have both MD and MPCD components, and we continue to test only compilation under CUDA (no execution). The advantages are that we can now use the glotzerlab Docker containers so that we don't have to build our own, we reduce the number of test jobs, and we can test multiple versions of HOOMD more easily.
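As a rough illustration of the reduced-matrix approach (the container tags, HOOMD versions, and build commands below are assumptions for the sketch, not the exact contents of this PR), a workflow of this shape might look like:

```yaml
# Sketch of a reduced test matrix using prebuilt glotzerlab containers.
# Image tags, HOOMD versions, and commands are illustrative assumptions.
name: test

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    container:
      image: glotzerlab/ci:${{ matrix.image }}
    strategy:
      matrix:
        include:
          # a handful of configurations instead of a full matrix:
          # double precision only, MD + MPCD always enabled
          - image: gcc9_py38
            hoomd: v2.9.7
          - image: cuda10_gcc7_py37   # CUDA: compile only, no execution
            hoomd: v2.9.7
    steps:
      - uses: actions/checkout@v2
      - name: configure
        run: cmake -S . -B build
      - name: build
        run: cmake --build build -j 2
      - name: test
        run: ctest --output-on-failure
        working-directory: build
```

Each `include` entry pins one container image to one HOOMD version, so adding another HOOMD release to test is a one-line change.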

I have removed the code format checker from the unit tests. The script is not very robust, and this check would be better handled by a dedicated tool like pre-commit.
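Since pre-commit is mentioned as the likely replacement, a minimal sketch of a `.pre-commit-config.yaml` follows (the specific hooks and pinned revisions here are assumptions, not something this PR adds):

```yaml
# Hypothetical .pre-commit-config.yaml sketch; the hooks and revisions
# are illustrative assumptions, not the project's actual configuration.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.0.1
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
  - repo: https://github.com/pre-commit/mirrors-clang-format
    rev: v13.0.0
    hooks:
      - id: clang-format
```

Developers would run `pre-commit run --all-files` locally (or install the git hook with `pre-commit install`), keeping formatting checks out of the unit-test workflow entirely.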

Todo:

Resolves #41
Resolves #54

astatt commented 2 years ago

I think the proposed simplification of testing is appropriate. I would favor simply doing external build tests, but I am biased because that is how I tend to install azplugins.

How big is the potential speedup from using actions/cache? At least in the workflow I normally use for coding, I execute a subset of the unit tests periodically and locally, and really only rely on the built-in testing once I get close to a pull request. So speed has not really been a limiting factor for me. If it is difficult to guarantee that stale caches are invalidated correctly, then to me it is only worth it if the speedup is significant.
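For context on what caching would involve, a typical actions/cache step for a compiler cache looks roughly like the following (the paths and key scheme are assumptions for illustration, not anything proposed in this PR):

```yaml
# Hypothetical sketch of caching ccache output between CI runs.
# The path and key scheme are illustrative assumptions.
- uses: actions/cache@v2
  with:
    path: ~/.ccache
    # a unique key per commit forces a fresh save each run, while
    # restore-keys falls back to the most recent prior cache; stale
    # object files are then rejected by ccache's own hash checks
    key: ccache-${{ runner.os }}-${{ github.sha }}
    restore-keys: |
      ccache-${{ runner.os }}-
```

The invalidation concern is mostly handled by this key/restore-keys pattern plus the compiler cache's content hashing, but it does add moving parts to the workflow.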

I think single precision vs double is something we do not want to address as developers/maintainers, but each user has to think about this carefully themselves. So testing double only is fine in my opinion. We can revisit this when hoomd has a mixed precision mode, because then the potential performance gain on all the ML optimized GPUs might be worth it.

We might want to consider writing a paragraph about how testing works in principle and what is tested/not tested, as well as additional recommended testing in the Developer Guidelines. It could also go onto the to do list #56.

So overall, this is good! In terms of HOOMD versions to test: definitely 2.9.7 as the last stable v2, and perhaps 2.6.0 or 2.8.0? I think we do not have to go to much older versions.

mphoward commented 2 years ago

Awesome, thanks for the feedback!

> I think the proposed simplification of testing is appropriate. I would favor simply doing external build tests, but I am biased because that is how I tend to install azplugins.

Sounds good, external only is easy.

> How big is the potential speedup from using actions/cache? At least in the workflow I normally use for coding, I execute a subset of the unit tests periodically and locally, and really only rely on the built-in testing once I get close to a pull request. So speed has not really been a limiting factor for me. If it is difficult to guarantee that stale caches are invalidated correctly, then to me it is only worth it if the speedup is significant.

It currently takes about 6 minutes to compile HOOMD, then 1.5 minutes to build and test azplugins. So the HOOMD build is a large fraction of the total, but in absolute terms it is not much time. Considering that we are using GitHub-hosted runners, we don't really need to worry about contending for resources, so I think we can just leave it be for now.

> I think single precision vs double is something we do not want to address as developers/maintainers, but each user has to think about this carefully themselves. So testing double only is fine in my opinion. We can revisit this when hoomd has a mixed precision mode, because then the potential performance gain on all the ML optimized GPUs might be worth it.

I agree. We might catch some compilation errors with single precision, though, if someone accidentally hard-codes the wrong precision? I think for now we should not test this if HOOMD also does not test it, and we can address it in a future PR if we support mixed precision.

> We might want to consider writing a paragraph about how testing works in principle and what is tested/not tested, as well as additional recommended testing in the Developer Guidelines. It could also go onto the to do list #56.

I can add this.

> So overall, this is good! In terms of HOOMD versions to test: definitely 2.9.7 as the last stable v2, and perhaps 2.6.0 or 2.8.0? I think we do not have to go to much older versions.

Given that HOOMD v2 is not going to be supported past March, I don't think I want to worry about testing backwards compatibility with v2.x. We are going to have to make a clean break to v3, so let's focus our efforts on testing multiple versions of that as they become available.