pylada / pylada-light

A physics computational framework for python and ipython
GNU General Public License v3.0

Some tests from testsuite failing when run from travis-ci #51

Closed: breathe closed this issue 5 years ago

breathe commented 5 years ago

Hi, I'm trying to build a conda package for pylada, with CI that builds the package, runs the test suite, and updates my Anaconda channel with the build artifacts for macOS and Linux.

Some of the tests in the test suite fail when run on travis-ci. The test suite works fine when run locally on my machine, but the same set of tests fails on both macOS and Linux when run in CI. I think the failure might have something to do with OpenMP, but I'm not really sure. I thought I could ask in case someone already knows the issue and possibly a fix. These are the tests that fail:

Total Test time (real) = 335.44 sec
The following tests FAILED:
      6 - pylada.crystal.map_sites (Failed)
     61 - pifunctional (Failed)
     62 - pylada.process.call (Failed)
     63 - pylada.process.iterator (Failed)
     64 - pylada.process.program (Failed)
     65 - pylada.process.jobfolder (Failed)
     82 - pylada.notebooks.notebooks (Failed)

I'm attaching the full raw log from one of the travis-ci jobs; the output of ctest -V is near the end of the file.

Many thanks for any help!
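To iterate on just the failing tests rather than the whole ~335 s run, ctest can re-run only the tests recorded as failed by its previous invocation in the same build directory (`--rerun-failed`). The Python sketch below is a hypothetical helper for a CI script, not part of pylada-light; it builds that command and adds `-V` for the same verbose output as the attached log. The build directory passed to it is an assumption you would supply.

```python
import subprocess


def rerun_failed_command(verbose=True):
    """Build a ctest command that re-runs only the tests that failed in the
    previous ctest run of the same build directory."""
    cmd = ["ctest", "--rerun-failed"]
    if verbose:
        cmd.append("-V")  # full per-test output, as with `ctest -V`
    else:
        cmd.append("--output-on-failure")  # only print output of failing tests
    return cmd


def rerun_failed(build_dir, verbose=True):
    """Run the command in build_dir; a non-zero exit means tests still fail."""
    return subprocess.run(rerun_failed_command(verbose), cwd=build_dir).returncode
```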

mdavezac commented 5 years ago
breathe commented 5 years ago

Awesome, thanks for the help!

Is there another way to get the test suite to skip the Open MPI-dependent tests? The Travis job builds the Anaconda packages for Linux and macOS, and ideally I would include mpi4py so that consumers of the library have MPI support.

breathe commented 5 years ago

map_sites passed on a repeat run, and the tree issue is fixed, thanks! I just need to figure out a way to skip the Open MPI-dependent tests.
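One option that does not depend on how the tests themselves are written is to exclude them at the ctest level with `-E` (`--exclude-regex`), which skips every test whose name matches a regular expression. A minimal sketch, assuming the Open MPI-dependent tests are `pifunctional` and the `pylada.process.*` tests from the failure list above; the thread does not confirm which tests actually need MPI, so the name list is hypothetical.

```python
import re

# Hypothetical: test names assumed to need MPI, taken from the failure list
# in this issue. The thread does not confirm which tests really depend on MPI.
MPI_TEST_NAMES = [
    "pifunctional",
    "pylada.process.call",
    "pylada.process.iterator",
    "pylada.process.program",
    "pylada.process.jobfolder",
]


def exclude_regex(names):
    """Anchored regex matching exactly the given test names (dots escaped)."""
    return "^(" + "|".join(re.escape(name) for name in names) + ")$"


def ctest_command_excluding(names):
    """ctest command that runs everything except the named tests."""
    return ["ctest", "--output-on-failure", "-E", exclude_regex(names)]
```

Anchoring with `^` and `$` keeps the pattern from also excluding longer test names that merely contain one of these strings, and escaping the dots stops `.` from matching arbitrary characters.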

I tried setting an environment variable to tell Open MPI not to crash when not enough slots are available, and I also tried setting OMP_NUM_THREADS to a particular value (though I have no idea what it should be). Is there any reason to think that approach might work?

breathe commented 5 years ago

I got the tests to pass on Travis by hacking the --oversubscribe option into the mpirun invocation. I tried a number of environment variables to avoid having to pass that option on the command line, but couldn't find any that worked. None of them seemed to affect the tests, and I'm not sure whether the variables aren't being picked up or whether none of them is equivalent to --oversubscribe.

Non-working set of environment variables:

OMPI_MCA_plm=isolated
OMPI_MCA_btl_vader_single_copy_mechanism=none
OMPI_MCA_rmaps_base_oversubscribe=yes
OMP_NUM_THREADS=8
OMPI_MCA_mpi_yield_when_idle=yes

Here's a change that gets the testsuite passing on travis-ci with mpi4py installed: https://github.com/breathe/pylada-light/commit/207861dfc3982def865ebb1bce9fe0617c9dabae

Would a PR with the above changes be accepted? I'd be willing to implement something better, but I don't have a strong sense of what the most elegant fix would look like.

breathe commented 5 years ago

Disregard the above comment: I figured out what was preventing the environment-variable-based Open MPI configuration from working. I was unintentionally scrubbing the environment variables before running the test suite.
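The thread does not say exactly how the variables were being scrubbed. One common way this happens is launching the test runner with an explicit, minimal environment instead of a copy of the current one. The Python sketch below (the helper name and child command are illustrative assumptions) shows the difference: an exported variable reaches the child process only when the parent's environment is copied.

```python
import os
import subprocess
import sys

# Child program: print the variable as the child sees it, or "<unset>".
CHILD = "import os; print(os.environ.get('OMPI_MCA_rmaps_base_oversubscribe', '<unset>'))"


def child_sees(env):
    """Run a child Python process with `env` and return what it reports."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Equivalent of `export OMPI_MCA_rmaps_base_oversubscribe=yes` in the parent.
os.environ["OMPI_MCA_rmaps_base_oversubscribe"] = "yes"

# Scrubbed: a fresh environment containing only PATH drops the variable.
scrubbed = child_sees({"PATH": os.environ.get("PATH", "")})
# Preserved: copying os.environ passes every exported variable through.
preserved = child_sees(dict(os.environ))
print(scrubbed, preserved)
```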

I got the full test suite to pass on travis-ci (macOS and Linux) without changing anything in pylada-light. It suffices to export these environment variables to ctest:

OMP_NUM_THREADS=4
OMPI_MCA_rmaps_base_oversubscribe=yes
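In a shell these two variables can simply be exported before calling ctest. For a CI driver written in Python, a minimal sketch (the helper names are hypothetical) merges them into a copy of the current environment so nothing already exported is lost. The values are the ones reported in this thread. `OMPI_MCA_rmaps_base_oversubscribe` is Open MPI's environment-variable form of the `rmaps_base_oversubscribe` MCA parameter; whether it behaves identically across Open MPI releases is not something the thread establishes.

```python
import os
import shutil
import subprocess

# Values reported in this thread as making the full test suite pass on
# travis-ci (macOS and Linux).
CI_MPI_ENV = {
    "OMP_NUM_THREADS": "4",
    "OMPI_MCA_rmaps_base_oversubscribe": "yes",
}


def ctest_env(base=None):
    """Copy `base` (default: os.environ) and add the CI variables on top."""
    env = dict(os.environ if base is None else base)
    env.update(CI_MPI_ENV)
    return env


def run_ctest(build_dir):
    """Run ctest in `build_dir` with the CI variables; return its exit code."""
    if shutil.which("ctest") is None:
        raise FileNotFoundError("ctest was not found on PATH")
    completed = subprocess.run(
        ["ctest", "--output-on-failure"], cwd=build_dir, env=ctest_env()
    )
    return completed.returncode
```

Copying the base environment rather than building a fresh dict is the point of the design: it avoids exactly the scrubbing problem described in this comment.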

Thanks!