openprov / interop-test-harness

Interoperability test harness for the Southampton Provenance Suite.
MIT License

Running test cases in parallel #9

Open trungdong opened 9 years ago

trungdong commented 9 years ago

More than 7,500 tests are currently generated, and running jobs over them is time-consuming, especially for those that have to access remote services. Since all tests are independent of one another, running them in parallel could speed up the test jobs significantly.

mikej888 commented 9 years ago

nosetests supports parallel execution of tests when run on a multi-core machine.

I updated prov_interop/interop_tests/test_converter.py to define the _multiprocess_can_split_ variable required to use this feature:

class ConverterTestCase(unittest.TestCase):

  _multiprocess_can_split_ = True

Within the test_case method I changed the print statement to also print the process ID:

  def test_case(self, index, ext_in, file_ext_in, ext_out, file_ext_out):
    print(("Test case: " + str(index) + 
          " from " + ext_in + 
          " to " + ext_out + " Process: " + str(os.getpid())))

I changed the test output file names to include the process ID (otherwise there would be a risk that one process might delete the output file of another process):

    self.converter_ext_out = "out." + str(os.getpid()) + "." + ext_out
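
Why the process ID is enough to keep the files apart can be seen in a minimal sketch. The helper name output_file_name is hypothetical and not part of the harness; it only mirrors the naming expression above.

```python
import os


def output_file_name(ext_out, pid=None):
    """Build an output file name that is unique per process.

    Hypothetical helper: it mirrors the "out.<pid>.<ext>" pattern above.
    """
    if pid is None:
        pid = os.getpid()  # ID of the process running this test
    return "out." + str(pid) + "." + ext_out


# Two worker processes converting to the same format get different names,
# so neither can overwrite or delete the other's output file.
name_a = output_file_name("json", pid=3978)
name_b = output_file_name("json", pid=3994)
print(name_a, name_b)
```

Within a single process the tests still run one after another, so reusing the same name there is no more of a problem than it was before.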

If the tests are run on a 4-processor machine, and the nosetests --processes flag is used to request 4 processes:

$ nosetests --processes=4 --nocapture -v prov_interop.interop_tests.test_provtoolbox

then the tests are run in multiple processes:

Test case: case1 from json to json Process: 3978
Test case: case1 from json to provn Process: 3994
Test case: case1 from json to provx Process: 4018
Test case: case1 from json to trig Process: 4049
Test case: case1 from json to ttl Process: 3978
Test case: case1 from provn to json Process: 4049
Test case: case1 from provn to provn Process: 3994
Test case: case1 from provn to provx Process: 4018

Commit: ed8e7b21b84cf32a4e815626548e1618902de351

I updated prov_interop/provstore/converter.py to include a process ID as part of the document when it is placed within ProvStore (again to avoid requests clashing):

    store_request = {ProvStoreConverter.CONTENT: doc, 
                     ProvStoreConverter.PUBLIC: True, 
                     ProvStoreConverter.REC_ID: str(os.getpid()) + "." + in_format}

Commit: 4c0b9d68cbc7e7d4163a5e1f504e73e3ef985f32

I made similar changes to service-tests (in prov_service_tests/test_service.py and prov_service_tests/test_provstore.py). Commit: e53e89bcc3dd69d10dea2628eb996576ea93216f.

mikej888 commented 9 years ago

This would not be of use on Travis CI. For Travis, their guide "Parallelizing your builds across virtual machines" recommends the use of their build matrix.

An example of this is already used in interop-test-runner's .travis.yml to run component tests concurrently on Travis's virtual machines.

If the test cases in the testcases repository were grouped into sub-directories, then the interoperability test harness could be run on each sub-directory in parallel. For example, the following build matrix could be used in a .travis.yml file, such as provpy-interop-job's, to run tests in parallel on subsets of the test cases, provided they were placed into sub-directories:

env: 
  matrix:
    - TESTS=test-cases/test-set1
    - TESTS=test-cases/test-set2
    - TESTS=test-cases/test-set3
    - TESTS=test-cases/test-set4
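
On the harness side, each job would then need to pick up its subset from the TESTS variable. A minimal sketch of that, assuming the directory is read straight from the environment; the function name cases_dir_for_job and the fallback default are hypothetical, and how the real harness is told where its test cases live is not shown here:

```python
import os


def cases_dir_for_job(environ=None, default="test-cases"):
    """Return the test case directory selected for this build job.

    Hypothetical helper: TESTS is the variable set in the build matrix;
    the default is an assumption for runs outside Travis CI.
    """
    if environ is None:
        environ = os.environ
    return environ.get("TESTS", default)


# A dict stands in for the job's environment so the sketch runs anywhere.
print(cases_dir_for_job({"TESTS": "test-cases/test-set1"}))
```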

As an example of this build matrix in use, a simplified version of the ProvPy .travis.yml file is:

language: python

python:
  - 2.7
  - 3.4

env: 
  matrix:
    - TESTS=test-cases/test-set1
    - TESTS=test-cases/test-set2
    - TESTS=test-cases/test-set3
    - TESTS=test-cases/test-set4

addons:
  apt:
    packages:
    - zlib1g-dev
    - libxslt1-dev

before_install:
  - echo "Getting test cases..."
  - echo "Getting ProvPy source..."
  - echo "Getting test harness..."
  - echo "Downloading and installing ProvToolbox from provenance.ecs.soton.ac.uk"
  - echo "Creating local configuration..."

script: 
  - echo "Running ProvPy on $TESTS"

When Travis CI runs this it spawns 8 jobs:

2.1  Python: 2.7  TESTS=test-cases/test-set1 
2.2  Python: 2.7  TESTS=test-cases/test-set2  
2.3  Python: 2.7  TESTS=test-cases/test-set3 
2.4  Python: 2.7  TESTS=test-cases/test-set4  
2.5  Python: 3.4  TESTS=test-cases/test-set1  
2.6  Python: 3.4  TESTS=test-cases/test-set2  
2.7  Python: 3.4  TESTS=test-cases/test-set3  
2.8  Python: 3.4  TESTS=test-cases/test-set4
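
The count of 8 is the product of the 2 python entries and the 4 env entries. The expansion can be reproduced with a short sketch; it only illustrates the cross product and the ordering seen above, and says nothing about how Travis CI is implemented:

```python
from itertools import product

python_versions = ["2.7", "3.4"]
test_sets = ["test-cases/test-set%d" % n for n in range(1, 5)]

# One job per (Python version, TESTS value) pair, with the Python
# version varying slowest, as in the job list above.
jobs = list(product(python_versions, test_sets))
for number, (version, tests) in enumerate(jobs, start=1):
    print("2.%d  Python: %s  TESTS=%s" % (number, version, tests))
```

Adding a fifth test set would therefore add two jobs, one per Python version.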

An excerpt from the log file of the first job is:

Getting test cases...
Getting ProvPy source...
Getting test harness...
Downloading and installing ProvToolbox from provenance.ecs.soton.ac.uk
Creating local configuration...
Running ProvPy on test-cases/test-set1

trungdong commented 9 years ago

I haven't had a chance to set up the local Jenkins environment, but your solution seems to be suitable for that. I will look into setting this up.

Thank you for investigating this.