ucl-pond / pySuStaIn

Subtype and Stage Inference (SuStaIn) algorithm with an example using simulated data.
MIT License

Adding tests & a few optimizations #19

Closed by sea-shunned 3 years ago

sea-shunned commented 3 years ago

Tests

For reproducibility, SuStaIn now takes a seed as input and uses it throughout, so that results are consistent for a given seed and set of parameters.
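As a rough illustration of the idea (not the actual pySuStaIn code), the sketch below creates one NumPy generator from the seed and routes every random draw through it. `run_toy_model` is a hypothetical stand-in for a stochastic routine.

```python
import numpy as np


def run_toy_model(seed, n_samples=5):
    """Hypothetical stochastic routine; a stand-in, not part of pySuStaIn."""
    # Build a single generator from the seed and use it for every draw,
    # so the whole run is determined by the seed and the parameters.
    rng = np.random.default_rng(seed)
    init = rng.normal(size=n_samples)
    order = rng.permutation(n_samples)
    return init[order]


a = run_toy_model(seed=42)
b = run_toy_model(seed=42)
c = run_toy_model(seed=7)

print(np.array_equal(a, b))  # same seed, same parameters
print(np.array_equal(a, c))  # different seed
```

The point of passing one generator around, rather than relying on global random state, is that unrelated code cannot silently change the sequence of draws.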

This enables a full functional test of SuStaIn. I've added two scripts in the "tests" subfolder to check the output from SuStaIn: create_validation.py creates new validation benchmarks, and validation.py checks that results are consistent with those benchmarks.
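The general create-then-compare pattern could look something like the sketch below. It is only an assumption about the shape of such scripts: `toy_run`, `create_benchmark`, `matches_benchmark`, and the `.npz` file format are all hypothetical and not taken from create_validation.py or validation.py.

```python
import tempfile
from pathlib import Path

import numpy as np


def toy_run(seed):
    """Hypothetical seeded run that returns a dict of result arrays."""
    rng = np.random.default_rng(seed)
    return {"likelihood": rng.normal(size=3), "sequence": rng.permutation(4)}


def create_benchmark(path, seed):
    """Save the results of a seeded run as the reference benchmark."""
    np.savez(path, **toy_run(seed))


def matches_benchmark(path, seed):
    """Re-run with the given seed and compare every array to the saved one."""
    new = toy_run(seed)
    with np.load(path) as saved:
        return all(np.allclose(saved[key], new[key]) for key in new)


bench = Path(tempfile.mkdtemp()) / "benchmark.npz"
create_benchmark(bench, seed=1)

ok = matches_benchmark(bench, seed=1)       # same seed as the benchmark
changed = matches_benchmark(bench, seed=2)  # different seed, results differ
print(ok, changed)
```

Comparing with a tolerance (`np.allclose`) rather than exact equality is a design choice that tolerates tiny floating-point differences, for example after a loop has been vectorized.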

A full test (the -f command line flag for validation.py) uses every class that inherits from AbstractSustain. The relevant test functions are declared as @abstractmethods, so every subclass has to implement them and the full test should scale with future additions/subclasses. This also means that the simulator functions are now a part of the class they pertain to (though I haven't removed the "sim" subfolder for now).
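A minimal sketch of how abstract test hooks let a full test pick up new subclasses automatically. The class names and the `self_test` method are hypothetical and do not reflect the real AbstractSustain interface.

```python
from abc import ABC, abstractmethod


class AbstractToyModel(ABC):
    """Hypothetical base class; stands in for an abstract model class."""

    @classmethod
    @abstractmethod
    def self_test(cls):
        """Return True if the subclass reproduces its benchmark."""


class ToyA(AbstractToyModel):
    @classmethod
    def self_test(cls):
        return sum([1, 2, 3]) == 6


class ToyB(AbstractToyModel):
    @classmethod
    def self_test(cls):
        return sorted([3, 1, 2]) == [1, 2, 3]


def run_full_test(base):
    """Run the test hook of every direct subclass of `base`."""
    # Note: __subclasses__() lists direct subclasses only; a deeper
    # hierarchy would need a recursive walk.
    return {sub.__name__: sub.self_test() for sub in base.__subclasses__()}


results = run_full_test(AbstractToyModel)
print(results)
```

Because `self_test` is abstract, a subclass that forgets to implement it cannot be instantiated, which surfaces the missing test early.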

Note that the test for ZscoreSustainMissingData is currently a copy of the ZscoreSustain one, and in the near future will be modified to better test the missing data handling.

Optimizations

Part of the motivation for the tests was to enable some optimizations, namely vectorizing a couple of the main loops.

In a test experiment, the running time of the z-score version was reduced from ~15.5 hours to ~10 hours (roughly 35% less). The speedup is less significant for the mixture model version: ~2.1 hours to ~1.68 hours (roughly 20% less).
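For anyone unfamiliar with the technique, here is a generic illustration of vectorizing a nested loop with NumPy broadcasting. It is not one of the loops changed in this PR; the Gaussian log-likelihood and both function names are assumptions chosen only for the example.

```python
import numpy as np


def loglik_loop(data, mean, std):
    """Per-subject Gaussian log-likelihood, computed with explicit loops."""
    n_subjects, n_biomarkers = data.shape
    out = np.zeros(n_subjects)
    for i in range(n_subjects):
        for j in range(n_biomarkers):
            z = (data[i, j] - mean[j]) / std[j]
            out[i] += -0.5 * z**2 - np.log(std[j]) - 0.5 * np.log(2 * np.pi)
    return out


def loglik_vectorized(data, mean, std):
    """Same quantity, computed with broadcasting instead of loops."""
    # (n_subjects, n_biomarkers) broadcasts against (n_biomarkers,)
    z = (data - mean) / std
    terms = -0.5 * z**2 - np.log(std) - 0.5 * np.log(2 * np.pi)
    return terms.sum(axis=1)


rng = np.random.default_rng(0)
data = rng.normal(size=(200, 4))
mean = np.zeros(4)
std = np.full(4, 2.0)

same = np.allclose(loglik_loop(data, mean, std),
                   loglik_vectorized(data, mean, std))
print(same)
```

A check like the final comparison is the kind of thing the seeded validation tests make possible at full scale: the optimized code must reproduce the results of the original.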

Feel free to ask if you want details, but it's a lot to add here and ultimately not interesting.

Other Changes

noxtoby commented 3 years ago

High-level, this all looks great. Anyone wanna check the details?