jdtournier / testing

The old MRtrix3 testing repo, now obsolete - do not use!
0 stars 0 forks source link

dirgen - might need to reimplement multiple restarts to avoid test failures... #3

Open jdtournier opened 9 years ago

jdtournier commented 9 years ago

Dirgen seems to get stuck into local minima from time to time, particularly for the 45 directions case. Might be best to implement a multiple restart capability into - would avoid having to re-run tests every time it got stuck...

Or just reduce the tolerance more? It's in the right ballpark, just clearly not the most optimal solution...

thijsdhollander commented 9 years ago

For these probabilistic (i.e., randomly initialised) algos that are not doing any multi threaded work (dirgen doesn't, I assume?), could they not be initialised with a fixed seed or something (just for testing purposes, I mean)?

The multiple restart capability is a useful feature on its own, I reckon; but strictly speaking will not guarantee that this test failing issue will never happen (although exponentially reducing chances). Even increasing tolerance might be a tricky endeavour: how much is sufficient on one hand, and how much is too much (and undoes the usefulness of the testing) on the other?

Overall, in principle, tests should be guaranteed to succeed if the outcome of even one run results in a plausible behaviour. The more you go into randomised areas, that does become trickier indeed though...

Another idea that comes to mind as I'm typing this: is there a way (and is it preferable to begin with) to have multiple sets of tests? We could have one that doesn't include the "probabilistic" tests (where the positive outcome is not always guaranteed to begin with), and use that one for the restricted branches mechanism; and then another one that does include these tests, just for our own insights (as was the case up to now for the tests overall). I'm not sure myself I like the idea, but it's one that might potentially solve the problem at hand with the restricted branches..? :confused:

I mean, even without the branch restrictions being based on the tests, it's still not preferable that we have a badge on our front page that ever once in a while (probabilistically) will show "build failed", even though we inherently know "that's fine".

jdtournier commented 8 years ago

All good points. What I've done for the moment is to relax the tests for dirgen a little to allow them to pass (at least more often). Note that it's still in the right ballpark, so this should still catch anything that causes outright failures, which is in practice what's most likely to happen. Subtle regressions would be harder to catch, but hopefully harder to introduce in the first place... (?)

In general, it would indeed probably be better to use a fixed RNG seed. The issue right now is that the approaches are so different on both branches that this would never lead to the same results anyway. This might also be an issue with tckgen, where the approaches are just different enough to cause differences (I think there were more RNG instances per thread in the previous version) - I'll have to check whether this is indeed true, haven't looked into it for a while... In any case, the point is, once we merge to master, we can switch back to a more stringent testing regime.

On a more general note, I'm not all that convinced that a very stringent test with a fixed RNG seed is necessarily desirable. Sure, it'll catch any change in the algorithm, but what if we modify it to improve performance or something so that the approach is different, leading to totally different output with the same RNG seed, even though the quality of the results is just as good? It really depends on what we intend these tests to catch: any changes at all (probably a good idea if we intend master to be rock-steady), or regressions in actual performance. The issue is that currently the testing repo is independent of the core repo, so needs to be applicable across all branches that we may want to merge down the track. This is already an issue with the current divergence between master and updated_syntax, but will be a recurrent problem as new branches are proposed, some of which might be destined to be part of a future major versions, so might be quite different in implementation and/or syntax... Not really sure how to handle that other than having the testing repo be part of the core repo, but I don't like the idea of making the core repo unnecessarily large (and it will be with all the test data). Maybe we can use branches on testing as well, although that might prove irritating to manage... Any thoughts welcome.