Previously, we wrote regression tests. This was problematic because we could not get reliable results across operating systems.
For example, one test gave three different results on different operating systems:
In this particular case, we think the cause is numeric precision in LBFGS, which differs across operating systems (in torch, at least; base R does not show this issue).
So instead of expecting identical results from identical code, we will test that the models are actually learning from the data. This also prepares us for upcoming GPU support, where reproducibility guarantees are even weaker.
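A minimal sketch of what such a "the model is learning" check can look like. This is illustrative only (the actual tests are written in R; it is shown here in Python with numpy, and every name is hypothetical): rather than asserting exact coefficients, which vary with OS, BLAS, and optimizer precision, we assert that training reduces the loss.

```python
import numpy as np

# Simulate a small regression problem with known structure.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)

def mse(w):
    """Mean squared error of linear model with weights w."""
    return float(np.mean((X @ w - y) ** 2))

# Train with plain gradient descent from a zero initialization.
w = np.zeros(3)
initial_loss = mse(w)
for _ in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= 0.1 * grad
final_loss = mse(w)

# The portable assertion: learning happened, regardless of the exact
# values reached on any particular platform.
assert final_loss < initial_loss
```

The same idea carries over to testthat, e.g. an `expect_lt(final_loss, initial_loss)` style check, which stays green even when the fitted values drift slightly between operating systems.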
The PR will also revisit the entire test suite and convert errors to use cli.