tidymodels / brulee

High-Level Modeling Functions with 'torch'
https://brulee.tidymodels.org/

Redo unit tests #75

Closed topepo closed 1 year ago

topepo commented 1 year ago

Previously, we have been writing regression tests. This has been very problematic because we have not been able to get consistent results across operating systems.

For example, one test gave three different results on three operating systems:

ubuntu

                           [,1]       [,2]       [,3]        [,4]
model.0.weight[1, ]  0.35076502 -0.5510707 0.33145362  0.07187607
model.0.weight[2, ]  0.84151959  0.1904376 0.34637195 -0.41729349
model.0.weight[3, ] -0.08732598  0.8749689 0.07960998 -0.36410567

macos

                          [,1]        [,2]      [,3]        [,4]
model.0.weight[1, ] 0.72598004 -0.81716537 0.3312859  0.05504321
model.0.weight[2, ] 1.31857872 -0.01325749 0.3686642 -0.47075731
model.0.weight[3, ] 0.56407475  0.78450465 0.2276767 -0.42647159

windows

                          [,1]        [,2]      [,3]        [,4]
model.0.weight[1, ] 0.42260757 -0.62561375 0.3315659  0.06827193
model.0.weight[2, ] 0.95905924  0.15253183 0.3473763 -0.42539096
model.0.weight[3, ] 0.03839011  0.85049051 0.1365590 -0.38458425

In this particular case, we think the cause is that the numeric precision used in the LBFGS computations differs across operating systems (for torch, at least; R has no such issue).

So instead of expecting the same results for the same code, we'll test that the models are learning from the data. This also prepares us for upcoming GPU support, where exact reproducibility is even harder to guarantee.
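To make the idea concrete, here is a minimal sketch of what a "the model is learning" check could look like. It is not taken from the brulee test suite: the helper names `rmse()` and `learns_from_data()`, the `margin` of 0.5, and the simulated data are all assumptions for illustration, and `lm()` is only a stand-in fit so that the example runs without torch. In an actual test, the predictions would come from a fitted brulee model.

```r
# Hypothetical helpers (not part of brulee): compare the model's error to an
# intercept-only baseline instead of comparing coefficients to stored values.
rmse <- function(truth, estimate) sqrt(mean((truth - estimate)^2))

learns_from_data <- function(truth, estimate, margin = 0.5) {
  # Baseline: always predict the mean of the outcome
  baseline <- rmse(truth, mean(truth))
  # "Learning" here means beating the baseline by a clear margin (assumed: 50%)
  rmse(truth, estimate) < margin * baseline
}

# Simulated data with a strong linear signal
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 4), ncol = 4)
y <- as.vector(x %*% c(1, -2, 0.5, 0) + rnorm(n, sd = 0.25))

# Stand-in fit; a brulee model's predictions would be used here instead
fit <- lm(y ~ x)
pred <- unname(fitted(fit))

# Passes as long as the fit captures the signal, regardless of small
# platform-specific differences in the estimated weights
stopifnot(learns_from_data(y, pred))

# A "model" that ignores the predictors should not pass
stopifnot(!learns_from_data(y, rep(mean(y), n)))
```

A check like this tolerates small numerical differences between platforms because it only asserts that the error is well below the baseline, not that the weights match to several digits.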

The PR will revisit the entire test suite and convert the error messages to use the cli package.
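As a rough illustration of the cli conversion, an argument check might look like the sketch below. The function `check_epochs()` and its message wording are hypothetical and not taken from brulee; the sketch assumes the cli and rlang packages are installed and only uses `cli::cli_abort()` and `rlang::caller_env()`.

```r
# Hypothetical validator (not brulee's actual code) that raises its error
# through cli so the message gets consistent formatting.
check_epochs <- function(epochs, call = rlang::caller_env()) {
  ok <- is.numeric(epochs) && length(epochs) == 1 && !is.na(epochs) && epochs >= 1
  if (!ok) {
    cli::cli_abort(
      c(
        "{.arg epochs} must be a single number greater than or equal to 1.",
        "x" = "An object of class {.cls {class(epochs)}} was supplied."
      ),
      call = call
    )
  }
  invisible(epochs)
}

# Valid input is returned invisibly
check_epochs(10)

# Invalid input raises a classed error; try() keeps the script running
try(check_epochs("ten"), silent = TRUE)
```

Passing `call` makes the error report the user-facing function rather than the internal helper, which is one reason to route errors through a small checker like this.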
