rikhuijzer closed 12 months ago
First run (effaaf6); all on Julia 1 (v1.9.3):
49×7 DataFrame
Row │ Dataset Model Hyperparameters measure score 1.96*SE nfolds
│ String String String String String String Int64
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ haberman DecisionTreeClassifier (;) auc 0.54 0.05 10
2 │ haberman LogisticClassifier (;) auc 0.69 0.06 10
3 │ haberman XGBoostClassifier (;) auc 0.65 0.04 10
4 │ haberman XGBoostClassifier (max_depth = 2,) auc 0.63 0.04 10
5 │ haberman StableForestClassifier (max_depth = 2,) auc 0.71 0.05 10
6 │ haberman StableRulesClassifier (max_depth = 2, max_rules = 30) auc 0.70 0.08 10
7 │ haberman StableRulesClassifier (max_depth = 2, max_rules = 10) auc 0.67 0.07 10
8 │ titanic DecisionTreeClassifier (;) auc 0.76 0.05 10
9 │ titanic LogisticClassifier (;) auc 0.84 0.02 10
10 │ titanic XGBoostClassifier (;) auc 0.86 0.03 10
11 │ titanic XGBoostClassifier (max_depth = 2,) auc 0.87 0.02 10
12 │ titanic StableForestClassifier (max_depth = 2,) auc 0.85 0.02 10
13 │ titanic StableRulesClassifier (max_depth = 2, max_rules = 30) auc 0.83 0.02 10
14 │ titanic StableRulesClassifier (max_depth = 2, max_rules = 10) auc 0.82 0.02 10
15 │ cancer DecisionTreeClassifier (;) auc 0.92 0.03 10
16 │ cancer MultinomialClassifier (;) auc 0.98 0.01 10
17 │ cancer XGBoostClassifier (;) auc 0.99 0.01 10
18 │ cancer XGBoostClassifier (max_depth = 2,) auc 0.99 0.01 10
19 │ cancer StableForestClassifier (max_depth = 2,) auc 0.98 0.01 10
20 │ cancer StableRulesClassifier (max_depth = 2, max_rules = 30) auc 0.98 0.01 10
21 │ cancer StableRulesClassifier (max_depth = 2, max_rules = 10) auc 0.98 0.01 10
22 │ diabetes DecisionTreeClassifier (;) auc 0.67 0.05 10
23 │ diabetes LogisticClassifier (;) auc 0.70 0.06 10
24 │ diabetes XGBoostClassifier (;) auc 0.80 0.03 10
25 │ diabetes XGBoostClassifier (max_depth = 2,) auc 0.83 0.03 10
26 │ diabetes StableForestClassifier (max_depth = 2,) auc 0.82 0.03 10
27 │ diabetes StableRulesClassifier (max_depth = 2, max_rules = 30) auc 0.77 0.03 10
28 │ diabetes StableRulesClassifier (max_depth = 2, max_rules = 10) auc 0.75 0.05 10
29 │ iris DecisionTreeClassifier (;) accuracy 0.95 0.03 10
30 │ iris MultinomialClassifier (;) accuracy 0.97 0.03 10
31 │ iris XGBoostClassifier (;) accuracy 0.95 0.04 10
32 │ iris XGBoostClassifier (max_depth = 2,) accuracy 0.94 0.04 10
33 │ iris StableForestClassifier (max_depth = 2,) accuracy 0.95 0.04 10
34 │ iris StableRulesClassifier (max_depth = 2, max_rules = 30) accuracy 0.74 0.14 10
35 │ iris StableRulesClassifier (max_depth = 2, max_rules = 10) accuracy 0.71 0.08 10
36 │ boston DecisionTreeRegressor (;) R² 0.74 0.11 10
37 │ boston LinearRegressor (;) R² 0.70 0.05 10
38 │ boston XGBoostRegressor (;) R² 0.88 0.06 10
39 │ boston XGBoostRegressor (max_depth = 2,) R² 0.87 0.04 10
40 │ boston StableForestRegressor (max_depth = 2,) R² 0.67 0.08 10
41 │ boston StableRulesRegressor (max_depth = 2, max_rules = 30) R² 0.52 0.07 10
42 │ boston StableRulesRegressor (max_depth = 2, max_rules = 10) R² 0.63 0.10 10
43 │ make_regression DecisionTreeRegressor (;) R² 0.90 0.02 10
44 │ make_regression LinearRegressor (;) R² 1.00 0.00 10
45 │ make_regression XGBoostRegressor (;) R² 0.98 0.01 10
46 │ make_regression XGBoostRegressor (max_depth = 2,) R² 0.98 0.00 10
47 │ make_regression StableForestRegressor (max_depth = 2,) R² 0.67 0.05 10
48 │ make_regression StableRulesRegressor (max_depth = 2, max_rules = 30) R² 0.48 0.05 10
49 │ make_regression StableRulesRegressor (max_depth = 2, max_rules = 10) R² 0.53 0.06 10
Second run (87bc933):
49×7 DataFrame
Row │ Dataset Model Hyperparameters measure score 1.96*SE nfolds
│ String String String String String String Int64
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ haberman DecisionTreeClassifier (;) auc 0.55 0.06 10
2 │ haberman LogisticClassifier (;) auc 0.69 0.06 10
3 │ haberman XGBoostClassifier (;) auc 0.65 0.04 10
4 │ haberman XGBoostClassifier (max_depth = 2,) auc 0.63 0.04 10
5 │ haberman StableForestClassifier (max_depth = 2,) auc 0.71 0.05 10
6 │ haberman StableRulesClassifier (max_depth = 2, max_rules = 30) auc 0.70 0.08 10
7 │ haberman StableRulesClassifier (max_depth = 2, max_rules = 10) auc 0.67 0.07 10
8 │ titanic DecisionTreeClassifier (;) auc 0.76 0.05 10
9 │ titanic LogisticClassifier (;) auc 0.84 0.02 10
10 │ titanic XGBoostClassifier (;) auc 0.86 0.03 10
11 │ titanic XGBoostClassifier (max_depth = 2,) auc 0.87 0.02 10
12 │ titanic StableForestClassifier (max_depth = 2,) auc 0.85 0.02 10
13 │ titanic StableRulesClassifier (max_depth = 2, max_rules = 30) auc 0.83 0.02 10
14 │ titanic StableRulesClassifier (max_depth = 2, max_rules = 10) auc 0.82 0.02 10
15 │ cancer DecisionTreeClassifier (;) auc 0.92 0.03 10
16 │ cancer MultinomialClassifier (;) auc 0.98 0.01 10
17 │ cancer XGBoostClassifier (;) auc 0.99 0.01 10
18 │ cancer XGBoostClassifier (max_depth = 2,) auc 0.99 0.01 10
19 │ cancer StableForestClassifier (max_depth = 2,) auc 0.98 0.01 10
20 │ cancer StableRulesClassifier (max_depth = 2, max_rules = 30) auc 0.98 0.01 10
21 │ cancer StableRulesClassifier (max_depth = 2, max_rules = 10) auc 0.98 0.01 10
22 │ diabetes DecisionTreeClassifier (;) auc 0.67 0.05 10
23 │ diabetes LogisticClassifier (;) auc 0.70 0.06 10
24 │ diabetes XGBoostClassifier (;) auc 0.80 0.03 10
25 │ diabetes XGBoostClassifier (max_depth = 2,) auc 0.83 0.03 10
26 │ diabetes StableForestClassifier (max_depth = 2,) auc 0.82 0.03 10
27 │ diabetes StableRulesClassifier (max_depth = 2, max_rules = 30) auc 0.77 0.03 10
28 │ diabetes StableRulesClassifier (max_depth = 2, max_rules = 10) auc 0.75 0.05 10
29 │ iris DecisionTreeClassifier (;) accuracy 0.95 0.03 10
30 │ iris MultinomialClassifier (;) accuracy 0.97 0.03 10
31 │ iris XGBoostClassifier (;) accuracy 0.95 0.04 10
32 │ iris XGBoostClassifier (max_depth = 2,) accuracy 0.94 0.04 10
33 │ iris StableForestClassifier (max_depth = 2,) accuracy 0.95 0.04 10
34 │ iris StableRulesClassifier (max_depth = 2, max_rules = 30) accuracy 0.76 0.14 10
35 │ iris StableRulesClassifier (max_depth = 2, max_rules = 10) accuracy 0.67 0.08 10
36 │ boston DecisionTreeRegressor (;) R² 0.74 0.11 10
37 │ boston LinearRegressor (;) R² 0.70 0.05 10
38 │ boston XGBoostRegressor (;) R² 0.88 0.06 10
39 │ boston XGBoostRegressor (max_depth = 2,) R² 0.87 0.04 10
40 │ boston StableForestRegressor (max_depth = 2,) R² 0.67 0.08 10
41 │ boston StableRulesRegressor (max_depth = 2, max_rules = 30) R² 0.52 0.07 10
42 │ boston StableRulesRegressor (max_depth = 2, max_rules = 10) R² 0.63 0.10 10
43 │ make_regression DecisionTreeRegressor (;) R² 0.90 0.02 10
44 │ make_regression LinearRegressor (;) R² 1.00 0.00 10
45 │ make_regression XGBoostRegressor (;) R² 0.98 0.01 10
46 │ make_regression XGBoostRegressor (max_depth = 2,) R² 0.98 0.00 10
47 │ make_regression StableForestRegressor (max_depth = 2,) R² 0.67 0.05 10
48 │ make_regression StableRulesRegressor (max_depth = 2, max_rules = 30) R² 0.48 0.05 10
49 │ make_regression StableRulesRegressor (max_depth = 2, max_rules = 10) R² 0.53 0.06 10
Diff:
5c5
< 1 │ haberman DecisionTreeClassifier (;) auc 0.54 0.05 10
---
> 1 │ haberman DecisionTreeClassifier (;) auc 0.55 0.06 10
38,39c38,39
< 34 │ iris StableRulesClassifier (max_depth = 2, max_rules = 30) accuracy 0.74 0.14 10
< 35 │ iris StableRulesClassifier (max_depth = 2, max_rules = 10) accuracy 0.71 0.08 10
---
> 34 │ iris StableRulesClassifier (max_depth = 2, max_rules = 30) accuracy 0.76 0.14 10
> 35 │ iris StableRulesClassifier (max_depth = 2, max_rules = 10) accuracy 0.67 0.08 10
So the only rows that differ between the two runs are the DecisionTreeClassifier on the Haberman dataset and the StableRulesClassifier (both `max_rules` settings) on the Iris dataset. These were also the only differences between two CI runs on main against https://github.com/rikhuijzer/SIRUS.jl/commit/81533786b01c7f6b0cb562c3dda32b9e2156f767.
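The comparison can also be done programmatically instead of with a textual diff. The following is a minimal sketch in Python, used here only for illustration; the `Run` mapping and the `differing_rows` helper are hypothetical names and not part of SIRUS.jl. The values are a handful of rows copied from the two runs above (effaaf6 and 87bc933), kept as strings exactly as printed.

```python
from typing import Dict, List, Tuple

# Key: (dataset, model, hyperparameters); value: (score, 1.96*SE) as printed.
Row = Tuple[str, str, str]
Run = Dict[Row, Tuple[str, str]]

# A few rows copied from the first run (effaaf6).
first_run: Run = {
    ("haberman", "DecisionTreeClassifier", "(;)"): ("0.54", "0.05"),
    ("titanic", "LogisticClassifier", "(;)"): ("0.84", "0.02"),
    ("iris", "StableRulesClassifier", "(max_depth = 2, max_rules = 30)"): ("0.74", "0.14"),
    ("iris", "StableRulesClassifier", "(max_depth = 2, max_rules = 10)"): ("0.71", "0.08"),
}

# The same rows copied from the second run (87bc933).
second_run: Run = {
    ("haberman", "DecisionTreeClassifier", "(;)"): ("0.55", "0.06"),
    ("titanic", "LogisticClassifier", "(;)"): ("0.84", "0.02"),
    ("iris", "StableRulesClassifier", "(max_depth = 2, max_rules = 30)"): ("0.76", "0.14"),
    ("iris", "StableRulesClassifier", "(max_depth = 2, max_rules = 10)"): ("0.67", "0.08"),
}


def differing_rows(a: Run, b: Run) -> List[Row]:
    """Return the keys whose printed values differ between two runs."""
    return [key for key in a if a[key] != b.get(key)]


changed = differing_rows(first_run, second_run)
for key in changed:
    print(key, first_run[key], "->", second_run[key])
```

Comparing the printed strings rather than parsed floats keeps the check equivalent to the textual diff: a row counts as changed only if its rounded output changed.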
First (a148080):
49×7 DataFrame
Row │ Dataset Model Hyperparameters measure score 1.96*SE nfolds
│ String String String String String String Int64
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ haberman DecisionTreeClassifier (;) auc 0.56 0.06 10
2 │ haberman LogisticClassifier (;) auc 0.69 0.06 10
3 │ haberman XGBoostClassifier (;) auc 0.65 0.04 10
4 │ haberman XGBoostClassifier (max_depth = 2,) auc 0.63 0.04 10
5 │ haberman StableForestClassifier (max_depth = 2,) auc 0.70 0.05 10
6 │ haberman StableRulesClassifier (max_depth = 2, max_rules = 30) auc 0.70 0.07 10
7 │ haberman StableRulesClassifier (max_depth = 2, max_rules = 10) auc 0.67 0.06 10
8 │ titanic DecisionTreeClassifier (;) auc 0.76 0.05 10
9 │ titanic LogisticClassifier (;) auc 0.84 0.02 10
10 │ titanic XGBoostClassifier (;) auc 0.86 0.03 10
11 │ titanic XGBoostClassifier (max_depth = 2,) auc 0.87 0.02 10
12 │ titanic StableForestClassifier (max_depth = 2,) auc 0.85 0.02 10
13 │ titanic StableRulesClassifier (max_depth = 2, max_rules = 30) auc 0.83 0.02 10
14 │ titanic StableRulesClassifier (max_depth = 2, max_rules = 10) auc 0.83 0.02 10
15 │ cancer DecisionTreeClassifier (;) auc 0.92 0.03 10
16 │ cancer MultinomialClassifier (;) auc 0.98 0.01 10
17 │ cancer XGBoostClassifier (;) auc 0.99 0.01 10
18 │ cancer XGBoostClassifier (max_depth = 2,) auc 0.99 0.01 10
19 │ cancer StableForestClassifier (max_depth = 2,) auc 0.99 0.01 10
20 │ cancer StableRulesClassifier (max_depth = 2, max_rules = 30) auc 0.98 0.01 10
21 │ cancer StableRulesClassifier (max_depth = 2, max_rules = 10) auc 0.98 0.01 10
22 │ diabetes DecisionTreeClassifier (;) auc 0.67 0.05 10
23 │ diabetes LogisticClassifier (;) auc 0.70 0.06 10
24 │ diabetes XGBoostClassifier (;) auc 0.80 0.03 10
25 │ diabetes XGBoostClassifier (max_depth = 2,) auc 0.83 0.03 10
26 │ diabetes StableForestClassifier (max_depth = 2,) auc 0.82 0.03 10
27 │ diabetes StableRulesClassifier (max_depth = 2, max_rules = 30) auc 0.78 0.04 10
28 │ diabetes StableRulesClassifier (max_depth = 2, max_rules = 10) auc 0.75 0.05 10
29 │ iris DecisionTreeClassifier (;) accuracy 0.95 0.03 10
30 │ iris MultinomialClassifier (;) accuracy 0.97 0.03 10
31 │ iris XGBoostClassifier (;) accuracy 0.95 0.04 10
32 │ iris XGBoostClassifier (max_depth = 2,) accuracy 0.94 0.04 10
33 │ iris StableForestClassifier (max_depth = 2,) accuracy 0.95 0.04 10
34 │ iris StableRulesClassifier (max_depth = 2, max_rules = 30) accuracy 0.87 0.12 10
35 │ iris StableRulesClassifier (max_depth = 2, max_rules = 10) accuracy 0.75 0.07 10
36 │ boston DecisionTreeRegressor (;) R² 0.74 0.11 10
37 │ boston LinearRegressor (;) R² 0.70 0.05 10
38 │ boston XGBoostRegressor (;) R² 0.88 0.06 10
39 │ boston XGBoostRegressor (max_depth = 2,) R² 0.87 0.04 10
40 │ boston StableForestRegressor (max_depth = 2,) R² 0.67 0.09 10
41 │ boston StableRulesRegressor (max_depth = 2, max_rules = 30) R² 0.57 0.08 10
42 │ boston StableRulesRegressor (max_depth = 2, max_rules = 10) R² 0.61 0.09 10
43 │ make_regression DecisionTreeRegressor (;) R² 0.90 0.02 10
44 │ make_regression LinearRegressor (;) R² 1.00 0.00 10
45 │ make_regression XGBoostRegressor (;) R² 0.98 0.01 10
46 │ make_regression XGBoostRegressor (max_depth = 2,) R² 0.98 0.00 10
47 │ make_regression StableForestRegressor (max_depth = 2,) R² 0.68 0.05 10
48 │ make_regression StableRulesRegressor (max_depth = 2, max_rules = 30) R² 0.46 0.05 10
49 │ make_regression StableRulesRegressor (max_depth = 2, max_rules = 10) R² 0.53 0.05 10
Second:
49×7 DataFrame
Row │ Dataset Model Hyperparameters measure score 1.96*SE nfolds
│ String String String String String String Int64
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ haberman DecisionTreeClassifier (;) auc 0.55 0.05 10
2 │ haberman LogisticClassifier (;) auc 0.69 0.06 10
3 │ haberman XGBoostClassifier (;) auc 0.65 0.04 10
4 │ haberman XGBoostClassifier (max_depth = 2,) auc 0.63 0.04 10
5 │ haberman StableForestClassifier (max_depth = 2,) auc 0.70 0.05 10
6 │ haberman StableRulesClassifier (max_depth = 2, max_rules = 30) auc 0.70 0.07 10
7 │ haberman StableRulesClassifier (max_depth = 2, max_rules = 10) auc 0.67 0.06 10
8 │ titanic DecisionTreeClassifier (;) auc 0.76 0.05 10
9 │ titanic LogisticClassifier (;) auc 0.84 0.02 10
10 │ titanic XGBoostClassifier (;) auc 0.86 0.03 10
11 │ titanic XGBoostClassifier (max_depth = 2,) auc 0.87 0.02 10
12 │ titanic StableForestClassifier (max_depth = 2,) auc 0.85 0.02 10
13 │ titanic StableRulesClassifier (max_depth = 2, max_rules = 30) auc 0.83 0.02 10
14 │ titanic StableRulesClassifier (max_depth = 2, max_rules = 10) auc 0.83 0.02 10
15 │ cancer DecisionTreeClassifier (;) auc 0.92 0.03 10
16 │ cancer MultinomialClassifier (;) auc 0.98 0.01 10
17 │ cancer XGBoostClassifier (;) auc 0.99 0.01 10
18 │ cancer XGBoostClassifier (max_depth = 2,) auc 0.99 0.01 10
19 │ cancer StableForestClassifier (max_depth = 2,) auc 0.99 0.01 10
20 │ cancer StableRulesClassifier (max_depth = 2, max_rules = 30) auc 0.98 0.01 10
21 │ cancer StableRulesClassifier (max_depth = 2, max_rules = 10) auc 0.98 0.01 10
22 │ diabetes DecisionTreeClassifier (;) auc 0.67 0.05 10
23 │ diabetes LogisticClassifier (;) auc 0.70 0.06 10
24 │ diabetes XGBoostClassifier (;) auc 0.80 0.03 10
25 │ diabetes XGBoostClassifier (max_depth = 2,) auc 0.83 0.03 10
26 │ diabetes StableForestClassifier (max_depth = 2,) auc 0.82 0.03 10
27 │ diabetes StableRulesClassifier (max_depth = 2, max_rules = 30) auc 0.78 0.04 10
28 │ diabetes StableRulesClassifier (max_depth = 2, max_rules = 10) auc 0.75 0.05 10
29 │ iris DecisionTreeClassifier (;) accuracy 0.95 0.03 10
30 │ iris MultinomialClassifier (;) accuracy 0.97 0.03 10
31 │ iris XGBoostClassifier (;) accuracy 0.95 0.04 10
32 │ iris XGBoostClassifier (max_depth = 2,) accuracy 0.94 0.04 10
33 │ iris StableForestClassifier (max_depth = 2,) accuracy 0.95 0.04 10
34 │ iris StableRulesClassifier (max_depth = 2, max_rules = 30) accuracy 0.75 0.15 10
35 │ iris StableRulesClassifier (max_depth = 2, max_rules = 10) accuracy 0.70 0.08 10
36 │ boston DecisionTreeRegressor (;) R² 0.74 0.11 10
37 │ boston LinearRegressor (;) R² 0.70 0.05 10
38 │ boston XGBoostRegressor (;) R² 0.88 0.06 10
39 │ boston XGBoostRegressor (max_depth = 2,) R² 0.87 0.04 10
40 │ boston StableForestRegressor (max_depth = 2,) R² 0.67 0.09 10
41 │ boston StableRulesRegressor (max_depth = 2, max_rules = 30) R² 0.57 0.08 10
42 │ boston StableRulesRegressor (max_depth = 2, max_rules = 10) R² 0.61 0.09 10
43 │ make_regression DecisionTreeRegressor (;) R² 0.90 0.02 10
44 │ make_regression LinearRegressor (;) R² 1.00 0.00 10
45 │ make_regression XGBoostRegressor (;) R² 0.98 0.01 10
46 │ make_regression XGBoostRegressor (max_depth = 2,) R² 0.98 0.00 10
47 │ make_regression StableForestRegressor (max_depth = 2,) R² 0.68 0.05 10
48 │ make_regression StableRulesRegressor (max_depth = 2, max_rules = 30) R² 0.46 0.05 10
49 │ make_regression StableRulesRegressor (max_depth = 2, max_rules = 10) R² 0.53 0.05 10
Diff:
5c5
< 1 │ haberman DecisionTreeClassifier (;) auc 0.56 0.06 10
---
> 1 │ haberman DecisionTreeClassifier (;) auc 0.55 0.05 10
38,39c38,39
< 34 │ iris StableRulesClassifier (max_depth = 2, max_rules = 30) accuracy 0.87 0.12 10
< 35 │ iris StableRulesClassifier (max_depth = 2, max_rules = 10) accuracy 0.75 0.07 10
---
> 34 │ iris StableRulesClassifier (max_depth = 2, max_rules = 30) accuracy 0.75 0.15 10
> 35 │ iris StableRulesClassifier (max_depth = 2, max_rules = 10) accuracy 0.70 0.08 10
The same problem occurs on Julia 1.6, so it looks like none of the Julia versions give reproducible results.
I removed SIMD since it might cause different results on different systems. That did not fix it, though:
5c5
< 1 │ haberman DecisionTreeClassifier (;) auc 0.53 0.06 10
---
> 1 │ haberman DecisionTreeClassifier (;) auc 0.54 0.06 10
38,39c38,39
< 34 │ iris StableRulesClassifier (max_depth = 2, max_rules = 30) accuracy 0.80 0.10 10
< 35 │ iris StableRulesClassifier (max_depth = 2, max_rules = 10) accuracy 0.77 0.07 10
---
> 34 │ iris StableRulesClassifier (max_depth = 2, max_rules = 30) accuracy 0.76 0.13 10
> 35 │ iris StableRulesClassifier (max_depth = 2, max_rules = 10) accuracy 0.67 0.07 10
46c46
< 42 │ boston StableRulesRegressor (max_depth = 2, max_rules = 10) R² 0.62 0.09 10
---
> 42 │ boston StableRulesRegressor (max_depth = 2, max_rules = 10) R² 0.61 0.09 10
It could even be that SIMD makes things more deterministic because SIMD behavior is at least guaranteed across systems.
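As background on why SIMD was a suspect at all: floating-point addition is not associative, so a reduction that groups the same additions differently can return a slightly different value. The sketch below shows this in Python, used only for illustration; the regrouping stands in for the reordering that a vectorized reduction is allowed to perform. It does not show that this is what happens in SIRUS.jl.

```python
import math

values = [0.1, 0.2, 0.3]

# Left-to-right order, as a plain scalar loop would add the values.
sequential = (values[0] + values[1]) + values[2]

# A different grouping of the same values, standing in for the reordering
# that a vectorized (SIMD) reduction may apply.
regrouped = values[0] + (values[1] + values[2])

# The two sums are not bit-identical, although they agree to within rounding.
print(sequential == regrouped)
print(math.isclose(sequential, regrouped))
```

A difference in the last bit is usually harmless, but it can matter when the value is later compared against a threshold, because the comparison can then go either way.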
I'm just gonna put the obvious performance optimizations back and merge this afterwards.
In an attempt to solve https://github.com/rikhuijzer/SIRUS.jl/issues/48, I've set more RNGs in https://github.com/rikhuijzer/SIRUS.jl/commit/81533786b01c7f6b0cb562c3dda32b9e2156f767. However, after doing two CI runs, the differences in outcomes are as follows:
Even worse, these results are again different when running locally. Locally, doing multiple runs always produces the same result, so it looks like different systems can give different results.
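To illustrate what setting an RNG explicitly buys, here is a minimal sketch in Python (for illustration only, not the SIRUS.jl implementation). Each task gets its own generator derived from a base seed and the task index, so the output does not depend on how many workers run the tasks or in which order they finish. `BASE_SEED`, `sample_for_task`, and `run` are hypothetical names, and the seed-mixing formula is an arbitrary choice for the example.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List

BASE_SEED = 1  # hypothetical seed, chosen only for this example


def sample_for_task(task_id: int) -> float:
    # Each task builds its own generator from (BASE_SEED, task_id), so the
    # value does not depend on which thread runs the task or when.
    rng = random.Random(BASE_SEED * 1_000_003 + task_id)
    return rng.random()


def run(n_tasks: int, n_workers: int) -> List[float]:
    # pool.map returns results in task order, independent of completion order.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(sample_for_task, range(n_tasks)))


a = run(8, n_workers=1)
b = run(8, n_workers=4)
print(a == b)
```

This only removes the dependence on scheduling within one environment. It would not by itself explain or fix results that are reproducible locally but differ between systems, which is the behavior described above.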