Open mmatena opened 3 years ago
Done on RoBERTa-large, trained with a batch size of 8 for 200k examples. Scores are for the best checkpoint. Single run.
λ L2 | cola | mnli | mrpc | qnli | qqp | rte | sst2 | stsb |
---|---|---|---|---|---|---|---|---|
0.0 | 65.7 | 87.5 | 90.7 | 91.9 | 86.9 | 83.8 | 95.3 | 90.8 |
0.0003 | 65.6 | 87.5 | 90.1 | 92.2 | 87.3 | 83.4 | 96.0 | 90.3 |
0.01 | 0.0 | 87.4 | 91.4 | 91.9 | 75.6 | 86.6 | 95.5 | 90.6 |
λ L2 | cola | mnli | mrpc | qnli | qqp | rte | sst2 | stsb | Average |
---|---|---|---|---|---|---|---|---|---|
0.0 | 100.1 | 100.6 | 101.0 | 100.8 | 100.4 | 102.6 | 101.1 | 100.0 | 100.8 |
0.0003 | 99.6 | 100.9 | 100.7 | 100.5 | 100.2 | 102.6 | 100.4 | 99.9 | 100.6 |
0.01 | N/A | 100.5 | 100.0 | 100.6 | 100.3 | 102.1 | 100.8 | 100.0 | 100.6 |
Preliminary experiment to get an idea for the range of regularization strengths needed for EWC.
split ⇩ \ EWC strength ⇨ | 0.0 | 1e-05 | 0.0001 | 0.001 | 0.01 | 0.1 | 1.0 | 10.0 | 100.0 |
---|---|---|---|---|---|---|---|---|---|
train | 96.6 | 96.3 | 96.5 | 96.6 | 96.5 | 96.3 | 96.4 | 95.3 | 92.2 |
dev | 92.9 | 92.8 | 92.8 | 92.9 | 92.4 | 92.9 | 92.5 | 92.1 | 91.7 |
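
For orientation, the sketch below contrasts the two kinds of penalty whose strengths are swept in this thread. It assumes the isotropic penalty is a plain squared distance to some reference weights and that the EWC penalty weights each squared difference by a diagonal Fisher estimate. The thread does not state the exact formulas, the reference weights, or any constant factor, so `isotropic_penalty`, `ewc_penalty`, and all values here are hypothetical.

```python
def isotropic_penalty(params, reference, strength):
    """Assumed isotropic L2 penalty: strength * sum_i (p_i - r_i)^2."""
    return strength * sum((p - r) ** 2 for p, r in zip(params, reference))


def ewc_penalty(params, reference, fisher_diag, strength):
    """Assumed EWC penalty: strength * sum_i F_i * (p_i - r_i)^2."""
    return strength * sum(
        f * (p - r) ** 2 for p, r, f in zip(params, reference, fisher_diag)
    )


# Toy values for illustration only; none are taken from the experiments.
params = [1.0, 2.0, -1.0]
reference = [0.0, 0.0, 0.0]
fisher_diag = [1e-3, 1e-2, 1e-1]

iso = isotropic_penalty(params, reference, strength=0.01)
ewc = ewc_penalty(params, reference, fisher_diag, strength=10.0)
# With an all-ones Fisher diagonal the EWC form equals the isotropic one.
ewc_uniform = ewc_penalty(params, reference, [1.0] * 3, strength=0.01)

print(f"isotropic={iso:.4f} ewc={ewc:.4f} ewc_uniform={ewc_uniform:.4f}")
```

Under these assumed forms the two strength scales are only directly comparable when the Fisher entries are close to one, so a strength value from one sweep should not be read against the other.
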
reg. strength | cola | mnli | mrpc | qnli | qqp | rte | sst2 | stsb | Average |
---|---|---|---|---|---|---|---|---|---|
None | 57.9 | 83.8 | 83.5 | 90.6 | 89.7 | 66.1 | 92.7 | 86.0 | 81.3 |
0.0003 | 59.4 | 83.8 | 84.9 | 90.6 | 90.4 | 63.5 | 92.3 | 83.9 | 81.1 |
0.01 | 59.5 | 81.6 | 83.0 | 90.0 | 85.7 | 60.6 | 92.0 | 85.3 | 79.7 |
0.1 | 53.6 | 75.4 | 83.3 | 86.0 | 81.3 | 61.0 | 91.7 | 82.8 | 76.9 |
reg. strength | cola | mnli | mrpc | qnli | qqp | rte | sst2 | stsb | Average |
---|---|---|---|---|---|---|---|---|---|
None | 57.9 | 83.8 | 83.5 | 90.6 | 89.7 | 66.1 | 92.7 | 86.0 | 81.3 |
0.03 | 58.4 | 83.9 | 82.2 | 90.6 | 90.2 | 63.5 | 92.9 | 85.4 | 80.9 |
1.0 | 57.4 | 83.8 | 83.2 | 90.6 | 89.6 | 61.7 | 92.3 | 83.8 | 80.3 |
10.0 | 58.5 | 82.4 | 86.4 | 90.3 | 87.2 | 63.2 | 92.3 | 84.0 | 80.5 |
100.0 | 57.6 | 76.9 | 83.3 | 87.5 | 83.4 | 61.0 | 91.5 | 83.2 | 78.1 |
iso | cola | mnli | mrpc | qnli | qqp | rte | sst2 | stsb | Average |
---|---|---|---|---|---|---|---|---|---|
0.0 | 59.6 | 84.2 | 83.3 | 90.9 | 89.5 | 66.4 | 93.0 | 85.6 | 81.6 |
0.0003 | 59.3 | 84.9 | N/A | 89.5 | 89.1 | 70.8 | 92.1 | 81.3 | N/A |
0.01 | 59.9 | 81.4 | 83.7 | 89.5 | 85.5 | 69.7 | 92.2 | 84.7 | 80.8 |
0.1 | 53.4 | 75.1 | 83.2 | 85.8 | 81.0 | 63.9 | 91.3 | 81.5 | 76.9 |
iso | cola | mnli | mrpc | qnli | qqp | rte | sst2 | stsb |
---|---|---|---|---|---|---|---|---|
0.0 | 1.7 | 0.4 | -0.2 | 0.3 | -0.2 | 0.4 | 0.3 | -0.4 |
0.0003 | -0.1 | 0.0 | N/A | -1.0 | -1.2 | 7.2 | -0.2 | -2.6 |
0.01 | 0.4 | -0.2 | 0.6 | -0.5 | -0.2 | 9.0 | 0.2 | -0.5 |
0.1 | -0.3 | 0.0 | -0.1 | -0.2 | -0.3 | 2.9 | -0.5 | -1.2 |
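
The entries in this difference table look consistent with subtracting, task by task, the earlier row with the matching strength from the corresponding `iso` score row. That reading is inferred from the numbers and is not stated in the thread. A minimal Python check for the 0.0 row, using the `None` row and the `iso` 0.0 row as they appear above:

```python
TASKS = ["cola", "mnli", "mrpc", "qnli", "qqp", "rte", "sst2", "stsb"]

# Rows copied from the tables in this thread.
baseline_none = [57.9, 83.8, 83.5, 90.6, 89.7, 66.1, 92.7, 86.0]
iso_zero = [59.6, 84.2, 83.3, 90.9, 89.5, 66.4, 93.0, 85.6]
reported_delta = [1.7, 0.4, -0.2, 0.3, -0.2, 0.4, 0.3, -0.4]

# Assumption: delta = score - baseline, rounded to one decimal.
recomputed = [round(s - b, 1) for s, b in zip(iso_zero, baseline_none)]
max_gap = max(abs(r - d) for r, d in zip(recomputed, reported_delta))

for task, r, d in zip(TASKS, recomputed, reported_delta):
    print(f"{task}: recomputed {r:+.1f}, reported {d:+.1f}")
print(f"largest gap: {max_gap:.1f}")
```

The only disagreement is 0.1 on `rte`, which is what one would expect if the reported differences were computed from unrounded scores.
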
ewc | cola | mnli | mrpc | qnli | qqp | rte | sst2 | stsb | Average |
---|---|---|---|---|---|---|---|---|---|
0.0 | 59.6 | 84.2 | 83.3 | 90.9 | 89.5 | 66.4 | 93.0 | 85.6 | 81.6 |
0.03 | 58.3 | 83.8 | 83.9 | 90.9 | 90.3 | 66.1 | 92.2 | 85.6 | 81.4 |
1.0 | 58.0 | 84.0 | 83.8 | 90.4 | 89.5 | 68.6 | 92.9 | 83.5 | 81.3 |
10.0 | 57.9 | 82.7 | 86.8 | 90.1 | 86.9 | 72.2 | 92.3 | 84.2 | 81.6 |
100.0 | 57.4 | 76.8 | 83.7 | 87.0 | 82.9 | 66.4 | 91.3 | 84.3 | 78.7 |
ewc | cola | mnli | mrpc | qnli | qqp | rte | sst2 | stsb |
---|---|---|---|---|---|---|---|---|
0.0 | 1.7 | 0.4 | -0.2 | 0.3 | -0.2 | 0.4 | 0.3 | -0.4 |
0.03 | -0.1 | -0.1 | 1.7 | 0.3 | 0.0 | 2.5 | -0.7 | 0.2 |
1.0 | 0.6 | 0.2 | 0.6 | -0.2 | -0.1 | 6.9 | 0.6 | -0.3 |
10.0 | -0.6 | 0.3 | 0.4 | -0.1 | -0.3 | 9.0 | 0.0 | 0.1 |
100.0 | -0.1 | -0.1 | 0.4 | -0.6 | -0.5 | 5.4 | -0.2 | 1.0 |
Columns are MNLI checkpoint indices. Each index corresponds to training on half an epoch of MNLI.
iso | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | |
---|---|---|---|---|---|---|---|---|---|
0.0 | 66.4 | 66.8 | 66.1 | 66.4 | 66.4 | 66.1 | 66.4 | 66.1 | 66.4 |
0.0003 | 71.1 | 72.2 | 70.8 | 72.6 | 74.0 | 71.8 | 70.8 | 70.8 | 70.8 |
0.01 | 66.8 | 68.6 | 70.0 | 67.9 | 67.9 | 67.9 | 69.7 | 69.7 | 70.4 |
0.1 | 60.6 | 59.9 | 60.3 | 61.4 | 61.4 | 62.1 | 63.9 | 63.9 | 63.9 |
iso | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | |
---|---|---|---|---|---|---|---|---|---|
0.0 | 0.4 | 0.7 | 0.0 | 0.4 | 0.4 | 0.0 | 0.4 | 0.0 | 0.4 |
0.0003 | 7.6 | 8.7 | 7.2 | 9.0 | 10.5 | 8.3 | 7.2 | 7.2 | 7.2 |
0.01 | 6.1 | 7.9 | 9.4 | 7.2 | 7.2 | 7.2 | 9.0 | 9.0 | 9.7 |
0.1 | -0.4 | -1.1 | -0.7 | 0.4 | 0.4 | 1.1 | 2.9 | 2.9 | 2.9 |
ewc | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | |
---|---|---|---|---|---|---|---|---|---|
0.0 | 66.4 | 66.8 | 66.1 | 66.4 | 66.4 | 66.1 | 66.4 | 66.1 | 66.4 |
0.03 | 63.9 | 64.3 | 66.1 | 66.1 | 64.6 | 64.6 | 62.5 | 63.2 | 62.1 |
1.0 | 68.6 | 67.5 | 69.0 | 66.8 | 68.6 | 68.6 | 67.5 | 70.0 | 68.2 |
10.0 | 68.2 | 70.0 | 71.1 | 70.4 | 72.2 | 72.2 | 71.1 | 71.1 | 72.9 |
100.0 | 65.0 | 65.7 | 66.4 | 66.4 | 65.0 | 64.6 | 65.3 | 64.3 | 63.2 |
ewc | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | |
---|---|---|---|---|---|---|---|---|---|
0.0 | 0.4 | 0.7 | 0.0 | 0.4 | 0.4 | 0.0 | 0.4 | 0.0 | 0.4 |
0.03 | 0.4 | 0.7 | 2.5 | 2.5 | 1.1 | 1.1 | -1.1 | -0.4 | -1.4 |
1.0 | 6.9 | 5.8 | 7.2 | 5.1 | 6.9 | 6.9 | 5.8 | 8.3 | 6.5 |
10.0 | 5.1 | 6.9 | 7.9 | 7.2 | 9.0 | 9.0 | 7.9 | 7.9 | 9.7 |
100.0 | 4.0 | 4.7 | 5.4 | 5.4 | 4.0 | 3.6 | 4.3 | 3.2 | 2.2 |
Columns are MNLI checkpoint indices. Each index corresponds to training on half an epoch of MNLI.
iso | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|---|
0.0 | 83.6 | 83.9 | 83.4 | 83.2 | 83.6 | 83.6 | 83.4 | 83.2 |
0.0003 | 85.3 | 85.5 | 85.1 | 85.1 | 84.9 | 84.7 | 84.5 | 84.7 |
0.01 | 83.1 | 83.1 | 82.9 | 83.1 | 83.1 | 83.3 | 82.7 | 83.1 |
0.1 | 83.5 | 82.9 | 82.9 | 83.8 | 83.2 | 83.3 | 83.2 | 84.0 |
iso | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|---|
0.0 | 0.1 | 0.4 | -0.1 | -0.3 | 0.1 | 0.0 | -0.1 | -0.4 |
0.0003 | 0.4 | 0.6 | 0.2 | 0.2 | 0.0 | -0.2 | -0.3 | -0.2 |
0.01 | 0.1 | 0.0 | -0.1 | 0.0 | 0.0 | 0.2 | -0.3 | 0.0 |
0.1 | 0.2 | -0.4 | -0.4 | 0.5 | -0.1 | 0.0 | -0.1 | 0.7 |
ewc | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|---|
0.0 | 83.6 | 83.9 | 83.4 | 83.2 | 83.6 | 83.6 | 83.4 | 83.2 |
0.03 | 83.8 | 83.7 | 83.3 | 83.2 | 83.3 | 83.4 | 83.2 | 83.3 |
1.0 | 84.0 | 84.3 | 83.5 | 83.7 | 83.8 | 83.7 | 83.2 | 83.7 |
10.0 | 87.3 | 87.3 | 86.7 | 87.2 | 86.3 | 87.3 | 87.1 | 87.2 |
100.0 | 82.0 | 82.4 | 82.3 | 83.1 | 83.1 | 83.1 | 83.1 | 83.1 |
ewc | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|---|
0.0 | 0.1 | 0.4 | -0.1 | -0.3 | 0.1 | 0.0 | -0.1 | -0.4 |
0.03 | 1.6 | 1.4 | 1.1 | 1.0 | 1.0 | 1.2 | 1.0 | 1.0 |
1.0 | 0.8 | 1.2 | 0.3 | 0.5 | 0.6 | 0.5 | 0.0 | 0.6 |
10.0 | 0.9 | 0.9 | 0.3 | 0.9 | -0.1 | 0.9 | 0.7 | 0.9 |
100.0 | -1.3 | -0.9 | -1.0 | -0.2 | -0.2 | -0.2 | -0.2 | -0.2 |
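
The checkpoint tables above index checkpoints by half epochs, so the same index means a different number of training examples depending on the dataset size. A small sketch of that conversion, assuming index 0 is the checkpoint saved after the first half epoch (the thread does not say whether indexing starts before or after any training). `examples_seen` and both dataset sizes are hypothetical.

```python
def examples_seen(checkpoint_index, train_size):
    """Examples processed by a checkpoint, assuming index 0 = first half epoch."""
    half_epoch = train_size // 2
    return (checkpoint_index + 1) * half_epoch


# Hypothetical training-set sizes, chosen only to show the scale difference.
small_task = 2_000
large_task = 400_000

for index in (0, 3, 7):
    print(
        f"checkpoint {index}: "
        f"{examples_seen(index, small_task):>9,} vs "
        f"{examples_seen(index, large_task):>9,} examples"
    )
```
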
Both runs have isotropic L2 regularization with a strength of 0.0003. Each checkpoint index corresponds to another half epoch of training, which consists of a different number of examples for each task.
v target \ donor > | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|---|
0 | 69.7 | 70.4 | 73.3 | 69.3 | 73.6 | 68.2 | 71.5 | 70.0 |
1 | 71.1 | 72.6 | 74.7 | 71.5 | 74.4 | 73.3 | 72.2 | 70.0 |
2 | 71.5 | 71.1 | 74.4 | 71.8 | 73.6 | 72.2 | 72.2 | 70.8 |
3 | 69.0 | 69.7 | 70.4 | 72.9 | 73.3 | 71.8 | 69.7 | 69.0 |
4 | 71.1 | 72.2 | 70.8 | 72.6 | 74.0 | 71.8 | 70.8 | 70.8 |
5 | 71.1 | 71.1 | 71.5 | 72.9 | 74.4 | 72.9 | 71.1 | 70.4 |
6 | 70.0 | 70.8 | 70.8 | 72.6 | 74.0 | 72.2 | 71.1 | 70.0 |
7 | 71.1 | 71.8 | 72.2 | 72.9 | 74.4 | 73.6 | 71.8 | 71.1 |
v target \ donor > | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|---|
0 | 11.9 | 12.6 | 15.5 | 11.6 | 15.9 | 10.5 | 13.7 | 12.3 |
1 | 8.7 | 10.1 | 12.3 | 9.0 | 11.9 | 10.8 | 9.7 | 7.6 |
2 | 11.6 | 11.2 | 14.4 | 11.9 | 13.7 | 12.3 | 12.3 | 10.8 |
3 | 9.7 | 10.5 | 11.2 | 13.7 | 14.1 | 12.6 | 10.5 | 9.7 |
4 | 7.6 | 8.7 | 7.2 | 9.0 | 10.5 | 8.3 | 7.2 | 7.2 |
5 | 11.2 | 11.2 | 11.6 | 13.0 | 14.4 | 13.0 | 11.2 | 10.5 |
6 | 11.6 | 12.3 | 12.3 | 14.1 | 15.5 | 13.7 | 12.6 | 11.6 |
7 | 8.3 | 9.0 | 9.4 | 10.1 | 11.6 | 10.8 | 9.0 | 8.3 |
v target \ donor > | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|---|
0 | 6.1 | 6.9 | 9.7 | 5.8 | 10.1 | 4.7 | 7.9 | 6.5 |
1 | 7.6 | 9.0 | 11.2 | 7.9 | 10.8 | 9.7 | 8.7 | 6.5 |
2 | 7.9 | 7.6 | 10.8 | 8.3 | 10.1 | 8.7 | 8.7 | 7.2 |
3 | 5.4 | 6.1 | 6.9 | 9.4 | 9.7 | 8.3 | 6.1 | 5.4 |
4 | 7.6 | 8.7 | 7.2 | 9.0 | 10.5 | 8.3 | 7.2 | 7.2 |
5 | 7.6 | 7.6 | 7.9 | 9.4 | 10.8 | 9.4 | 7.6 | 6.9 |
6 | 6.5 | 7.2 | 7.2 | 9.0 | 10.5 | 8.7 | 7.6 | 6.5 |
7 | 7.6 | 8.3 | 8.7 | 9.4 | 10.8 | 10.1 | 8.3 | 7.6 |
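
One way to read the first of the three target/donor grids above is to look for the best single cell and for the donor checkpoint that does best on average across targets. The sketch below does that with plain Python; the variable names are illustrative, and since each cell is a single score, small differences should be treated with caution.

```python
# First target/donor grid from this thread: rows are target checkpoint
# indices, columns are donor checkpoint indices.
grid = [
    [69.7, 70.4, 73.3, 69.3, 73.6, 68.2, 71.5, 70.0],
    [71.1, 72.6, 74.7, 71.5, 74.4, 73.3, 72.2, 70.0],
    [71.5, 71.1, 74.4, 71.8, 73.6, 72.2, 72.2, 70.8],
    [69.0, 69.7, 70.4, 72.9, 73.3, 71.8, 69.7, 69.0],
    [71.1, 72.2, 70.8, 72.6, 74.0, 71.8, 70.8, 70.8],
    [71.1, 71.1, 71.5, 72.9, 74.4, 72.9, 71.1, 70.4],
    [70.0, 70.8, 70.8, 72.6, 74.0, 72.2, 71.1, 70.0],
    [71.1, 71.8, 72.2, 72.9, 74.4, 73.6, 71.8, 71.1],
]

# Best single (target, donor) cell.
best_score, best_target, best_donor = max(
    (score, t, d) for t, row in enumerate(grid) for d, score in enumerate(row)
)

# Mean score per donor checkpoint, averaged over all target checkpoints.
donor_means = [sum(row[d] for row in grid) / len(grid) for d in range(len(grid[0]))]
best_mean_donor = max(range(len(donor_means)), key=donor_means.__getitem__)

print(f"best cell: {best_score} (target {best_target}, donor {best_donor})")
print(f"best donor on average: {best_mean_donor} ({donor_means[best_mean_donor]:.2f})")
```

On this grid the best single cell is 74.7 (target 1, donor 2), while donor checkpoint 4 has the highest column mean.
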
iso | mrpc | rte |
---|---|---|
0.0 | 83.1 | 64.6 |
0.0003 | 84.7 | 70.8 |
0.01 | 82.9 | 70.8 |
0.1 | 82.9 | 68.6 |
iso | mrpc | rte |
---|---|---|
0.0 | -0.4 | -1.4 |
0.0003 | -0.2 | 7.2 |
0.01 | -0.1 | 10.1 |
0.1 | -0.4 | 7.6 |
ewc | mrpc | rte |
---|---|---|
0.0 | 83.1 | 64.6 |
0.03 | 83.3 | 61.7 |
1.0 | 82.7 | 67.1 |
10.0 | 85.1 | 72.6 |
100.0 | 82.2 | 66.4 |
ewc | mrpc | rte |
---|---|---|
0.0 | -0.4 | -1.4 |
0.03 | 1.0 | -1.8 |
1.0 | -0.4 | 5.4 |
10.0 | -1.3 | 9.4 |
100.0 | -1.1 | 5.4 |
The merged model for this pair of checkpoints had a score of 70.8 on RTE.
Examples | 4096 | 4096 | 4096 | 4096 | 4096 | 4096 |
---|---|---|---|---|---|---|
Beta | 1e-08 | 1e-08 | 1e-07 | 1e-07 | 1e-06 | 1e-06 |
Fisher epoch ⇩\ LR ⇨ | 0.001 | 0.01 | 0.001 | 0.01 | 0.001 | 0.01 |
0 | 72.2 | 71.8 | 72.2 | 72.9 | 72.2 | 71.5 |
1 | 72.2 | 72.2 | 72.2 | 73.3 | 72.2 | 72.6 |
2 | 72.9 | 72.2 | 71.5 | 72.2 | 72.2 | 72.6 |
3 | 72.9 | 72.2 | 71.8 | 71.5 | 72.2 | 72.6 |
4 | 72.9 | 72.2 | 72.2 | 72.6 | 72.2 | 72.2 |
5 | 72.9 | 72.2 | 71.8 | 71.8 | 71.5 | 71.8 |
6 | 72.9 | 72.6 | 72.2 | 71.5 | 71.8 | 71.8 |
7 | 72.9 | 72.2 | 72.2 | 71.5 | 72.2 | 71.5 |
8 | 72.9 | 72.6 | 72.2 | 72.2 | 72.2 | 71.1 |
9 | 72.9 | 72.6 | 72.6 | 72.2 | 72.2 | 71.1 |
10 | 72.9 | 72.6 | 72.9 | 72.2 | 72.2 | 70.8 |
11 | 72.9 | 72.2 | 72.6 | 71.8 | 72.2 | 70.8 |
12 | 72.9 | 72.9 | 72.6 | 71.8 | 71.8 | 71.1 |
13 | 72.6 | 72.6 | 72.2 | 72.2 | 71.8 | 71.5 |
14 | 72.2 | 72.2 | 71.8 | 72.2 | 71.8 | 71.8 |
15 | 72.2 | 72.6 | 72.2 | 71.8 | 71.8 | 71.8 |
Examples | 4096 | 4096 | 4096 | 4096 | 4096 | 4096 |
---|---|---|---|---|---|---|
Beta | 1e-08 | 1e-08 | 1e-07 | 1e-07 | 1e-06 | 1e-06 |
Fisher epoch ⇩\ LR ⇨ | 0.001 | 0.01 | 0.001 | 0.01 | 0.001 | 0.01 |
0 | 1.4 | 1.0 | 1.4 | 2.1 | 1.4 | 0.7 |
1 | 1.4 | 1.4 | 1.4 | 2.5 | 1.4 | 1.8 |
2 | 2.1 | 1.4 | 0.7 | 1.4 | 1.4 | 1.8 |
3 | 2.1 | 1.4 | 1.0 | 0.7 | 1.4 | 1.8 |
4 | 2.1 | 1.4 | 1.4 | 1.8 | 1.4 | 1.4 |
5 | 2.1 | 1.4 | 1.0 | 1.0 | 0.7 | 1.0 |
6 | 2.1 | 1.8 | 1.4 | 0.7 | 1.0 | 1.0 |
7 | 2.1 | 1.4 | 1.4 | 0.7 | 1.4 | 0.7 |
8 | 2.1 | 1.8 | 1.4 | 1.4 | 1.4 | 0.3 |
9 | 2.1 | 1.8 | 1.8 | 1.4 | 1.4 | 0.3 |
10 | 2.1 | 1.8 | 2.1 | 1.4 | 1.4 | -0.0 |
11 | 2.1 | 1.4 | 1.8 | 1.0 | 1.4 | -0.0 |
12 | 2.1 | 2.1 | 1.8 | 1.0 | 1.0 | 0.3 |
13 | 1.8 | 1.8 | 1.4 | 1.4 | 1.0 | 0.7 |
14 | 1.4 | 1.4 | 1.0 | 1.4 | 1.0 | 1.0 |
15 | 1.4 | 1.8 | 1.4 | 1.0 | 1.0 | 1.0 |
Examples | 4096 | 4096 | 4096 | 4096 | 4096 | 4096 | 32768 | 32768 | 32768 | 32768 | 32768 | 32768 | 262144 | 262144 | 262144 | 262144 | 262144 | 262144 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Beta | 1e-08 | 1e-08 | 1e-07 | 1e-07 | 1e-06 | 1e-06 | 1e-08 | 1e-08 | 1e-07 | 1e-07 | 1e-06 | 1e-06 | 1e-08 | 1e-08 | 1e-07 | 1e-07 | 1e-06 | 1e-06 |
LR | 0.001 | 0.01 | 0.001 | 0.01 | 0.001 | 0.01 | 0.001 | 0.01 | 0.001 | 0.01 | 0.001 | 0.01 | 0.001 | 0.01 | 0.001 | 0.01 | 0.001 | 0.01 |
0 | 72.2 | 71.8 | 72.2 | 72.9 | 72.2 | 71.5 | 72.9 | 72.2 | 52.7 | 72.2 | 71.5 | 72.2 | 72.2 | 73.3 | 72.2 | 71.8 | ||
1 | 72.2 | 72.2 | 72.2 | 73.3 | 72.2 | 72.6 | 72.6 | 72.2 | 52.7 | 72.2 | 72.6 | 72.2 | 72.6 | 72.6 | 72.2 | 72.6 | ||
2 | 72.9 | 72.2 | 71.5 | 72.2 | 72.2 | 72.6 | 73.3 | 71.5 | 52.7 | 72.2 | 72.6 | 72.6 | 72.6 | 72.2 | 72.2 | 72.6 | ||
3 | 72.9 | 72.2 | 71.8 | 71.5 | 72.2 | 72.6 | 72.9 | 71.8 | 52.7 | 71.8 | 72.6 | 72.9 | 72.9 | 72.6 | 71.8 | 72.6 | ||
4 | 72.9 | 72.2 | 72.2 | 72.6 | 72.2 | 72.2 | 73.3 | 72.2 | 52.7 | 72.2 | 71.8 | 72.9 | 72.9 | 72.2 | 71.8 | 72.2 | ||
5 | 72.9 | 72.2 | 71.8 | 71.8 | 71.5 | 71.8 | 73.3 | 71.8 | 52.7 | 71.5 | 71.8 | 72.6 | 72.2 | 72.6 | 71.5 | 71.8 | ||
6 | 72.9 | 72.6 | 72.2 | 71.5 | 71.8 | 71.8 | 72.6 | 72.2 | 52.7 | 71.8 | 71.8 | 72.6 | 72.6 | 71.8 | 71.8 | 71.8 | ||
7 | 72.9 | 72.2 | 72.2 | 71.5 | 72.2 | 71.5 | 72.2 | 72.2 | 52.7 | 72.2 | 71.5 | 72.6 | 72.6 | 71.8 | 72.2 | 71.5 | ||
8 | 72.9 | 72.6 | 72.2 | 72.2 | 72.2 | 71.1 | 71.8 | 72.6 | 52.7 | 72.2 | 71.5 | 72.6 | 72.6 | 72.2 | 72.2 | 71.5 | ||
9 | 72.9 | 72.6 | 72.6 | 72.2 | 72.2 | 71.1 | 71.8 | 72.6 | 52.7 | 72.2 | 71.1 | 72.6 | 72.9 | 72.6 | 72.2 | 71.1 | ||
10 | 72.9 | 72.6 | 72.9 | 72.2 | 72.2 | 70.8 | 72.2 | 72.6 | 52.7 | 72.6 | 71.1 | 72.9 | 73.3 | 72.6 | 72.2 | 70.8 | ||
11 | 72.9 | 72.2 | 72.6 | 71.8 | 72.2 | 70.8 | 72.6 | 72.6 | 52.7 | 72.6 | 71.1 | 72.6 | 72.9 | 71.8 | 72.2 | 70.8 | ||
12 | 72.9 | 72.9 | 72.6 | 71.8 | 71.8 | 71.1 | 72.6 | 72.6 | 52.7 | 71.8 | 71.5 | 72.6 | 73.6 | 71.8 | 72.2 | 71.1 | ||
13 | 72.6 | 72.6 | 72.2 | 72.2 | 71.8 | 71.5 | 72.6 | 72.2 | 52.7 | 72.2 | 71.5 | 72.6 | 73.6 | 71.8 | 72.6 | 71.5 | ||
14 | 72.2 | 72.2 | 71.8 | 72.2 | 71.8 | 71.8 | 72.2 | 71.8 | 52.7 | 72.2 | 71.8 | 72.6 | 73.6 | 72.2 | 72.2 | 71.8 | ||
15 | 72.2 | 72.6 | 72.2 | 71.8 | 71.8 | 71.8 | 71.8 | 72.2 | 52.7 | 71.8 | 71.8 | 72.2 | 73.3 | 72.2 | 72.2 | 71.8 |
Examples | 8192 | 8192 | 8192 | 8192 | 8192 | 8192 |
---|---|---|---|---|---|---|
Beta | 1e-10 | 1e-10 | 1e-10 | 1e-09 | 1e-09 | 1e-09 |
LR | 1e-05 | 0.0001 | 0.001 | 1e-05 | 0.0001 | 0.001 |
0 | 71.1 | 71.1 | 72.2 | 71.1 | 71.5 | 72.2 |
1 | 71.1 | 72.2 | 72.6 | 71.1 | 72.2 | 72.6 |
Examples | 8192 | 8192 | 8192 | 8192 | 8192 | 8192 |
---|---|---|---|---|---|---|
Beta | 1e-10 | 1e-10 | 1e-10 | 1e-09 | 1e-09 | 1e-09 |
LR | 1e-05 | 0.0001 | 0.001 | 1e-05 | 0.0001 | 0.001 |
0 | 0.3 | 0.3 | 1.4 | 0.3 | 0.7 | 1.4 |
1 | 0.3 | 1.4 | 1.8 | 0.3 | 1.4 | 1.8 |
The idea is that I'll create a comment with a description and results from each experiment. I can move experiments into their own issue if so desired and replace their comment here with a link to the new issue.