This commit implements idea 1.
While working on this, we learned one potential reason why fixing the threshold at 0.5 might not be optimal.
Given the cross-entropy loss we use, the cost gets really bad when a prediction is really far from the correct answer, and it shrinks as the prediction gets closer. Thus, gradient descent will always be focused on pulling the really bad guesses closer, not on making the okay guesses closer.
How low the costs get doesn't really matter when it comes time to discretize the answer, since a low cost doesn't guarantee that the prediction exceeds the threshold.
Let's say our model predicts a bunch of values close to 0.5. If there are some egregious predictions that are WAY OFF, gradient descent will focus on those egregious examples and their losses will decrease dramatically, while the close-to-0.5 examples get ignored. At the end of the day, neither group ends up on the correct side of the threshold.
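To make that concrete, here's a toy illustration (an assumed setup, not code from this repo) of how binary cross-entropy weights a borderline prediction against an egregiously wrong one:

```python
import torch
import torch.nn.functional as F

# Two predictions whose true label is 1: one just below the 0.5 threshold,
# one egregiously wrong.
labels = torch.tensor([1.0, 1.0])
predictions = torch.tensor([0.45, 0.01])

losses = F.binary_cross_entropy(predictions, labels, reduction='none')
print(losses)  # tensor([0.7985, 4.6052])
# The egregious example dominates the total loss, so gradient descent mostly
# works on it, even though the 0.45 prediction is the one sitting just below
# the threshold.
```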
We picked the threshold for each topic / class to be the midpoint between the arithmetic mean of our model's outputs on the intended-to-be-positive examples and the arithmetic mean of its outputs on the intended-to-be-negative examples.
This yielded approximately a 0.025 improvement in our F1 score.
We were able to get a better F1 score while having a bigger loss than before, which strongly hints that the issue we posed above really was hurting us.
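For reference, a minimal sketch of what that per-class midpoint threshold might look like (the function name and tensor shapes are my assumptions, not the repo's actual code):

```python
import torch

def midpoint_thresholds(predictions: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # predictions, labels: (num_examples, num_classes); labels are 0/1.
    positive_mask = labels.bool()
    # Mean model output over the intended-to-be-positive examples, per class.
    positive_mean = (predictions * positive_mask).sum(dim=0) / positive_mask.sum(dim=0).clamp(min=1)
    # Mean model output over the intended-to-be-negative examples, per class.
    negative_mean = (predictions * ~positive_mask).sum(dim=0) / (~positive_mask).sum(dim=0).clamp(min=1)
    # Midpoint between the two per-class means.
    return (positive_mean + negative_mean) / 2
```

Discretizing then becomes `predictions > midpoint_thresholds(...)` instead of `torch.round(predictions)`.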
FUTURE WORK:
Our results are below.
Our F1 metric was not implemented correctly. We got the definitions of recall and precision switched (this leads to the same F1 behavior though; this isn't the egregious part), and we also summed along the wrong dimension (this is the egregious part). The F1 numbers we've been using to measure everything so far were off by two orders of magnitude. Our actual F1 scores are TERRIBLE.
Our F1 threshold optimization was implemented incorrectly too, since it summed along the wrong dimension as well.
Luckily, the threshold optimization still proved to improve performance! However, an F1 increase of about 0.00025 on an F1 score around 0.004 (which is about what we're getting) isn't a great improvement; it might fall within the noise. It still seems like a good idea, though.
We added a bunch of assert statements to make the dimension checking more explicit.
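Here's a minimal sketch of the corrected per-class F1 with explicit dimension asserts (names and shapes are assumed for illustration; this isn't the repo's exact code):

```python
import torch

def per_class_f1(predictions: torch.Tensor, labels: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    # predictions, labels: (batch_size, num_classes); labels are 0/1 floats.
    assert predictions.dim() == 2
    assert predictions.shape == labels.shape
    predicted_positive = (predictions > threshold).float()
    # Sum over dim=0 (the batch) so precision and recall are per class;
    # summing over dim=1 was the bug described above.
    true_positive = (predicted_positive * labels).sum(dim=0)
    precision = true_positive / predicted_positive.sum(dim=0).clamp(min=1)
    recall = true_positive / labels.sum(dim=0).clamp(min=1)
    f1 = 2 * precision * recall / (precision + recall).clamp(min=1e-8)
    assert f1.shape == (labels.shape[1],)
    return f1
```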
Here are the new results we got:
We just reinvestigated this with the convolutional network.
These were our results:
It performs ever so slightly worse.
BCE + Soft F1 doesn't seem to give a noticeable improvement either.
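For context, "BCE + Soft F1" here presumably means something like the following sketch (the exact formulation and the equal weighting of the two terms are my assumptions):

```python
import torch
import torch.nn.functional as F

def bce_plus_soft_f1_loss(predictions: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Soft F1 uses the raw probabilities in place of hard 0/1 decisions,
    # which makes the F1 term differentiable.
    true_positive = (predictions * labels).sum(dim=0)
    false_positive = (predictions * (1 - labels)).sum(dim=0)
    false_negative = ((1 - predictions) * labels).sum(dim=0)
    soft_f1 = 2 * true_positive / (2 * true_positive + false_positive + false_negative + 1e-8)
    # Add the mean per-class soft F1 shortfall to the usual BCE term.
    return F.binary_cross_entropy(predictions, labels) + (1 - soft_f1).mean()
```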
Let's just hold off on this idea for now.
This was discovered while working on #19.
We blindly assume that a 0.5 threshold (which we get from `torch.round`) is a good threshold. Let's not assume that (since it's quite presumptuous) and see if we can change the threshold to get an optimal result.
The two ideas we had so far:

1. Pick the threshold based on the model's outputs instead of blindly using 0.5.
2. Add a penalty like `(1+(raw_result-0.5))**2` or `exp(1+(raw_result-0.5))` to the cost (see the sketch below). Let's try this idea first if we go this route.

TODO:
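If we go the penalty route, a literal reading of the first formula might look like this sketch (assuming `raw_result` is the model's sigmoid output and that the penalty is simply added to the existing cost; neither detail is spelled out above):

```python
import torch
import torch.nn.functional as F

def bce_with_penalty(raw_result: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    bce = F.binary_cross_entropy(raw_result, labels)
    # First variant: add (1 + (raw_result - 0.5))**2 to the cost;
    # exp(1 + (raw_result - 0.5)) is the other variant mentioned above.
    penalty = ((1 + (raw_result - 0.5)) ** 2).mean()
    return bce + penalty
```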