yzhangcs / parser

State-of-the-art parsers for natural language.
https://parser.yzhang.site/
MIT License

Parser does not get best metrics #82

Closed by rootofmylife 2 years ago

rootofmylife commented 2 years ago

Hello,

I wonder whether this is a bug. When I trained a model for 100 epochs, the last epoch had the highest metric, but it was not saved; a model with a lower metric was saved instead. See my log for more information.

Moreover, I remember that the old version of the parser had early stopping, but I do not see it in this version. It is a really cool feature, though.

Thank you for your help.

P.S. I use the latest code and have not changed anything.

2021-09-15 13:47:59 INFO Epoch 90 / 100:
2021-09-15 13:51:17 INFO lr: 5.6657e-06 - loss: 0.0040 - UCM: 96.24% LCM: 92.89% UAS: 99.81% LAS: 99.64%
2021-09-15 13:51:18 INFO dev:  loss: 3.4414 - UCM: 26.50% LCM: 11.00% UAS: 85.42% LAS: 77.93%
2021-09-15 13:51:20 INFO test: loss: 3.6520 - UCM: 27.16% LCM: 12.94% UAS: 85.25% LAS: 78.42%
2021-09-15 13:51:20 INFO 0:03:20.924831s elapsed

2021-09-15 13:51:20 INFO Epoch 91 / 100:
2021-09-15 13:54:38 INFO lr: 5.1114e-06 - loss: 0.0024 - UCM: 96.17% LCM: 93.22% UAS: 99.81% LAS: 99.66%
2021-09-15 13:54:39 INFO dev:  loss: 3.5128 - UCM: 26.50% LCM: 11.00% UAS: 85.61% LAS: 78.24%
2021-09-15 13:54:41 INFO test: loss: 3.7178 - UCM: 27.75% LCM: 12.94% UAS: 85.28% LAS: 78.40%
2021-09-15 13:54:41 INFO 0:03:20.681193s elapsed

2021-09-15 13:54:41 INFO Epoch 92 / 100:
2021-09-15 13:57:59 INFO lr: 4.5570e-06 - loss: 0.0021 - UCM: 96.27% LCM: 92.98% UAS: 99.81% LAS: 99.65%
2021-09-15 13:57:59 INFO dev:  loss: 3.4911 - UCM: 27.00% LCM: 11.00% UAS: 85.42% LAS: 78.00%
2021-09-15 13:58:01 INFO test: loss: 3.6980 - UCM: 27.45% LCM: 13.43% UAS: 85.24% LAS: 78.37%
2021-09-15 13:58:01 INFO 0:03:20.802275s elapsed

2021-09-15 13:58:01 INFO Epoch 93 / 100:
2021-09-15 14:01:19 INFO lr: 4.0027e-06 - loss: 0.0045 - UCM: 96.47% LCM: 93.31% UAS: 99.82% LAS: 99.66%
2021-09-15 14:01:20 INFO dev:  loss: 3.4927 - UCM: 26.00% LCM: 11.00% UAS: 85.38% LAS: 77.93%
2021-09-15 14:01:22 INFO test: loss: 3.7007 - UCM: 27.94% LCM: 13.43% UAS: 85.32% LAS: 78.43%
2021-09-15 14:01:22 INFO 0:03:20.380205s elapsed

2021-09-15 14:01:22 INFO Epoch 94 / 100:
2021-09-15 14:04:40 INFO lr: 3.4484e-06 - loss: 0.0009 - UCM: 96.39% LCM: 93.42% UAS: 99.81% LAS: 99.67%
2021-09-15 14:04:41 INFO dev:  loss: 3.5013 - UCM: 26.50% LCM: 11.00% UAS: 85.49% LAS: 78.08%
2021-09-15 14:04:43 INFO test: loss: 3.6941 - UCM: 27.84% LCM: 13.53% UAS: 85.27% LAS: 78.38%
2021-09-15 14:04:43 INFO 0:03:21.236538s elapsed

2021-09-15 14:04:43 INFO Epoch 95 / 100:
2021-09-15 14:08:01 INFO lr: 2.8941e-06 - loss: 0.0002 - UCM: 96.30% LCM: 93.34% UAS: 99.81% LAS: 99.67%
2021-09-15 14:08:02 INFO dev:  loss: 3.5176 - UCM: 26.50% LCM: 11.00% UAS: 85.45% LAS: 78.05%
2021-09-15 14:08:04 INFO test: loss: 3.7169 - UCM: 27.65% LCM: 13.53% UAS: 85.31% LAS: 78.47%
2021-09-15 14:08:04 INFO 0:03:20.700472s elapsed

2021-09-15 14:08:04 INFO Epoch 96 / 100:
2021-09-15 14:11:22 INFO lr: 2.3398e-06 - loss: 0.0027 - UCM: 96.48% LCM: 93.47% UAS: 99.82% LAS: 99.67%
2021-09-15 14:11:22 INFO dev:  loss: 3.5252 - UCM: 26.50% LCM: 12.00% UAS: 85.54% LAS: 78.19%
2021-09-15 14:11:24 INFO test: loss: 3.7445 - UCM: 27.65% LCM: 13.14% UAS: 85.39% LAS: 78.54%
2021-09-15 14:11:24 INFO 0:03:20.688394s elapsed

2021-09-15 14:11:24 INFO Epoch 97 / 100:
2021-09-15 14:14:43 INFO lr: 1.7855e-06 - loss: 0.0029 - UCM: 96.69% LCM: 93.61% UAS: 99.83% LAS: 99.68%
2021-09-15 14:14:43 INFO dev:  loss: 3.5313 - UCM: 26.50% LCM: 11.00% UAS: 85.54% LAS: 78.22%
2021-09-15 14:14:45 INFO test: loss: 3.7412 - UCM: 28.04% LCM: 13.63% UAS: 85.35% LAS: 78.53%
2021-09-15 14:14:45 INFO 0:03:21.046132s elapsed

2021-09-15 14:14:45 INFO Epoch 98 / 100:
2021-09-15 14:18:04 INFO lr: 1.2312e-06 - loss: 0.0042 - UCM: 96.91% LCM: 94.08% UAS: 99.84% LAS: 99.70%
2021-09-15 14:18:05 INFO dev:  loss: 3.5500 - UCM: 26.50% LCM: 11.50% UAS: 85.59% LAS: 78.24%
2021-09-15 14:18:07 INFO test: loss: 3.7603 - UCM: 28.14% LCM: 13.73% UAS: 85.40% LAS: 78.55%
2021-09-15 14:18:07 INFO 0:03:21.250676s elapsed

2021-09-15 14:18:07 INFO Epoch 99 / 100:
2021-09-15 14:21:25 INFO lr: 6.7694e-07 - loss: 0.0031 - UCM: 96.66% LCM: 93.89% UAS: 99.83% LAS: 99.69%
2021-09-15 14:21:26 INFO dev:  loss: 3.5548 - UCM: 26.50% LCM: 11.50% UAS: 85.56% LAS: 78.19%
2021-09-15 14:21:28 INFO test: loss: 3.7634 - UCM: 28.14% LCM: 13.53% UAS: 85.40% LAS: 78.53%
2021-09-15 14:21:28 INFO 0:03:21.061997s elapsed

2021-09-15 14:21:28 INFO Epoch 100 / 100:
2021-09-15 14:24:46 INFO lr: 1.2263e-07 - loss: 0.0029 - UCM: 96.87% LCM: 93.96% UAS: 99.84% LAS: 99.70%
2021-09-15 14:24:46 INFO dev:  loss: 3.5592 - UCM: 26.50% LCM: 11.00% UAS: 85.59% LAS: 78.24%
2021-09-15 14:24:48 INFO test: loss: 3.7705 - UCM: 28.04% LCM: 13.43% UAS: 85.42% LAS: 78.59%
2021-09-15 14:24:48 INFO 0:03:20.730533s elapsed

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Some weights of the model checkpoint at vinai/phobert-base were not used when initializing RobertaModel: ['lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.bias', 'lm_head.decoder.weight', 'lm_head.decoder.bias', 'lm_head.layer_norm.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2021-09-15 14:25:03 INFO Epoch 21 saved
2021-09-15 14:25:03 INFO dev:  UCM: 21.50% LCM: 13.00% UAS: 85.47% LAS: 78.31%
2021-09-15 14:25:03 INFO test: UCM: 25.88% LCM: 12.55% UAS: 84.71% LAS: 78.10%
2021-09-15 14:25:03 INFO 5:34:33.226983s elapsed, 0:03:20.732270s/epoch
yzhangcs commented 2 years ago

@rootofmylife Yes, sorry, it is indeed a bug. I recently pushed the fix to the main branch, but it has not been released yet.

yzhangcs commented 2 years ago

@rootofmylife Please check this commit: https://github.com/yzhangcs/parser/commit/716ac3ed. The error occurred because I accidentally saved the model from the last epoch rather than the best-performing one.

rootofmylife commented 2 years ago

@yzhangcs Yeah, I noticed that and have updated to the latest code too. Please see my attached image.

[Image: Screen Shot 2021-09-15 at 23 58 36]
yzhangcs commented 2 years ago

@rootofmylife

"P.S. I use the latest code and have not changed anything."

Sorry, I overlooked this. What is the value of patience? I think there may still be some logic errors in the model saving.

rootofmylife commented 2 years ago

@yzhangcs

I use the default value of patience, so it is 100. The patience value changes during training, but I did not record it.

Ahhh, so I need to change the value of patience to 10 if I want earlier stopping, right?

def train(self, train, dev, test, buckets=32, batch_size=5000, update_steps=1,
              clip=5.0, epochs=5000, patience=100, **kwargs):
...
yzhangcs commented 2 years ago

@rootofmylife Yes, patience controls how many epochs training continues after the best one before stopping.
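
As a rough illustration of how a patience setting of this kind typically behaves, the sketch below stops once `patience` epochs have passed since the best dev score and reports which epoch would be kept. It is a minimal, self-contained sketch under stated assumptions, not the parser's actual training loop: `run_with_patience` is a hypothetical helper and the score list is made up.

```python
from typing import List, Tuple


def run_with_patience(dev_scores: List[float], patience: int) -> Tuple[int, int]:
    """Return (best_epoch, last_epoch_run), both 1-indexed.

    Hypothetical helper: dev_scores[i] stands in for the dev score
    obtained after training epoch i + 1.
    """
    best_epoch, best_score = 0, float("-inf")
    last_epoch = 0
    for epoch, score in enumerate(dev_scores, start=1):
        last_epoch = epoch
        if score > best_score:
            # New best: this is the point where a checkpoint would be saved.
            best_epoch, best_score = epoch, score
        elif epoch - best_epoch >= patience:
            # `patience` epochs have passed since the best one: stop early.
            break
    return best_epoch, last_epoch


# Made-up dev scores, for illustration only.
scores = [70.0, 75.0, 78.0, 77.5, 77.9, 77.0, 76.0, 79.0]
print(run_with_patience(scores, patience=3))    # stops before the late 79.0
print(run_with_patience(scores, patience=100))  # runs every epoch
```

With a small patience the run ends a few epochs after the best score and never sees the late improvement; with a patience at least as large as the number of epochs, training always runs to the end, so only the choice of saved checkpoint matters.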

rootofmylife commented 2 years ago

@yzhangcs Thanks for the information.

For the best-metric problem, I have an update. I ran a new training and saw that the best model is probably selected by LAS. Please check my log (I also removed some redundant lines).

As you can see, epoch 34 was saved because it had the highest dev LAS. So I checked the code and it was right (for both the old and the latest version). I guess your code is OK and nothing is wrong with the logic.

But I have another question: why don't we compare scores using UAS, or maybe both UAS and LAS? Moreover, we could save two models, one for the best LAS and one for the best UAS, couldn't we?

Thank you.

2021-09-16 05:19:43 INFO Epoch 25 / 100:
2021-09-16 05:20:36 INFO lr: 4.1872e-05 - loss: 0.0670 - UCM: 51.90% LCM: 26.82% UAS: 95.68% LAS: 90.48%
2021-09-16 05:20:37 INFO dev:  loss: 1.2132 - UCM: 26.50% LCM: 13.00% UAS: 86.15% LAS: 78.73%
2021-09-16 05:20:39 INFO test: loss: 1.2733 - UCM: 26.96% LCM: 12.94% UAS: 84.86% LAS: 78.11%
2021-09-16 05:20:46 INFO 0:00:56.394561s elapsed (saved)

...

2021-09-16 05:22:39 INFO Epoch 28 / 100:
2021-09-16 05:23:32 INFO lr: 4.0230e-05 - loss: 0.0390 - UCM: 57.64% LCM: 30.00% UAS: 96.48% LAS: 91.69%
2021-09-16 05:23:33 INFO dev:  loss: 1.3105 - UCM: 26.50% LCM: 11.50% UAS: 86.10% LAS: 78.73%
2021-09-16 05:23:35 INFO test: loss: 1.4364 - UCM: 27.06% LCM: 12.55% UAS: 85.01% LAS: 78.26%
2021-09-16 05:23:35 INFO 0:00:56.257415s elapsed

2021-09-16 05:27:21 INFO Epoch 33 / 100:
2021-09-16 05:28:14 INFO lr: 3.7493e-05 - loss: 0.0374 - UCM: 65.09% LCM: 36.39% UAS: 97.49% LAS: 93.43%
2021-09-16 05:28:15 INFO dev:  loss: 1.5302 - UCM: 27.00% LCM: 13.00% UAS: 86.06% LAS: 78.64%
2021-09-16 05:28:17 INFO test: loss: 1.6383 - UCM: 25.69% LCM: 12.84% UAS: 84.95% LAS: 78.03%
2021-09-16 05:28:17 INFO 0:00:56.611270s elapsed

2021-09-16 05:28:17 INFO Epoch 34 / 100:
2021-09-16 05:29:11 INFO lr: 3.6946e-05 - loss: 0.0414 - UCM: 66.41% LCM: 36.79% UAS: 97.64% LAS: 93.68%
2021-09-16 05:29:12 INFO dev:  loss: 1.5116 - UCM: 28.00% LCM: 12.00% UAS: 86.01% LAS: 78.78%
2021-09-16 05:29:14 INFO test: loss: 1.6564 - UCM: 25.98% LCM: 12.94% UAS: 84.94% LAS: 78.16%
2021-09-16 05:29:20 INFO 0:00:56.750563s elapsed (saved)

2021-09-16 05:33:06 INFO Epoch 39 / 100:
2021-09-16 05:34:00 INFO lr: 3.4209e-05 - loss: 0.0026 - UCM: 72.46% LCM: 43.89% UAS: 98.22% LAS: 95.02%
2021-09-16 05:34:00 INFO dev:  loss: 1.7252 - UCM: 27.00% LCM: 11.00% UAS: 86.01% LAS: 78.43%
2021-09-16 05:34:02 INFO test: loss: 1.8434 - UCM: 26.47% LCM: 12.84% UAS: 85.03% LAS: 78.08%
2021-09-16 05:34:02 INFO 0:00:56.636734s elapsed

2021-09-16 05:52:53 INFO Epoch 60 / 100:
2021-09-16 05:53:46 INFO lr: 2.2715e-05 - loss: 0.0057 - UCM: 86.14% LCM: 68.07% UAS: 99.22% LAS: 97.94%
2021-09-16 05:53:47 INFO dev:  loss: 2.2656 - UCM: 28.50% LCM: 10.00% UAS: 86.13% LAS: 78.40%
2021-09-16 05:53:49 INFO test: loss: 2.4938 - UCM: 26.76% LCM: 13.04% UAS: 85.20% LAS: 78.32%
2021-09-16 05:53:49 INFO 0:00:56.310534s elapsed

...

2021-09-16 06:31:56 INFO Epoch 34 saved
2021-09-16 06:31:56 INFO dev:  UCM: 28.00% LCM: 12.00% UAS: 86.01% LAS: 78.78%
2021-09-16 06:31:56 INFO test: UCM: 25.98% LCM: 12.94% UAS: 84.94% LAS: 78.16%
2021-09-16 06:31:56 INFO 1:34:07.910651s elapsed, 0:00:56.479107s/epoch 
yzhangcs commented 2 years ago

@rootofmylife

"But I have another question: why don't we compare scores using UAS, or maybe both UAS and LAS? Moreover, we could save two models, one for the best LAS and one for the best UAS, couldn't we?"

Yes, but you may need to implement a Metric class yourself. See https://github.com/yzhangcs/parser/blob/191880d4a0417280e45c8d7595711b71c5427d61/supar/utils/metric.py#L59-L61. The score property defines the value that is compared between different metrics.
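
To make the suggestion concrete, here is a minimal, self-contained sketch of a metric whose `score` property decides which checkpoint counts as better. It does not reproduce the class at the link above; `SimpleAttachmentMetric`, `UasMetric`, and `MeanMetric` are hypothetical names chosen for illustration. The numbers are the dev UAS and LAS of epochs 25 and 34 from the log earlier in the thread.

```python
from dataclasses import dataclass


@dataclass
class SimpleAttachmentMetric:
    """Hypothetical stand-in for a parser metric; values are percentages."""
    uas: float
    las: float

    @property
    def score(self) -> float:
        # The single number used to rank checkpoints; here LAS only.
        return self.las

    def __gt__(self, other: "SimpleAttachmentMetric") -> bool:
        return self.score > other.score


class UasMetric(SimpleAttachmentMetric):
    @property
    def score(self) -> float:
        # Rank checkpoints by UAS instead.
        return self.uas


class MeanMetric(SimpleAttachmentMetric):
    @property
    def score(self) -> float:
        # Rank by the unweighted mean of UAS and LAS.
        return (self.uas + self.las) / 2


# Dev scores of epochs 25 and 34, taken from the log in this thread.
e25 = dict(uas=86.15, las=78.73)
e34 = dict(uas=86.01, las=78.78)

# Ranked by LAS, epoch 34 is better; by UAS or by the mean, epoch 25 is.
print(SimpleAttachmentMetric(**e34) > SimpleAttachmentMetric(**e25))
print(UasMetric(**e25) > UasMetric(**e34))
print(MeanMetric(**e25) > MeanMetric(**e34))
```

Only `score` changes between the variants, so the comparison logic stays in one place. Keeping two checkpoints, one per criterion, would amount to tracking a best `UasMetric` and a best `SimpleAttachmentMetric` side by side during training.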

rootofmylife commented 2 years ago

@yzhangcs

Yeah, I got it, so I will close the issue here. Thank you for your support.

Have a good day.