Open wangruicn opened 2 years ago
Hi,
The accuracy you got during the training stage seems a bit problematic. When I did the training, the accuracy reached over 0.95 after several epochs. Did you change any settings in the training code?
Hi, thanks for sharing the code! I am trying to reproduce the results and have some problems.
Below are my steps to train a model on the DailyDialog corpus and evaluate it on the DialSeg711 dataset.

- Run `data_process.py` to generate the training data. The paths in `data_process.py` (`text_path`, `topic_path`, and `act_path`) are set to the corresponding files in the `data` directory of this repo.
- Run `model.py` to train a coherence scoring model with the training data generated above.
- The original release of DialSeg711 does not fit the input format required by `test.py`, so I did some pre-processing: each dialogue is saved as one file, where each line is an utterance, and the topics within a dialogue are separated by a line of `================`, following the parsing code in `test.py`. A sketch of this conversion is given after the results below.
- Run `test.py` and get the numbers. They differ from the numbers in the paper:
```
pk: 0.4313210101897487806145379205
wd: 0.4403905226167513422391657018
mae: 6.106891701828411
f1: 0.5422400374765356
dp variance: 0.0
```
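For reference, the pre-processing in step 3 is roughly the following. This is a minimal sketch, not my exact script; the `(utterance, topic_id)` input structure is illustrative and would need to be parsed from the original DialSeg711 files:

```python
SEPARATOR = "================"  # topic-boundary line expected by test.py

def save_dialogue(dialogue, out_path):
    """Write one dialogue to one file, one utterance per line,
    inserting a separator line wherever the topic changes.

    dialogue: list of (utterance, topic_id) pairs -- illustrative;
    parse these from the original DialSeg711 release.
    """
    lines = []
    prev_topic = None
    for utterance, topic in dialogue:
        if prev_topic is not None and topic != prev_topic:
            lines.append(SEPARATOR)
        lines.append(utterance.strip())
        prev_topic = topic
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```

Each output file can then be passed directly to `test.py`.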
Here are the training details:
```
2022-04-19 16:34:28.405199: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/pai/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/lib/x86_64-linux-gnu:/home/pai/jre/lib/amd64/server:/home/pai/jre/lib/amd64/server
2022-04-19 16:34:28.405228: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Loading BERT tokenizer...
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Max sentence length: 233
Padding/truncating all sentences to 256 values...
The group number is: 30831
start generating pos and neg pairs ...
there are 91581 samples been generated...
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForNextSentencePrediction: ['cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
/home/admin/workspace/miniconda3/envs/upcs/lib/python3.7/site-packages/transformers/optimization.py:309: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  FutureWarning,
======== Epoch 1 / 10 ========
Training...
1000 steps done....
=========== the loss for epoch 0 is: 0.4508296186830202
Running Validation...
1000 steps done....
2000 steps done....
3000 steps done....
4000 steps done....
5000 steps done....
6000 steps done....
0.8606155736026753
======== Epoch 2 / 10 ========
Training...
1000 steps done....
=========== the loss for epoch 1 is: 0.227924036773932
Running Validation...
1000 steps done....
2000 steps done....
3000 steps done....
4000 steps done....
5000 steps done....
6000 steps done....
0.8697604586091585
======== Epoch 3 / 10 ========
Training...
1000 steps done....
=========== the loss for epoch 2 is: 0.1091887889561323
Running Validation...
1000 steps done....
2000 steps done....
3000 steps done....
4000 steps done....
5000 steps done....
6000 steps done....
0.8709479287517914
======== Epoch 4 / 10 ========
Training...
1000 steps done....
=========== the loss for epoch 3 is: 0.054024717311652676
Running Validation...
1000 steps done....
2000 steps done....
3000 steps done....
4000 steps done....
5000 steps done....
6000 steps done....
0.8706885961919061
======== Epoch 5 / 10 ========
Training...
1000 steps done....
=========== the loss for epoch 4 is: 0.03296536928869265
Running Validation...
1000 steps done....
2000 steps done....
3000 steps done....
4000 steps done....
5000 steps done....
6000 steps done....
0.8723674332901112
======== Epoch 6 / 10 ========
Training...
1000 steps done....
=========== the loss for epoch 5 is: 0.020094164591994288
Running Validation...
1000 steps done....
2000 steps done....
3000 steps done....
4000 steps done....
5000 steps done....
6000 steps done....
0.8712891558042721
======== Epoch 7 / 10 ========
Training...
1000 steps done....
=========== the loss for epoch 6 is: 0.014146699178053905
Running Validation...
1000 steps done....
2000 steps done....
3000 steps done....
4000 steps done....
5000 steps done....
6000 steps done....
0.8713437521326691
======== Epoch 8 / 10 ========
Training...
1000 steps done....
=========== the loss for epoch 7 is: 0.011553556534574357
Running Validation...
1000 steps done....
2000 steps done....
3000 steps done....
4000 steps done....
5000 steps done....
6000 steps done....
0.8730498873950727
======== Epoch 9 / 10 ========
Training...
1000 steps done....
=========== the loss for epoch 8 is: 0.006559821181779955
Running Validation...
1000 steps done....
2000 steps done....
3000 steps done....
4000 steps done....
5000 steps done....
6000 steps done....
0.8745649355080871
======== Epoch 10 / 10 ========
Training...
1000 steps done....
=========== the loss for epoch 9 is: 0.004438388840793665
Running Validation...
1000 steps done....
2000 steps done....
3000 steps done....
4000 steps done....
5000 steps done....
6000 steps done....
0.8751108987920563
```
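To clarify what the number at the end of each validation pass measures: judging from the log, the coherence model is `BertForNextSentencePrediction` initialized from `bert-base-uncased` and trained on the generated positive/negative pairs, so the validation score appears to be pair-classification accuracy. A minimal sketch of how such a model scores an utterance pair at inference time (my own illustration, not code from this repo):

```python
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
model.eval()

def coherence_score(context: str, response: str) -> float:
    # Encode as a sentence pair, padded/truncated to 256 tokens as in the log.
    inputs = tokenizer(context, response, return_tensors="pt",
                       truncation=True, max_length=256, padding="max_length")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape (1, 2)
    # In the Hugging Face NSP head, class 0 means "response continues the
    # context"; its softmax probability can serve as a coherence score.
    return torch.softmax(logits, dim=-1)[0, 0].item()

print(coherence_score("How is the weather today?", "It is sunny and warm."))
```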
Hello, I'm also trying to reproduce the results. Could you please share the pre-processing code that you mentioned in step 3? :)
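As a side note for anyone comparing against the paper: `pk` and `wd` (WindowDiff) are standard segmentation error metrics (lower is better) and can be recomputed independently with NLTK. A toy sketch, assuming the usual boundary-string encoding where '1' marks an utterance that starts a new topic:

```python
from nltk.metrics.segmentation import pk, windowdiff

reference  = "0001000100"  # gold topic boundaries
hypothesis = "0010000100"  # predicted boundaries

k = 3  # window size; commonly set to half the mean gold segment length
print("pk:", pk(reference, hypothesis, k=k))
print("wd:", windowdiff(reference, hypothesis, k))
```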