Open wangruicn opened 2 years ago
Hi,
The accuracy you got during the training stage seems a bit problematic. When I did the training, the accuracy reached over 0.95 after several epochs. Did you change any settings in the training code?
Hi, thanks for sharing the code! I am trying to reproduce the results and have some problems.
Below are my steps to train a model on the DailyDialog corpus and evaluate it on the DialSeg711 dataset.

- Run `data_process.py` to generate the training data. The paths in `data_process.py` (`text_path`, `topic_path`, and `act_path`) are set to the corresponding files in the `data` directory of this repo.
- Run `model.py` to train a coherence scoring model with the training data generated above.
- The original release of DialSeg711 does not fit the input format required by `test.py`, so I did some pre-processing: each dialogue is saved as one file, where each line is an utterance, and the topics within a dialogue are separated by a line of `================`, following the parsing code in `test.py`. A sketch of this conversion is given after the results below.
- Run `test.py` and get the numbers. They differ from the numbers in the paper:
```
pk: 0.4313210101897487806145379205
wd: 0.4403905226167513422391657018
mae: 6.106891701828411
f1: 0.5422400374765356
dp variance: 0.0
```
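For reference, the pre-processing in step 3 is roughly the following. This is a minimal sketch, not my exact script; the `(utterance, topic_id)` input structure is illustrative and would need to be parsed from the original DialSeg711 files:

```python
SEPARATOR = "================"  # topic-boundary line expected by test.py

def save_dialogue(dialogue, out_path):
    """Write one dialogue to one file, one utterance per line,
    inserting a separator line wherever the topic changes.

    dialogue: list of (utterance, topic_id) pairs -- illustrative;
    parse these from the original DialSeg711 release.
    """
    lines = []
    prev_topic = None
    for utterance, topic in dialogue:
        if prev_topic is not None and topic != prev_topic:
            lines.append(SEPARATOR)
        lines.append(utterance.strip())
        prev_topic = topic
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```

Each output file can then be passed directly to `test.py`.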
Here are the training details:
```
2022-04-19 16:34:28.405199: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/pai/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/lib/x86_64-linux-gnu:/home/pai/jre/lib/amd64/server:/home/pai/jre/lib/amd64/server
2022-04-19 16:34:28.405228: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Loading BERT tokenizer...
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Max sentence length: 233
Padding/truncating all sentences to 256 values...
The group number is: 30831
start generating pos and neg pairs ...
there are 91581 samples been generated...
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForNextSentencePrediction: ['cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
/home/admin/workspace/miniconda3/envs/upcs/lib/python3.7/site-packages/transformers/optimization.py:309: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  FutureWarning,
======== Epoch 1 / 10 ========
Training...
1000 steps done....
=========== the loss for epoch 0 is: 0.4508296186830202
Running Validation...
1000 steps done....
2000 steps done....
3000 steps done....
4000 steps done....
5000 steps done....
6000 steps done....
0.8606155736026753
======== Epoch 2 / 10 ========
Training...
1000 steps done....
=========== the loss for epoch 1 is: 0.227924036773932
Running Validation...
1000 steps done....
2000 steps done....
3000 steps done....
4000 steps done....
5000 steps done....
6000 steps done....
0.8697604586091585
======== Epoch 3 / 10 ========
Training...
1000 steps done....
=========== the loss for epoch 2 is: 0.1091887889561323
Running Validation...
1000 steps done....
2000 steps done....
3000 steps done....
4000 steps done....
5000 steps done....
6000 steps done....
0.8709479287517914
======== Epoch 4 / 10 ========
Training...
1000 steps done....
=========== the loss for epoch 3 is: 0.054024717311652676
Running Validation...
1000 steps done....
2000 steps done....
3000 steps done....
4000 steps done....
5000 steps done....
6000 steps done....
0.8706885961919061
======== Epoch 5 / 10 ========
Training...
1000 steps done....
=========== the loss for epoch 4 is: 0.03296536928869265
Running Validation...
1000 steps done....
2000 steps done....
3000 steps done....
4000 steps done....
5000 steps done....
6000 steps done....
0.8723674332901112
======== Epoch 6 / 10 ========
Training...
1000 steps done....
=========== the loss for epoch 5 is: 0.020094164591994288
Running Validation...
1000 steps done....
2000 steps done....
3000 steps done....
4000 steps done....
5000 steps done....
6000 steps done....
0.8712891558042721
======== Epoch 7 / 10 ========
Training...
1000 steps done....
=========== the loss for epoch 6 is: 0.014146699178053905
Running Validation...
1000 steps done....
2000 steps done....
3000 steps done....
4000 steps done....
5000 steps done....
6000 steps done....
0.8713437521326691
======== Epoch 8 / 10 ========
Training...
1000 steps done....
=========== the loss for epoch 7 is: 0.011553556534574357
Running Validation...
1000 steps done....
2000 steps done....
3000 steps done....
4000 steps done....
5000 steps done....
6000 steps done....
0.8730498873950727
======== Epoch 9 / 10 ========
Training...
1000 steps done....
=========== the loss for epoch 8 is: 0.006559821181779955
Running Validation...
1000 steps done....
2000 steps done....
3000 steps done....
4000 steps done....
5000 steps done....
6000 steps done....
0.8745649355080871
======== Epoch 10 / 10 ========
Training...
1000 steps done....
=========== the loss for epoch 9 is: 0.004438388840793665
Running Validation...
1000 steps done....
2000 steps done....
3000 steps done....
4000 steps done....
5000 steps done....
6000 steps done....
0.8751108987920563
```
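To clarify what the number at the end of each validation pass measures: judging from the log, the coherence model is `BertForNextSentencePrediction` initialized from `bert-base-uncased` and trained on the generated positive/negative pairs, so the validation score appears to be pair-classification accuracy. A minimal sketch of how such a model scores an utterance pair at inference time (my own illustration, not code from this repo):

```python
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
model.eval()

def coherence_score(context: str, response: str) -> float:
    # Encode as a sentence pair, padded/truncated to 256 tokens as in the log.
    inputs = tokenizer(context, response, return_tensors="pt",
                       truncation=True, max_length=256, padding="max_length")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape (1, 2)
    # In the Hugging Face NSP head, class 0 means "response continues the
    # context"; its softmax probability can serve as a coherence score.
    return torch.softmax(logits, dim=-1)[0, 0].item()

print(coherence_score("How is the weather today?", "It is sunny and warm."))
```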
Hello, I'm also trying to reproduce the results. Could you please share the pre-processing code that you mentioned in step 3? :)
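As a side note for anyone comparing against the paper: `pk` and `wd` (WindowDiff) are standard segmentation error metrics (lower is better) and can be recomputed independently with NLTK. A toy sketch, assuming the usual boundary-string encoding where '1' marks an utterance that starts a new topic:

```python
from nltk.metrics.segmentation import pk, windowdiff

reference  = "0001000100"  # gold topic boundaries
hypothesis = "0010000100"  # predicted boundaries

k = 3  # window size; commonly set to half the mean gold segment length
print("pk:", pk(reference, hypothesis, k=k))
print("wd:", windowdiff(reference, hypothesis, k))
```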