zhangj1an opened 2 years ago
This repository differs considerably from what their paper says.
Taking the TextTiling algorithm as an example, they claim to use a threshold to determine the segmentation points, whereas in the source code they take the top-3 depth scores as the segmentation points. The sampling strategy for generating the pretraining data is also different from the original paper.
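To make the difference concrete, here is a minimal sketch of the two strategies (my own naming; the depth scores are assumed to be already computed per inter-sentence gap, as in standard TextTiling):

```python
import numpy as np

def segment_by_threshold(depth_scores, coef=0.5):
    # Threshold-based strategy described in the paper (TextTiling-style):
    # any gap whose depth score exceeds mu - coef*sigma becomes a boundary.
    mu, sigma = np.mean(depth_scores), np.std(depth_scores)
    cutoff = mu - coef * sigma
    return [i for i, d in enumerate(depth_scores) if d > cutoff]

def segment_by_topk(depth_scores, k=3):
    # Strategy in the released code (as reported above): always take the
    # k gaps with the highest depth scores as boundaries.
    return sorted(np.argsort(depth_scores)[-k:].tolist())
```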
@Coldog2333, sorry for the issue in text.py... I have fixed the code to make predictions based on the threshold instead of the top-3, which was a setting used for testing on my side. Also, would you mind pointing out where the data generation in data_process.py differs from the original paper?
Many thanks!
Hello, glad to receive your reply.
Your original paper states that when generating training samples for the Coherence Scoring Model (CSM), only utterances with a different Dialogue Act are randomly selected. However, data_process.py does not apply this constraint.
As for the threshold-based TextTiling method, I also tried this strategy as described in your paper, but achieved quite low performance.
@Coldog2333, thanks for pointing out the mismatch between the paper and our code regarding the dialogue act constraint. I have updated data_process.py to filter out the utterances with act label = 1 or 4, so that the dialogue act constraint described in our paper is applied. This constraint only applies to dialogue corpora with available dialogue act annotations and should be removed for corpora without dialogue act labels.
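Roughly, one possible reading of the constraint looks like the sketch below (hypothetical names, combining the "different act" condition from the paper with the act-label filter mentioned above; not the actual data_process.py implementation):

```python
import random

# Hypothetical sketch of the dialogue-act constraint on CSM negative sampling.
EXCLUDED_ACTS = {1, 4}  # act labels filtered out, per the fix described above

def sample_negative(response_act, candidate_pool):
    """candidate_pool: list of (utterance, act) pairs drawn from other
    dialogues. A negative utterance must carry a different dialogue act
    from the original response and must not be an excluded act."""
    candidates = [
        utt for utt, act in candidate_pool
        if act != response_act and act not in EXCLUDED_ACTS
    ]
    return random.choice(candidates)  # incoherent replacement utterance
```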
Furthermore, we are currently working on extending this framework by integrating other available knowledge, and we noticed that across different runs of the training process, the static TextTiling threshold defined as mu - 0.5*sigma made the model's performance vary significantly. We re-checked the paper that proposed TextTiling and found that it mentions computing the threshold as either mu - 1*sigma or mu - 0.5*sigma.

Thus, our current strategy is to treat the coefficient of sigma as a tunable parameter. We sample a small batch of dialogues from the corpus (this batch is excluded from training data generation) and stitch together parts from different dialogues to create a validation dataset with artificial segment labels (similar to the dataset used in Freddy Y. Y. Choi, 2000, "Advances in domain independent linear text segmentation"). We then tune the coefficient of sigma on this validation dataset. We will refine the code and merge it into the repo of our next work when it is publicly available.
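As a rough sketch of what we mean by tuning the coefficient on such a stitched validation set (placeholder names and a simple boundary-F1 score for brevity, rather than Pk/WindowDiff; not the actual repo code):

```python
import numpy as np

def boundaries(depth_scores, coef):
    # Segment boundaries under the threshold mu - coef*sigma (see above).
    mu, sigma = np.mean(depth_scores), np.std(depth_scores)
    cutoff = mu - coef * sigma
    return {i for i, d in enumerate(depth_scores) if d > cutoff}

def tune_sigma_coef(val_set, coefs=(0.25, 0.5, 0.75, 1.0)):
    """val_set: list of (depth_scores, gold_boundary_indices) pairs built by
    stitching segments from different dialogues, so gold boundaries are known."""
    def f1(pred, gold):
        if not pred or not gold:
            return 0.0
        tp = len(pred & gold)
        p, r = tp / len(pred), tp / len(gold)
        return 2 * p * r / (p + r) if (p + r) else 0.0

    avg = {
        c: float(np.mean([f1(boundaries(d, c), set(g)) for d, g in val_set]))
        for c in coefs
    }
    return max(avg, key=avg.get)  # coefficient with the best validation score
```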
Many thanks!
Do you mind sharing the trained model.pt for model.py? I was running it on Google Colab but it takes 20+ hours to finish.
Thanks!
Could you please share the Google Colab notebook? That would be a great help! I tried to replicate the results but, being new to torch, I was unable to do so. Thank you @zhangj1an!
Would you be able to provide the pretrained model? Would really appreciate it!