guxd / DialogBERT

Source Code for DialogBERT: Discourse-Aware Response Generation via Learning to Recover and Rank Utterances (https://arxiv.org/pdf/2012.01775.pdf)

Reproducing results from the paper and hyperparameters #15

Open paul-ruban opened 2 years ago

paul-ruban commented 2 years ago

Hi,

I'm trying to reproduce the results you reported in the paper, but I'm unable to do so with the current set of hyperparameters. One notable problem is per_gpu_eval_batch_size=1. Keeping it as is makes evaluation take a long time, but when I set it to a value > 1, the code breaks. I figured that might have something to do with the generate method of the DialogBERT class. For example, here:

generated = torch.zeros((num_samples, 1), dtype=torch.long, device=device).fill_(self.tokenizer.cls_token_id)  # [batch_sz x 1] (1 = seq_len)

Here num_samples is used as the batch size. I'm wondering whether this is intended or a typo, because when I change num_samples to batch_sz for the generated tokens, the code works. However, the generated text doesn't seem to match the context it is generated from.
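To illustrate, here is a minimal standalone sketch of the shape mismatch (the context tensor, hidden size, and CLS id are invented for illustration; only the `generated` lines mirror the snippet above):

```python
import torch

cls_token_id = 101                         # hypothetical [CLS] id
batch_sz, num_samples = 4, 1               # per_gpu_eval_batch_size > 1, default num_samples
context = torch.randn(batch_sz, 10, 768)   # [batch_sz x ctx_len x hidden], made-up shapes

# As written: seeds num_samples rows, which no longer matches the context
# batch once batch_sz > 1.
generated = torch.zeros((num_samples, 1), dtype=torch.long).fill_(cls_token_id)

# The change described above: seed one row per context in the batch.
generated = torch.zeros((batch_sz, 1), dtype=torch.long).fill_(cls_token_id)
print(generated.shape)  # torch.Size([4, 1]) -- matches context.size(0)
```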

Could you please share the hyperparameters you used and help solve the per_gpu_eval_batch_size=1 problem?

Thanks

guxd commented 2 years ago

Hi, thanks for your interest! We did not generate multiple samples in our experiments. The batch size is 1 by default. You can set the evaluation to run after a certain number of iterations.

paul-ruban commented 2 years ago

Great, and what about the hyperparameters? Are the default flags, as currently set, the ones used for your experiments in the paper?

guxd commented 2 years ago

Yes, they are the ones used in our experiments.

paul-ruban commented 2 years ago

To follow up on the training setting: I'm still unable to reproduce your results using the default hyperparameters. A few libraries are missing from the requirements file, which I install myself, but I suspect the versions differ. For instance, the METEOR score is not working in my setup and I have to put the arguments in a list, etc. Also, my BLEU and NIST scores don't exceed 0.5. Would you mind updating the requirements.txt file with the versions of NLTK and the other required libraries, or providing a description of your environment (i.e., the packages)? Thanks
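For reference, the METEOR problem is likely due to an NLTK API change: NLTK 3.5 accepted raw strings, while newer releases require pre-tokenized input, which is why the arguments have to be wrapped in lists. A minimal sketch, assuming a recent NLTK with the WordNet data downloaded:

```python
from nltk.translate.meteor_score import meteor_score
# requires the WordNet data: import nltk; nltk.download('wordnet')

reference = "how are you doing today"
hypothesis = "how are you today"

# NLTK 3.5: raw strings were accepted.
# score = meteor_score([reference], hypothesis)

# Recent NLTK versions: references and hypothesis must be pre-tokenized,
# hence "putting the arguments in a list".
score = meteor_score([reference.split()], hypothesis.split())
print(round(score, 4))
```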

guxd commented 2 years ago

Hi, we use NLTK 3.5 in our environment. Other libraries: numpy, protobuf, six, tables, tensorboardX, tqdm, sentencepiece, tokenizers, dataclasses, huggingface_hub.
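Based on that list, a sketch of what an updated requirements.txt could look like; only the NLTK version is stated in this thread, so the other packages are left unpinned:

```
nltk==3.5
numpy
protobuf
six
tables
tensorboardX
tqdm
sentencepiece
tokenizers
dataclasses
huggingface_hub
```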