thu-coai / Emotional-Support-Conversation

Data and codes for ACL 2021 paper: Towards Emotional Support Dialog Systems

Why is "next_strategy_id" not used in the Blender model code? #1

Closed nanzhao closed 3 years ago

nanzhao commented 3 years ago

I see "next_strategy_id" is not used in the Emotional-Support-Conversation/codes/src/transformers/models/blenderbot_small/modeling_blenderbot_small.py.

Apart from the unused code for role and turn ids, I can't find any other code differences between codes/src/transformers and the Hugging Face transformers.

That is to say, I can reproduce your results simply by adding strategy tokens during data preprocessing. Right?

lsy641 commented 3 years ago

Hi. Partially true. You can successfully train a model just by adding strategy tokens during data preprocessing; there is no difference between using codes/src/transformers and the Hugging Face transformers in the modeling code itself.
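To make the "adding strategy tokens during data preprocessing" step concrete, here is a minimal hypothetical sketch: each supporter utterance is prefixed with its annotated strategy as a special token, so the model learns to emit the strategy before the response. The token spelling below is illustrative, not necessarily the repo's exact format.

```python
# Hypothetical sketch of strategy-token preprocessing (not the repo's exact code).
def add_strategy_token(utterance: str, strategy: str) -> str:
    """Prefix a supporter utterance with its strategy as a bracketed special token."""
    return f"[{strategy}] {utterance}"

example = add_strategy_token("Have you talked to your friends about it?", "Question")
print(example)  # [Question] Have you talked to your friends about it?
```

With this preprocessing, the strategy tokens end up inside `input_ids` and `label_ids`, which is why no modeling change is strictly required for training.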

next_strategy_id is used in the code if you'd like to generate a response conditioned on a designated strategy. It is not necessary if you are reproducing the Joint model.

The role id and turn id are unused: I tried using this information but didn't get a significant improvement.

Besides the above, there is one more difference: I changed the way repetitive tokens are penalized (mentioned in the appendix of the paper). You can see it in the generation code.
I will point you to the exact location later.

lsy641 commented 3 years ago

https://github.com/thu-coai/Emotional-Support-Conversation/blob/b0c98059edeb0607f762f62587c09f17fda2a2c1/codes/src/transformers/generation_utils.py

In line 1392, I changed the logits_processor's inputs:

    if encoder_input_ids is not None:
        next_token_scores = logits_processor(torch.cat([encoder_input_ids, input_ids], dim=-1), next_token_logits)
        # next_token_scores = logits_processor(input_ids, next_token_scores)

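The effect of passing `torch.cat([encoder_input_ids, input_ids], dim=-1)` to the processors is that repetition penalties are computed over the dialogue context as well as the generated prefix. Below is a minimal pure-Python sketch of a CTRL-style repetition penalty applied over the combined id sequence; it is an illustration of the idea, not the repo's actual implementation.

```python
# Illustrative sketch: repetition penalty over combined context + generated ids.
def repetition_penalty(logits, seen_ids, penalty=1.2):
    """Divide positive logits (multiply negative ones) for every id already seen,
    following the standard CTRL-style repetition penalty."""
    out = list(logits)
    for tok in set(seen_ids):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

context_ids = [5, 7]    # stands in for encoder_input_ids
generated_ids = [7, 2]  # stands in for the decoder's input_ids so far
logits = [1.0] * 8      # dummy next-token logits over a toy vocab of size 8

# Penalizing over the concatenation also discounts tokens that only appeared
# in the context (here, id 5), not just tokens the decoder already produced.
penalized = repetition_penalty(logits, context_ids + generated_ids)
```

Passing only `generated_ids` instead would leave the logit for id 5 untouched, which is the behavioral difference between the commented-out line and the new one.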
lsy641 commented 3 years ago

You may see magic numbers in https://github.com/thu-coai/Emotional-Support-Conversation/blob/b0c98059edeb0607f762f62587c09f17fda2a2c1/codes/src/transformers/generation_utils.py, such as 50257 or 54945. These are the vocabulary sizes used by Blender and DialoGPT.

BTW, in the original implementation I forgot to apply attention masks to the padding tokens in the encoder, which may cause a small difference between your reproduced results and the paper's. I have corrected the code in the file.
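For readers unfamiliar with the padding-mask fix: the encoder needs an attention mask that is 1 for real tokens and 0 for padding, so padded positions don't contribute to attention. A minimal sketch, with an assumed pad id (the real one comes from the tokenizer):

```python
PAD_ID = 0  # assumed padding token id for illustration; use tokenizer.pad_token_id in practice

def build_attention_mask(batch_ids):
    """1 for real tokens, 0 for padding, mirroring the Hugging Face attention_mask convention."""
    return [[int(tok != PAD_ID) for tok in seq] for seq in batch_ids]

batch = [[12, 34, 56, PAD_ID, PAD_ID],
         [78, 90, PAD_ID, PAD_ID, PAD_ID]]
mask = build_attention_mask(batch)
# mask == [[1, 1, 1, 0, 0], [1, 1, 0, 0, 0]]
```

Forgetting this mask lets the encoder attend to pad positions, which typically shifts results slightly rather than breaking training outright, consistent with the "little difference" described above.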

Happy to reply to you if you have any other questions.

nanzhao commented 3 years ago

I thought args.strategy was a mode option switching between Vanilla and Joint/Oracle.

But now, with your explanations above, I think I understand your implementation: args.strategy == True means it works in Oracle mode, and args.strategy == False means it works in Joint mode.

So I'm closing this ticket.

lsy641 commented 3 years ago

> I thought args.strategy is a mode option for Vanilla or Joint/Oracle.
>
> But now, I think I understand your implementation with your explanations above. args.strategy == True, means it works as Oracle mode. args.strategy == False, means it works as Joint mode.
>
> So I close this ticket with no issue.

Sorry, your original thought is correct. The Oracle model and Joint model are the same model in training, so they both use strategies; they differ only in generation.

args.strategy == True means it works as a model with strategies. In particular, when generating, if you want to give a designated strategy, you should put next_strategy_id and cls_position (the position of the strategy you want to add) into the params dict and run model.generate(input_ids, **params, ...).
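A hedged sketch of assembling those extra generation kwargs. The strategy-to-id mapping and the placeholder values are assumptions for illustration; in the real code the ids come from the strategy vocabulary and the tokenized input.

```python
# Hypothetical helper for the generate() call described above (not the repo's API).
strategy_to_id = {"Question": 0, "Providing Suggestions": 5}  # placeholder ids

def build_generation_params(strategy: str, cls_position: int = 0) -> dict:
    """Assemble the extra kwargs for generating with a designated strategy."""
    return {"next_strategy_id": strategy_to_id[strategy],
            "cls_position": cls_position}

params = build_generation_params("Providing Suggestions")
# model.generate(input_ids, **params, ...)  # as described in the comment above
```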

args.strategy == False means it works as the Vanilla model.

In training, the models do not need next_strategy_id (it is only used to designate a strategy during generation), because the strategy tokens are already in input_ids and label_ids.

I noticed there is also an input parameter next_strategy_ids in the modeling code. It seems I created it for a classification head; when I found the classification head didn't bring much improvement, I gave up using it, just as with role_id and turn_id. But next_strategy_id is used in generation.

Happy to hear from you if you have more questions.