Open JhonDan1999 opened 1 year ago
I observed the same thing. I also tried directly penalizing generation of the '_' token in the reward function. Unfortunately, it does not seem to learn how to stop generating the blank token...
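For reference, the kind of reward-shaping penalty described above can be sketched like this (`penalized_reward`, `blank_id`, and `penalty` are illustrative names, not the library's actual API):

```python
def penalized_reward(base_reward, generated_ids, blank_id, penalty=0.5):
    """Subtract a fixed penalty for every blank token in the generation.

    `blank_id` would be the tokenizer id of the '_' (blank) token;
    all names here are hypothetical, for illustration only.
    """
    n_blanks = sum(1 for t in generated_ids if t == blank_id)
    return base_reward - penalty * n_blanks
```

As noted above, though, this alone did not stop the model from emitting the blank token.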
Hi all, the issue is probably caused by https://github.com/huggingface/transformers/blob/bffac926ca6bc6c965a92bfbfd00c567a2c0fb90/src/transformers/models/t5/modeling_t5.py#L1147C8-L1147C8
It adds a position_bias to each layer's output, so a freshly initialized model will perform badly.
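For context, T5 computes the relative position bias once in the first attention layer and then reuses it, adding it to the attention scores in every subsequent layer. A pure-Python sketch of that control flow (all names are illustrative stand-ins, not the actual modeling_t5 code):

```python
def run_t5_style_stack(hidden, layers, compute_bias):
    """Mimic T5's pattern: the relative position bias is computed
    once by the first layer and then shared with every later layer,
    where it is added into the attention computation."""
    position_bias = None
    for layer in layers:
        if position_bias is None:
            # Only the first layer computes the relative position bias.
            position_bias = compute_bias()
        # Each layer receives and adds the shared bias.
        hidden = layer(hidden, position_bias)
    return hidden
```

Because the bias is entangled with every layer's output, an untrained (or partially frozen) stack can behave poorly until those biases are adapted.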
Hey! Do you guys figure out a solution to this problem? Thanks!
Unfortunately not yet. I spent a lot of time trying to figure out a way to do it with this library, but I ended up giving up on it (at least for now).
Nice repo!!!
It seems that the default parameter for the policy freezes all the layers of the language model we are using and only updates the lm_head. I tried the provided flan-T5 example here: https://colab.research.google.com/drive/1DYHt0mi6cyl8ZTMJEkMNpsSZCCvR4jM1?usp=sharing
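The freezing behavior described above is typically done in PyTorch by toggling `requires_grad` per parameter. A pure-Python stand-in for that logic (plain dicts play the role of torch parameters; `head_prefix` and the function name are hypothetical, not the library's API):

```python
def freeze_all_but_head(named_params, head_prefix="lm_head"):
    """Freeze every parameter whose name does not start with
    `head_prefix`. With a real model this would set
    `param.requires_grad = False` while iterating over
    `model.named_parameters()`; dicts stand in for tensors here."""
    trainable = []
    for name, param in named_params:
        param["requires_grad"] = name.startswith(head_prefix)
        if param["requires_grad"]:
            trainable.append(name)
    return trainable
```

With this default, only the lm_head receives gradient updates, which matches the behavior I observed.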
When I changed the value of unfreeze_layer_from_past to 1, to update the weights of the final layer of flan-t5, like this:
the behavior changed and the actor started to generate empty text:
Also, after training it gave me empty text:
What is the reason for this behavior?
NOTE: I did not change anything else in the flan-t5 code example.