Closed: salokr closed this issue 1 year ago.
Hey! Thanks for reporting; however urgent this is, please refrain from pinging that many people.
All questions about how to train or how to improve training should be asked on the forum, as they are not bugs and the community is better suited to help you there.
Hi,
I am trying to train a ByT5 model for text2text generation. Specifically, given the previous chat history, the objective is to produce a response to the input. I understand that I could use decoder-only models for the task, but we need the byte-level information, which we will be using in the future. For training purposes, I have obtained a fine-tuning dataset and used the following configuration:
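A minimal sketch of such a configuration, assuming Hugging Face's `Seq2SeqTrainingArguments` (all hyperparameter values below are illustrative, not the exact ones used):

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative fine-tuning configuration for a ByT5 model.
# These values are reasonable defaults, not the actual settings
# from the original report.
training_args = Seq2SeqTrainingArguments(
    output_dir="byt5-chat-finetune",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=1e-4,            # T5-family models often train with 1e-4 to 3e-4
    num_train_epochs=3,
    evaluation_strategy="steps",
    eval_steps=500,
    save_steps=500,
    logging_steps=100,
    predict_with_generate=True,    # run generation during evaluation
    fp16=False,                    # T5/ByT5 can be numerically unstable in fp16
)
```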
My fine-tuning code looks like the following:
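A sketch of what such a fine-tuning script might look like with `Seq2SeqTrainer`; the dataset paths and the `history`/`response` field names are assumptions for illustration:

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
)

model_name = "google/byt5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Hypothetical data files; each record is assumed to have a
# "history" field (the chat context) and a "response" field (the target).
dataset = load_dataset(
    "json",
    data_files={"train": "train.json", "validation": "val.json"},
)

def preprocess(example):
    # Tokenize the chat history as the encoder input and the
    # response as the decoder target. ByT5 operates on bytes,
    # so these max lengths count bytes, not words.
    model_inputs = tokenizer(example["history"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=example["response"], max_length=256, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, remove_columns=dataset["train"].column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,  # the Seq2SeqTrainingArguments sketched above
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```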
However, the problem with the above code is that after extensive fine-tuning the model generates text that repeats itself, sometimes copies from the input, or produces responses that are irrelevant or unrelated to the input. I have also tried contrastive search, beam search, etc., but the responses the model generates are still gibberish. Any suggestions on how to improve ByT5's ability to do this task? As I understand it, T5-based models (including ByT5) perform well on many seq2seq tasks such as Text2SQL, so they should at least generate responses relevant to the input for this task too.
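For reference, one common way to fight repetition at decoding time is to pair beam or contrastive search with a repetition penalty and an n-gram ban; a sketch with illustrative parameter values (not necessarily the ones already tried):

```python
inputs = tokenizer("previous chat history goes here", return_tensors="pt")

# Beam search with anti-repetition constraints. Note that for ByT5
# the n-gram size counts bytes, not words, so a larger value than
# usual (e.g. 10 instead of 3) may be needed to have any effect.
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    num_beams=4,
    no_repeat_ngram_size=10,
    repetition_penalty=1.2,    # mildly discourage reusing tokens
    early_stopping=True,
)

# Contrastive search: penalty_alpha adds a degeneration penalty
# that tends to reduce repetitive output.
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    penalty_alpha=0.6,
    top_k=4,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```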
Please let me know any suggestions you have. @ArthurZucker @younesbelkada
I am also attaching some sample responses generated by the model.