I am getting the same issue. I have spent hours trying to figure out what is wrong but still haven't found it. I tried both BERTurk and mBERT, and both gave the same result; the training loss also drops to near 0 almost instantly. I am not sure what to do.
Sorry for being late. Yes, that's because the approach you are following works with a specific version, more precisely v4.2.1, but not with the current release (probably 4.18).
Hi, sorry, but I am not sure what exactly to change; can you be more specific? Should I not use the EncoderDecoder class, or should I change the trainer? I am new to this!
Thank you in advance.
I believe he means you should ensure that your versions of transformers and the other relevant libraries match the ones used in the example.
@AniketRajpoot Exactly. The adjustments needed for version 4.18 are in progress; I have opened another issue, which you can keep an eye on if you want.
Thank you so much @AbuUbaida @RaedShabbir, I understand the issue!
I just tried to train an EncoderDecoder model for a summarization task based on pre-trained BanglaBERT, which is an ELECTRA discriminator model pre-trained with the Replaced Token Detection (RTD) objective. Surprisingly, after 4500 steps on 10k training examples, the model hadn't learned anything at all: the ROUGE-2 scores were just 0.0000. To verify, I used that 4500-step checkpoint to generate summaries on test inputs; it produced a fixed-length output of 50 tokens (regardless of the input length) consisting of the [CLS] token repeated 49 times followed by a single [SEP] token. I basically followed the Warm-starting encoder-decoder models with 🤗Transformers notebook. Can anybody give any clue what the issue could be here? Thanks in advance.
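The generation check mentioned above was roughly along these lines (a sketch; the checkpoint path, tokenizer name, and length settings are placeholders, not the exact values used):

```python
from transformers import AutoTokenizer, EncoderDecoderModel

# Assumed names/paths; substitute your own tokenizer and checkpoint directory.
tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglabert")
model = EncoderDecoderModel.from_pretrained("./checkpoint-4500")

text = "..."  # a Bangla test article
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

summary_ids = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_length=50,
    num_beams=4,
)
# With the broken checkpoint this prints [CLS] repeated 49 times followed by [SEP].
print(tokenizer.decode(summary_ids[0], skip_special_tokens=False))
```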
In my case,
Tokenizer for the BanglaBERT model:
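Roughly as follows (the checkpoint name is assumed to be the published BanglaBERT discriminator on the Hub; replace it if you use a different copy):

```python
from transformers import AutoTokenizer

checkpoint = "csebuetnlp/banglabert"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
```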
Input pre-processing function:
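Along the lines of the warm-starting notebook (the column names "text" and "summary" and the max lengths are assumptions):

```python
encoder_max_length = 512   # assumed input length
decoder_max_length = 128   # assumed summary length

def process_data_to_model_inputs(batch):
    # Tokenize the source documents and the reference summaries.
    inputs = tokenizer(
        batch["text"], padding="max_length", truncation=True,
        max_length=encoder_max_length,
    )
    outputs = tokenizer(
        batch["summary"], padding="max_length", truncation=True,
        max_length=decoder_max_length,
    )

    batch["input_ids"] = inputs.input_ids
    batch["attention_mask"] = inputs.attention_mask
    batch["labels"] = outputs.input_ids.copy()

    # Replace padding tokens in the labels with -100 so they are ignored by the loss.
    batch["labels"] = [
        [-100 if token == tokenizer.pad_token_id else token for token in labels]
        for labels in batch["labels"]
    ]
    return batch
```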
Mapping the pre-processing function to the batches of examples:
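Assuming a 🤗 Datasets object called `train_data` with "text"/"summary" columns (names and batch size are placeholders):

```python
batch_size = 8  # assumed training batch size

train_data = train_data.map(
    process_data_to_model_inputs,
    batched=True,
    batch_size=batch_size,
    remove_columns=["text", "summary"],  # drop the raw string columns
)
train_data.set_format(
    type="torch", columns=["input_ids", "attention_mask", "labels"],
)
```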
BanglaBERT model and its config settings:
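The warm-started model, following the notebook's pattern (the generation settings below are assumptions, not the exact values from the run):

```python
from transformers import EncoderDecoderModel

# Warm-start both encoder and decoder from BanglaBERT; the decoder's
# cross-attention weights are newly initialized and have to be learned.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(checkpoint, checkpoint)

# Special-token wiring (BERT-style tokenizers use [CLS]/[SEP]).
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.eos_token_id = tokenizer.sep_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.vocab_size = model.config.encoder.vocab_size

# Beam-search / length settings (assumed values).
model.config.max_length = 128
model.config.min_length = 10
model.config.no_repeat_ngram_size = 3
model.config.early_stopping = True
model.config.length_penalty = 2.0
model.config.num_beams = 4
```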
I used the Seq2SeqTrainer for training. The Seq2SeqTrainingArguments were as follows:
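Something along these lines (the hyperparameters are placeholders rather than the exact ones from the run, and `val_data` is an assumed validation split):

```python
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./banglabert2banglabert",  # assumed output directory
    predict_with_generate=True,
    evaluation_strategy="steps",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    learning_rate=5e-5,     # assumed
    warmup_steps=500,       # assumed
    logging_steps=100,      # assumed
    eval_steps=500,         # assumed
    save_steps=500,         # assumed
    num_train_epochs=5,     # assumed
    fp16=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=train_data,
    eval_dataset=val_data,
)
trainer.train()
```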