huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Encoder-Decoder model after fine tuning on Turkish dataset, generation gives the same results regardless of the input #17124

Closed AniketRajpoot closed 2 years ago

AniketRajpoot commented 2 years ago

Hello everyone, I need help with training an encoder-decoder model. I need to fine-tune a bert2bert model for Turkish content summarization. I am using this sample notebook as a reference: https://github.com/patrickvonplaten/notebooks/blob/master/BERT2BERT_for_CNN_Dailymail.ipynb

After training on my custom dataset, when I generate on a test dataset I get gibberish results regardless of the amount of training. I have attached the results below. One more observation I made is that the training loss instantly drops to near-zero values after a few steps of training. I am not sure what I am doing wrong.

Here are the screenshots of the output: output_1, output_2

Here is the training loss curve: train

Here is the full notebook that I used for fine-tuning: https://colab.research.google.com/drive/188Lil4Uc3wY7nd1PEfCjMwSfPO-NXI94?usp=sharing

I am not sure what I am doing wrong. I would be grateful for any advice. Thank you!
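
For reference, the warm-start setup I am following is essentially the one sketched below. This is only a minimal sketch: the Turkish checkpoint name (`dbmdz/bert-base-turkish-cased`) and the generation parameters are illustrative assumptions, not the exact values from my notebook.

```python
from transformers import BertTokenizerFast, EncoderDecoderModel

# Warm-start a bert2bert model from a Turkish BERT checkpoint
# (checkpoint name is an illustrative assumption).
checkpoint = "dbmdz/bert-base-turkish-cased"
tokenizer = BertTokenizerFast.from_pretrained(checkpoint)
model = EncoderDecoderModel.from_encoder_decoder_pretrained(checkpoint, checkpoint)

# BERT has no decoder-specific special tokens, so these have to be set by hand.
# If decoder_start_token_id / pad_token_id are missing or wrong, generation can
# degenerate into the same output for every input.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.eos_token_id = tokenizer.sep_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.vocab_size = model.config.decoder.vocab_size

# Beam-search settings in the style of the CNN/DailyMail notebook (illustrative values).
model.config.max_length = 128
model.config.min_length = 32
model.config.no_repeat_ngram_size = 3
model.config.early_stopping = True
model.config.length_penalty = 2.0
model.config.num_beams = 4
```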

AbuUbaida commented 2 years ago

I highly recommend you find a workaround that works with the latest datasets and transformers libraries! I suffered a lot because of this very issue.

AniketRajpoot commented 2 years ago

> I highly recommend you find a workaround that works with the latest datasets and transformers libraries! I suffered a lot because of this very issue.

What do you mean by a workaround? I am not sure what you are referring to.

AbuUbaida commented 2 years ago

You can look for the changes that might be needed for the latest versions, and let me know if you find any.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

tqnwhz commented 2 years ago

Hi bro, I've run into a similar problem. Have you found any solution? Thanks very much.

mareloraby commented 2 years ago

I’m having the same issue. Did someone figure it out? Thanks in advance.


Edit: Solved by using the following library versions:

transformers==4.2.1 datasets==1.0.2 torch==1.6.0

credits to @salma-elshafey

tqnwhz commented 2 years ago

> I’m having the same issue. Did someone figure it out? Thanks in advance.
>
> Edit: Solved by using the following library versions:
>
> transformers==4.2.1 datasets==1.0.2 torch==1.6.0
>
> credits to @salma-elshafey

It does not work for me :(. Could you please provide related links for your solution? Thanks very much. @mareloraby

mareloraby commented 2 years ago

> > I’m having the same issue. Did someone figure it out? Thanks in advance. Edit: Solved by using the following library versions: transformers==4.2.1 datasets==1.0.2 torch==1.6.0 credits to @salma-elshafey
>
> It does not work for me :(. Could you please provide related links for your solution? Thanks very much. @mareloraby

Hey @tqnwhz, sorry, I don't have a reference I can link. A colleague who had worked with an Encoder-Decoder model before helped me with that.

AniketRajpoot commented 2 years ago

Hello @tqnwhz! I hope you are doing well.

Can you take a look at the following issues:

If these don't solve it, check out this blog post: https://huggingface.co/blog/warm-starting-encoder-decoder#data-preprocessing
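
The key step covered in that data-preprocessing section is masking the padding tokens in the labels with -100 so the loss ignores them; if that step is missing, the loss can collapse towards zero the way you described. Below is a minimal sketch of that step, with column names and max lengths as illustrative assumptions.

```python
def process_data_to_model_inputs(batch, tokenizer, max_input_length=512, max_target_length=128):
    # Tokenize documents and target summaries (column names are illustrative).
    inputs = tokenizer(batch["text"], padding="max_length",
                       truncation=True, max_length=max_input_length)
    outputs = tokenizer(batch["summary"], padding="max_length",
                        truncation=True, max_length=max_target_length)

    batch["input_ids"] = inputs.input_ids
    batch["attention_mask"] = inputs.attention_mask
    batch["labels"] = outputs.input_ids

    # The crucial step from the blog post: replace padding token ids in the labels
    # with -100 so they are ignored by the loss. If the loss is also computed over
    # the padding positions, it can drop towards zero very quickly while the model
    # learns almost nothing useful.
    batch["labels"] = [
        [-100 if token == tokenizer.pad_token_id else token for token in labels]
        for labels in batch["labels"]
    ]
    return batch
```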

ydshieh commented 2 years ago

@AniketRajpoot Do you still have this issue with newer transformers versions?

AniketRajpoot commented 2 years ago

Actually, I did not train the same model but rather a different generation model based on CodeBERT, and that model had some different issues. But the newer version did solve the problem of random output and the loss instantly going to zero.

tqnwhz commented 2 years ago

Thanks very much! @AniketRajpoot I'll conduct a few experiments to verify them.

AbuUbaida commented 2 years ago

@tqnwhz you could try this setting: transformers==4.18.0 datasets==2.1.0

tqnwhz commented 2 years ago

@AbuUbaida Thanks for your advice. I've tried this setting and it does not work :(. Given that I've spent about two weeks trying to solve this in vain, I plan to turn to approaches other than seq2seq.

Thanks again for your advice. I hope everything goes well for you.

ydshieh commented 2 years ago

Hi @tqnwhz, could you provide a script and dataset that reproduce this issue? As this problem seems to have come up a few times, I think it would be great if we could find the cause and fix it, but I need something that reproduces it 🙏 Thank you.

tqnwhz commented 2 years ago

Hi @ydshieh, I'm afraid my code and dataset are not typical for this problem, since I am using seq2seq to model multi-label text classification rather than a normal text generation task.

ydshieh commented 2 years ago

OK, no problem!