DidiDerDenker closed this issue 3 years ago.
cc @patrickvonplaten
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@patrickvonplaten Hi, unfortunately I have not been able to make any progress in the last month and would appreciate it if you have a solution for the unexpected behavior. Thank you! :)
Hey @DidiDerDenker,
Sorry, it's very difficult for us to debug customized training runs that don't produce good results. Could you try the forum instead: https://discuss.huggingface.co
Environment info
Who can help
Information
I am currently working on abstractive text summarization. As part of this, I am trying to fine-tune BART on German text data. This works, e.g., with bert-base-multilingual-cased and bert-base-german-cased. It does not work with, e.g., deepset/gbert-base, deepset/gelectra-large, or mbart-large-cc25: the training does not make any real progress, and the loss converges to zero very quickly. Am I doing something wrong? Do I need to use other classes?
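For reference, warm-starting an encoder-decoder model from a BERT-style checkpoint usually looks roughly like the sketch below. This is only a minimal illustrative sketch, not the reporter's code; the checkpoint name, sequence lengths, and preprocessing are assumptions. One common pitfall that can make the loss collapse towards zero is leaving padding tokens in the labels instead of masking them with -100.

```python
# Minimal sketch: warm-start an encoder-decoder from a German BERT checkpoint.
# Checkpoint, lengths and preprocessing are illustrative assumptions.
from transformers import AutoTokenizer, EncoderDecoderModel

model_name = "bert-base-german-cased"  # could also be deepset/gbert-base etc.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = EncoderDecoderModel.from_encoder_decoder_pretrained(model_name, model_name)

# BERT has no native decoder start / end tokens, so reuse [CLS] and [SEP]
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.eos_token_id = tokenizer.sep_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.vocab_size = model.config.decoder.vocab_size

def preprocess(article: str, summary: str) -> dict:
    inputs = tokenizer(article, max_length=512, truncation=True, padding="max_length")
    targets = tokenizer(summary, max_length=128, truncation=True, padding="max_length")
    # Mask padding tokens so they do not contribute to the loss;
    # forgetting this can make the loss look deceptively small.
    inputs["labels"] = [
        tok if tok != tokenizer.pad_token_id else -100
        for tok in targets["input_ids"]
    ]
    return inputs
```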
To reproduce
Here are a few code snippets to reproduce this behavior:
Expected behaviour
I would like to fine-tune BART successfully on German data, i.e. with a training loss that reflects actual learning rather than collapsing to zero right away.