Open · wangqian97 opened this issue 2 years ago
wangqian97:
Dear author, your framework works well on English datasets, but when I trained with the DualCL loss on my Chinese dataset, gradient collapse occurred. Each of my Chinese labels is two characters long; could this be the cause, or do I need to adjust something? Thank you very much. I look forward to hearing from you soon.

Author's reply:
Hi, each label in DualCL should be tokenized into a single token. I suspect that if a two-character Chinese label is encoded as two or more tokens, the DualCL loss will behave abnormally. Consider adding the whole label to the tokenizer's vocabulary, or changing the label to a single (Chinese) character, as sketched below.
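A minimal sketch of both options, assuming a Hugging Face `transformers` BERT-style Chinese checkpoint; the model name and label strings are illustrative placeholders, not the repository's actual configuration:

```python
# Minimal sketch, assuming a Hugging Face BERT-style Chinese checkpoint.
# The model name and label strings below are illustrative placeholders.
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")

labels = ["体育", "财经"]  # example two-character Chinese labels

# Check how many tokens each label encodes to; DualCL expects exactly one.
for label in labels:
    ids = tokenizer.encode(label, add_special_tokens=False)
    if len(ids) > 1:
        print(f"{label!r} splits into {len(ids)} tokens: {ids}")

# Option 1: add each whole label to the vocabulary so it becomes a single
# token, then resize the embedding matrix to cover the new entries.
num_added = tokenizer.add_tokens(labels)
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))

# Option 2 (no code needed): rename each label to a single Chinese character
# already in the vocabulary, e.g. "体育" -> "体".
```

Adding a label with `add_tokens` gives it a fresh embedding that is learned during fine-tuning, while renaming it to a single existing character reuses a pretrained embedding; either way, each label maps to exactly one token, which is what the DualCL loss expects here.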