Open · wangqian97 opened this issue 2 years ago
wangqian97:
Dear author, your framework works well on English datasets, but when I trained with the DualCL loss on my Chinese dataset, gradient collapse occurred. Each of my Chinese labels is two characters long; could this be the cause, or do I need to adjust something? Thank you very much. I look forward to hearing from you soon.

Author's reply:
Hi, each label in DualCL should be tokenized into a single token. I suspect that if a two-character Chinese label is encoded as two or more tokens, the DualCL loss will behave abnormally. Consider adding the whole label to the tokenizer's vocabulary, or changing the label to a single (Chinese) character, as sketched below.
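A minimal sketch of both options, assuming a Hugging Face `transformers` BERT-style Chinese checkpoint; the model name and label strings are illustrative placeholders, not the repository's actual configuration:

```python
# Minimal sketch, assuming a Hugging Face BERT-style Chinese checkpoint.
# The model name and label strings below are illustrative placeholders.
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")

labels = ["体育", "财经"]  # example two-character Chinese labels

# Check how many tokens each label encodes to; DualCL expects exactly one.
for label in labels:
    ids = tokenizer.encode(label, add_special_tokens=False)
    if len(ids) > 1:
        print(f"{label!r} splits into {len(ids)} tokens: {ids}")

# Option 1: add each whole label to the vocabulary so it becomes a single
# token, then resize the embedding matrix to cover the new entries.
num_added = tokenizer.add_tokens(labels)
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))

# Option 2 (no code needed): rename each label to a single Chinese character
# already in the vocabulary, e.g. "体育" -> "体".
```

Adding a label with `add_tokens` gives it a fresh embedding that is learned during fine-tuning, while renaming it to a single existing character reuses a pretrained embedding; either way, each label maps to exactly one token, which is what the DualCL loss expects here.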