Open TianlinZhang668 opened 2 years ago
There are two approaches to tackling this problem. We can apply mean pooling to the BERT outputs to obtain the label representations. Alternatively, we can take the first word of the label, provided it does not conflict with other labels.
For labels containing multiple words, how is the mean pooling taken? Is the code for this included in the repository files?
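For reference, here is a minimal sketch of what mean pooling over a multi-word label could look like. It is not taken from the repository. The function name `mean_pool_label` and the toy arrays are hypothetical, and the small hand-written matrix stands in for the encoder's per-token outputs (with Hugging Face `transformers`, these would typically be one row of `last_hidden_state`).

```python
import numpy as np

def mean_pool_label(hidden_states, attention_mask):
    """Average the token vectors of one label, ignoring padded positions.

    hidden_states: (seq_len, hidden_dim) array, e.g. per-token BERT outputs
    attention_mask: (seq_len,) array with 1 for real tokens and 0 for padding
    """
    hidden_states = np.asarray(hidden_states, dtype=float)
    # Reshape the mask to (seq_len, 1) so it broadcasts over the hidden dimension.
    mask = np.asarray(attention_mask, dtype=float)[:, None]
    summed = (hidden_states * mask).sum(axis=0)
    # Guard against division by zero if every position is masked out.
    count = max(mask.sum(), 1.0)
    return summed / count

# Toy stand-in for a two-token label padded to length 3 (hypothetical values).
toy_hidden = np.array([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]])
toy_mask = np.array([1, 1, 0])
label_vector = mean_pool_label(toy_hidden, toy_mask)  # padded row is ignored
```

Whether the special tokens (`[CLS]`, `[SEP]`) are included in the average is a design choice. To exclude them, set their mask entries to 0 before pooling.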