最后测试时候为什么不直接用分类呢？

shanengcn commented 3 years ago

请问一下，在训练完之后，得到了bert分类模型，类别数是self.num_labels。在测试时，我看代码中是这样处理：先用bert提取测试集的embedding，然后用kmeans聚类，类数是self.num_labels，得到聚类结果以后去和真实标签计算最优匹配。那么请问测试时为什么不直接对测试集分类呢？直接分类应该也可以得到对应的label

HanleiZhang commented 3 years ago

您好，我来解释一下您的疑问~ 测试集中包括全部意图标签（已知意图+新意图），我们提出的新意图发现问题本质也是一个约束聚类问题，由于新意图样本标签未知，所以只能计算聚类相关评价指标，无法直接按照分类问题的方式计算准确率ACC。具体来说，聚类算法得到的ID是随机排序的，能通过它们判断样本是不是属于同一簇，但是不能识别出真实的标签是什么（因此聚类问题经常算的指标是NMI、ARI、轮廓系数之类用来簇聚合度质量的评价指标）。为了评价准确率，这里我们仿照论文[1, 2, 3]的做法，先将样本预测ID和真实标签通过匈牙利算法对齐，然后再计算准确率，这样能获得较高的准确率结果，同时判断重建簇的质量，该方法也是聚类问题计算ACC的常见做法。

Hello, let me explain your concerns~ The test sets contain all intents (known and new intents). Our proposed new intent discovery problem is a constrained clustering problem essentially. As the true labels of new intents are unknown, we can only calculate clustering metrics rather than computing accuracy directly as classification. Specifically, the obtained IDs after clustering are random. Therefore, we can only judge whether two samples are in one cluster, rather than identify the specific class (So typical clustering metrics are NMI, ARI, Silhouette Coefficient, and so on). To evaluate ACC, we follow the previous works [1, 2, 3]. We first use the Hungarian algorithm to align the predicted IDs with the true labels, then, we calculate ACC as results. In this case, we can achieve high ACC and evaluate the quality of the reconstructed clusters. It is also a common way to calculate ACC in clustering techniques.

References: [1] Yen-Chang Hsu, Zhaoyang Lv, Joel Schlosser, Phillip Odom and Zsolt Kira. 2019. Multi-class classification without multi-class labels. In Proceedings of ICLR 2019. [2] Yen-Chang Hsu and Zhaoyang Lv and Zsolt Kira. 2018. Learning to Cluster in Order to Transfer Across Domains and Tasks. In Proceedings of ICLR 2018. [3] Yi Yang, Dong Xu, Feiping Nie, Shuicheng Yan, and Yueting Zhuang. 2010. Image clustering using local discriminant models and global integration. IEEE Transactions on Image Processing.

shanengcn commented 3 years ago

@HanleiZhang 明白了，多谢~

thuiar / DeepAligned-Clustering

最后测试时候为什么不直接用分类呢？ #1