How are multi-label samples handled? When running a multi-label dataset, the support values seem to sum to only the number of single-label samples

yongzhuo / Pytorch-NLU

Pytorch-NLU is a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of long and short Chinese texts, as well as sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, word segmentation, and extractive text summarization.

https://blog.csdn.net/rensihui

Apache License 2.0

328 stars 52 forks

How are multi-label samples handled? When running a multi-label dataset, the support values seem to sum to only the number of single-label samples #2

Closed chandeler closed 2 years ago

chandeler commented 2 years ago

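Taking the provided school dataset as an example: the test set has 132 samples in total, 12 multi-label and 120 single-label, yet the support values sum to 120. Stepping through in a debugger, I found that the one-hot vector for every multi-label sample is all zeros at input. What is the reasoning behind this step?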

yongzhuo commented 2 years ago

Fixed. The multi-label separator parameter was not being passed through. In the data preprocessing call in tcRun.py, change corpus.preprocess(xs_tet, self.config.l2i, max_len=self.config.max_len) to corpus.preprocess(xs_tet, self.config.l2i, max_len=self.config.max_len, label_sep=self.config.label_sep)
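
For readers wondering why a missing separator would produce all-zero vectors, here is a minimal sketch of one plausible mechanism. It is not the actual code of corpus.preprocess: the helper encode_labels, the "|" separator, and the toy label map are all assumptions made for illustration. The idea is that without a separator the whole label field is looked up as a single label name, so a combined field matches nothing and contributes nothing to support.

```python
from typing import Dict, List, Optional


def encode_labels(label_field: str,
                  l2i: Dict[str, int],
                  label_sep: Optional[str] = None) -> List[int]:
    """Multi-hot encode one sample's label field (hypothetical helper).

    If label_sep is None, the whole field is treated as one label name.
    """
    names = label_field.split(label_sep) if label_sep else [label_field]
    vector = [0] * len(l2i)
    for name in names:
        idx = l2i.get(name.strip())
        if idx is not None:  # unknown names are silently skipped
            vector[idx] = 1
    return vector


# Toy label map and samples; "|" is an assumed separator, not the library default.
l2i = {"math": 0, "physics": 1, "history": 2}
samples = ["math", "history", "math|physics"]

# Without the separator, "math|physics" is not a key in l2i, so its row is all zeros.
without_sep = [encode_labels(s, l2i) for s in samples]
# With the separator, the multi-label sample sets one bit per label.
with_sep = [encode_labels(s, l2i, label_sep="|") for s in samples]

# Per-label support is the column sum of the true-label matrix.
support_without = [sum(col) for col in zip(*without_sep)]
support_with = [sum(col) for col in zip(*with_sep)]

print(without_sep, support_without)
print(with_sep, support_with)
```

In this toy setup the total support without the separator equals the number of single-label samples, which matches the symptom reported above. With the separator passed through, each multi-label sample counts once per label it carries, so total support can exceed the number of samples.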