fastText: support training a classifier from positive samples only

mayabot / mynlp

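A production-grade, high-performance, modular, extensible Chinese NLP toolkit. (Chinese word segmentation, averaged perceptron, fastText, pinyin, new-word discovery, segmentation correction, BM25, person-name recognition, named entities, custom dictionaries)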

https://mynlp.mayabot.com/

Apache License 2.0

678 stars 89 forks

fastText: support training a classifier from positive samples only #30

Open Strong-Gavin opened 4 years ago

Strong-Gavin commented 4 years ago

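fastText training requires at least two class labels. Suppose I only have positive examples: can it support deciding whether a new sample belongs to that class? I tried it and found that a model trained on a single label predicts 100% no matter what the input is.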

jimichan commented 4 years ago

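In principle, you can only control this through the prediction score, for example by treating a prediction as correct only when the score exceeds 0.9.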
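
A minimal sketch of that thresholding rule, assuming the classifier returns a list of `(label, score)` pairs. The function name `accept_positive`, the label `pos`, and the data shape are hypothetical and are not mynlp's API:

```python
def accept_positive(predictions, label="pos", threshold=0.9):
    """Return True only if `label` is predicted with a score above `threshold`.

    `predictions` is assumed to be a list of (label, score) pairs;
    this shape is an illustration, not the library's actual return type.
    """
    for predicted_label, score in predictions:
        if predicted_label == label:
            # "Exceeds 0.9" is read as a strict comparison.
            return score > threshold
    # The label was not predicted at all.
    return False
```

This rule only helps if the score actually varies with the input.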


Strong-Gavin commented 4 years ago

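I trained with only one label, and then whatever input I run prediction on, the score is always 1, i.e. 100%. Is it a problem with my samples? Later I will try the hotel review dataset, keeping only the positive reviews.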
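
One likely explanation, assuming the label scores are normalized with a softmax (an assumption about the implementation, not something confirmed in this thread): with a single label the softmax has only one term, so it returns 1 for any input. A small plain-Python illustration:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# With one label, the raw score does not matter: exp(z) / exp(z) == 1.
single = [softmax([z])[0] for z in (-5.0, 0.0, 12.3)]

# With two labels, the probability mass is shared, so the score carries information.
two = softmax([2.0, 0.0])
```

Under that assumption, the constant score of 1 would come from the normalization rather than from the training samples.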

Strong-Gavin commented 4 years ago

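I tried it with the hotel data, keeping only the samples labeled pos. The trained model also gives a score of 1 when classifying later data, as shown in the screenshot.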

jimichan commented 4 years ago

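Then my understanding was wrong, and fastText may not fit this requirement. Try changing the loss to hs or softmax. Your requirement is quite unusual: ns is negative sampling, and with no negative samples to draw from it certainly cannot work.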
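
To illustrate the point about ns: negative sampling draws negatives from labels other than the target, so a single-label training set leaves nothing to draw. A simplified sketch, in which `sample_negatives` is a hypothetical helper that samples uniformly; it is not code from fastText or mynlp:

```python
import random

def sample_negatives(labels, target, k, rng=random):
    """Draw up to k negative labels, i.e. labels different from `target`.

    Uniform sampling is a simplification made for this illustration.
    """
    candidates = [label for label in labels if label != target]
    if not candidates:
        # Only one label exists, so there is no negative to contrast against.
        return []
    return [rng.choice(candidates) for _ in range(k)]
```

With `labels = ["pos"]` the function returns an empty list, which is the situation described in this issue.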