About InsuranceQA training corpus conversion, negative sampling - Githubissues

white127 / QA-deep-learning

TensorFlow and Theano CNN code for insurance QA (question-answer matching)

531 stars 283 forks

About InsuranceQA training corpus conversion, negative sampling #25

Open Fyweven opened 5 years ago

Fyweven commented 5 years ago

In the code, training batches are fetched through utils.gen_train_batch_qpn(train_data, FLAGS.batch_size), but inside that function:

    def gen_train_batch_qpn(_data, batch_size):
        psample = random.sample(_data, batch_size)
        nsample = random.sample(_data, batch_size)
        q = [s1 for s1, s2 in psample]
        qp = [s2 for s1, s2 in psample]
        qn = [s2 for s1, s2 in nsample]
        return np.array(q), np.array(qp), np.array(qn)

Are psample and nsample really obtained in exactly the same way??

zemu121 commented 5 years ago

I have the same question. Have you figured it out?

zemu121 commented 5 years ago

train_data only contains qp; there is no qn, right?

Fyweven commented 5 years ago

> train_data only contains qp; there is no qn, right?

It simply picks one at random from all the questions and uses it as the negative sample.

zemu121 commented 5 years ago

All the data in train are positive samples, and nsample is also drawn at random from train, so isn't qn actually a correct answer too?

Fyweven commented 5 years ago

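> All the data in train are positive samples, and nsample is also drawn at random from train, so isn't qn actually a correct answer too?

Not quite. It is drawn from across all the questions, so it is possible to sample a positive, but the probability is very low; most of the time it really is a qn.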
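To make the "low probability" point concrete: if train_data holds N distinct (question, answer) pairs, an independent draw lands on the very same pair as the positive with probability about 1/N, and somewhat more often when a question has several correct answers. If that residual noise matters, a rough sketch of a variant that redraws accidental positives could look like the following. This is not the repository's code; the name gen_train_batch_qpn_checked, the max_tries parameter and the rng argument are made up for illustration, and it assumes questions and answers are hashable (for example strings or tuples of token ids). It returns plain lists, whereas the original wraps the results in np.array.

```python
import random


def gen_train_batch_qpn_checked(_data, batch_size, rng=random, max_tries=100):
    """Hypothetical variant of gen_train_batch_qpn that avoids known positives.

    _data is assumed to be a list of (question, answer) pairs whose
    elements are hashable (e.g. strings or tuples of token ids).
    """
    # Collect every answer known to be correct for each question.
    positives = {}
    for question, answer in _data:
        positives.setdefault(question, set()).add(answer)

    q, qp, qn = [], [], []
    for question, answer in rng.sample(_data, batch_size):
        # Draw a candidate negative; redraw if it is a known correct answer.
        candidate = rng.choice(_data)[1]
        for _ in range(max_tries):
            if candidate not in positives[question]:
                break
            candidate = rng.choice(_data)[1]
        # After max_tries redraws the last candidate is kept as is,
        # so the loop always terminates even on degenerate data.
        q.append(question)
        qp.append(answer)
        qn.append(candidate)
    return q, qp, qn
```

Whether the extra bookkeeping is worth it depends on the data. With a large corpus the plain random draw is usually close enough, which is presumably why the original function does not bother.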

zemu121 commented 5 years ago

I understand now. Thank you very much for your reply.

zemu121 commented 5 years ago

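If the model were changed into the kind typically used for image processing, that is, small sliding windows, several stacked convolutions, and max_pooling, do you think that would work?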