white127 / QA-deep-learning

TensorFlow and Theano CNN code for InsuranceQA (question-answer matching)

Regarding the InsuranceQA training-corpus conversion: negative sampling #25

Open Fyweven opened 5 years ago

Fyweven commented 5 years ago

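In the code, training batches are fetched through `utils.gen_train_batch_qpn(train_data, FLAGS.batch_size)`, but that function is:

```python
import random

import numpy as np


def gen_train_batch_qpn(_data, batch_size):
    # Two independent random draws from the same list of (question, answer) pairs
    psample = random.sample(_data, batch_size)
    nsample = random.sample(_data, batch_size)
    q = [s1 for s1, s2 in psample]   # questions
    qp = [s2 for s1, s2 in psample]  # their paired (positive) answers
    qn = [s2 for s1, s2 in nsample]  # answers of other random pairs (intended negatives)
    return np.array(q), np.array(qp), np.array(qn)
```

So psample and nsample are obtained in exactly the same way??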
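Nothing in gen_train_batch_qpn stops a negative from being a correct answer for its question. Below is a minimal sketch of a filtered variant, assuming train_data is a list of (question, answer) pairs whose elements are equal-length token-id lists (the unpacking in the original suggests pairs; the fixed length is an assumption so that np.array yields 2-D arrays). build_answer_index and gen_train_batch_qpn_filtered are hypothetical names, not functions in the repository. It samples (q, qp) exactly as the original does and redraws each negative until its answer is not a known correct answer for that question.

```python
import random
from collections import defaultdict

import numpy as np


def _key(x):
    # Token-id lists/arrays are not hashable; tuples are.
    return tuple(x)


def build_answer_index(data):
    """Hypothetical helper: map each question to the set of its correct answers."""
    correct = defaultdict(set)
    for q, a in data:
        correct[_key(q)].add(_key(a))
    return correct


def gen_train_batch_qpn_filtered(data, batch_size, correct, max_tries=50):
    """Hypothetical variant of gen_train_batch_qpn with filtered negatives.

    data: list of (question, answer) positive pairs, as train_data is assumed to be.
    correct: index from build_answer_index(data).
    """
    psample = random.sample(data, batch_size)
    q = [s1 for s1, s2 in psample]
    qp = [s2 for s1, s2 in psample]
    qn = []
    for question in q:
        for _ in range(max_tries):
            cand = random.choice(data)[1]  # answer of a random pair
            if _key(cand) not in correct[_key(question)]:
                break
        else:
            # Every draw hit a correct answer; give up instead of returning a false negative
            raise RuntimeError("no negative answer found for a question")
        qn.append(cand)
    return np.array(q), np.array(qp), np.array(qn)
```

The index is built once, outside the batch loop; the redraw only costs extra draws in the rare collision case discussed in this thread.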

zemu121 commented 5 years ago

I have the same question. Did you figure it out?

zemu121 commented 5 years ago

train_data only has (q, qp) pairs and no qn, right?

Fyweven commented 5 years ago

> train_data only has (q, qp) pairs and no qn, right?

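It simply picks one at random from all the questions (that is, the answer of a randomly chosen pair) and uses it as the negative sample.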
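How the (q, qp, qn) triplet is consumed is not shown in this thread. A common choice for this kind of answer-selection model is a margin ranking loss on cosine similarities, max(0, m − cos(q, qp) + cos(q, qn)). The NumPy sketch below assumes that loss and a margin of 0.05 (both are assumptions; the repository's actual objective and margin may differ), with q_vec, qp_vec and qn_vec standing for already-encoded (batch, dim) vectors. It shows why a randomly drawn negative that happens to be correct is harmful.

```python
import numpy as np


def cosine(a, b):
    """Row-wise cosine similarity between two (batch, dim) arrays."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / np.maximum(den, 1e-12)  # guard against zero vectors


def triplet_hinge_loss(q_vec, qp_vec, qn_vec, margin=0.05):
    """Assumed ranking loss: mean of max(0, m - cos(q, qp) + cos(q, qn))."""
    losses = np.maximum(0.0, margin - cosine(q_vec, qp_vec) + cosine(q_vec, qn_vec))
    return losses.mean()
```

With a true negative orthogonal to the question the loss is 0; if qn is in fact the same answer as qp, the two cosines cancel and the loss is stuck at the margin, so training pushes a correct answer down. At random-sampling rates this only happens occasionally.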

zemu121 commented 5 years ago

The data in train are all positive samples, and nsample is also drawn at random from train, so isn't qn actually a correct answer as well?

Fyweven commented 5 years ago

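> The data in train are all positive samples, and nsample is also drawn at random from train, so isn't qn actually a correct answer as well?

No. It is drawn from all the questions, so it can happen to hit a positive sample, but the probability is very low; most of the time it really is a negative (qn).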
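To put a number on "very low probability": psample[i] and nsample[i] come from two separate random.sample calls, so each is uniform over train_data and independent of the other. Under that reading, and assuming train_data is a list of (question, answer) pairs, the chance that a row's qn is actually a correct answer for its q can be computed exactly. false_negative_rate is a hypothetical helper, not part of the repository.

```python
from collections import Counter, defaultdict


def false_negative_rate(data):
    """Exact probability that one row of gen_train_batch_qpn gets a 'negative'
    that is actually a correct answer for its question.

    Assumes psample[i] and nsample[i] are each uniform over data and
    independent, which is how two separate random.sample calls behave.
    """
    key = tuple  # token-id lists are unhashable
    correct = defaultdict(set)  # question -> its correct answers
    for q, a in data:
        correct[key(q)].add(key(a))
    answer_counts = Counter(key(a) for _, a in data)  # pairs per answer
    n = len(data)
    hits = 0
    for q, _ in data:  # each pair is equally likely to be the positive row
        # number of pairs nsample could pick whose answer is correct for q
        hits += sum(answer_counts[a] for a in correct[key(q)])
    return hits / (n * n)
```

When every question has exactly one answer and no answer is shared, this reduces to 1/N for N training pairs, so roughly one row in N gets a false negative, which matches the answer above.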

zemu121 commented 5 years ago

I see now. Thanks a lot for your reply.

zemu121 commented 5 years ago

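What if the model were turned into a typical image-processing model, i.e. small sliding windows, several layers of convolution, and max-pooling? Do you think that would work?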
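One way to read this suggestion is to treat a sentence's (sequence length × embedding dimension) matrix as a one-channel image and apply small 2-D windows, stacked convolutions and max-pooling to it, whereas text CNNs of this kind usually slide windows that span the full embedding width. The NumPy sketch below only illustrates the shape bookkeeping under stated assumptions: 3×3 kernels, 2×2 pooling, random untrained weights, and each filter as its own single-channel chain (no cross-channel mixing, a simplification). conv2d_valid, max_pool2d and encode are hypothetical names, not the repository's model.

```python
import numpy as np


def conv2d_valid(x, k):
    """Single-channel 2-D cross-correlation with 'valid' padding."""
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out


def max_pool2d(x, size=2):
    """Non-overlapping max-pooling; rows/cols that don't fill a window are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))


def encode(emb, chains):
    """Each chain applies conv -> ReLU -> 2x2 max-pool per kernel, then a
    global max gives one feature per chain. With two 3x3 layers the input
    needs at least 10 rows and 10 columns."""
    feats = []
    for chain in chains:
        x = emb
        for k in chain:
            x = max_pool2d(np.maximum(conv2d_valid(x, k), 0.0))
        feats.append(x.max())
    return np.array(feats)


rng = np.random.default_rng(0)
chains = [[rng.standard_normal((3, 3)) for _ in range(2)] for _ in range(4)]
q_vec = encode(rng.standard_normal((20, 16)), chains)  # question: 20 tokens x 16 dims
a_vec = encode(rng.standard_normal((30, 16)), chains)  # answer: 30 tokens x 16 dims
```

A 20×16 input shrinks to 18×14, 9×7, 7×5 and then 3×2 before the global max, so short sentences or narrow embeddings run out of room quickly; that shrinkage along the embedding axis is one trade-off to weigh against full-width windows.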