memray / seq2seq-keyphrase

MIT License

About training dataset #33

Closed shizhediao closed 5 years ago

shizhediao commented 5 years ago

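In Section 4.2 of your paper you write: "the remaining papers are used to train the supervised baselines." How should I read this? Were the other four small datasets also split into train and test sets so that KEA and Maui could be trained on them? Why not train them on the kp20k training data? Was it a memory limit?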

memray commented 5 years ago

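Some of those datasets have no train/test split, so for them we could only train on a portion of kp20k. Where a train split exists, we used it directly. We tried training on 100,000 docs and, as you said, it broke due to out-of-memory, so we trained on 20,000 docs instead. We later also tried 40,000 docs, but it made almost no difference in performance.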
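
For readers who want to reproduce this setup, the selection rule described in the reply above could be sketched roughly as follows. This is only an illustration, not code from the repository: the function name `pick_baseline_training_docs`, the dict layout of `dataset`, and the use of seeded random sampling are all assumptions (the reply only says that a portion of kp20k was used, not how it was chosen).

```python
import random


def pick_baseline_training_docs(dataset, kp20k_train, fallback_size=20_000, seed=0):
    """Choose training documents for a supervised baseline such as KEA or Maui.

    dataset: hypothetical dict that may contain a "train" list of documents.
    kp20k_train: list of kp20k training documents used as the fallback pool.
    fallback_size: number of kp20k documents to use when no train split exists
        (20,000 in the thread; 100,000 reportedly ran out of memory).
    """
    train_split = dataset.get("train")
    if train_split:
        # The dataset ships its own train split, so use it directly.
        return list(train_split)

    # No train split: fall back to a subset of kp20k.
    # Random sampling is an assumption; the thread does not say how the
    # subset was selected.
    rng = random.Random(seed)
    size = min(fallback_size, len(kp20k_train))
    return rng.sample(kp20k_train, size)
```

Raising `fallback_size` to 40,000 would correspond to the larger run mentioned above, which reportedly changed performance very little.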