Closed pommedeterresautee closed 4 years ago
Thank you for your comments. Queries and documents are represented as [batch_size, max_len] in data_loader. Both K-NRM and Conv-KNRM compute the interaction matrix for ranking. After the embedding layer, queries are expanded to [batch_size, max_len, 1, embedding_size] and documents to [batch_size, 1, max_len, embedding_size]. In Conv-KNRM, all n-grams are fed into the CNN and there is a squeeze operation, so max_len can't be 1 in Conv-KNRM. If you have any other questions, please send me your bug report. Best wishes.
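For reference, the expand-and-broadcast step described above can be sketched like this (a minimal NumPy illustration, not the repo's actual code; the shapes and variable names are assumptions):

```python
import numpy as np

batch_size, max_len, embedding_size = 2, 5, 8

rng = np.random.default_rng(0)
query_embed = rng.standard_normal((batch_size, max_len, embedding_size))
doc_embed = rng.standard_normal((batch_size, max_len, embedding_size))

# L2-normalize so each dot product is a cosine similarity.
query_embed /= np.linalg.norm(query_embed, axis=-1, keepdims=True)
doc_embed /= np.linalg.norm(doc_embed, axis=-1, keepdims=True)

# Expand: query -> [B, Lq, 1, E], doc -> [B, 1, Ld, E]. Broadcasting over
# the singleton axes yields one similarity per (query term, doc term) pair.
interaction = (query_embed[:, :, None, :] * doc_embed[:, None, :, :]).sum(-1)
print(interaction.shape)  # (2, 5, 5) -> [batch_size, max_len, max_len]
```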
Ok, I got it! Makes sense. So maybe, to make it more robust, pad_to_longest should guarantee that max_len is always > 3, including when max_len is computed per batch (because the CNN is set to trigrams).
```python
if max_len == 0:
    max_len = max(max(len(inst) for inst in insts), 4)
```
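In context, the guard could look something like this (a hypothetical sketch of pad_to_longest; only the function name and the two guard lines come from this thread, the rest is an assumption):

```python
import numpy as np

PAD = 0  # padding token id; an assumption for this sketch


def pad_to_longest(insts, max_len=0):
    """Pad each instance to max_len. When max_len is computed per batch
    (max_len == 0), enforce max_len >= 4 so Conv-KNRM's trigram CNN and
    the later squeeze never see a length-1 dimension."""
    if max_len == 0:
        # The fix from the thread: never let the per-batch max fall below 4.
        max_len = max(max(len(inst) for inst in insts), 4)
    return np.array([list(inst[:max_len]) + [PAD] * (max_len - len(inst))
                     for inst in insts])


padded = pad_to_longest([[5, 6], [7]])
print(padded.shape)  # (2, 4): a length-2 batch, padded up to the minimum of 4
```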
Sure, I will fix it later. Thank you.
Hi, for more on neural IR training, data augmentation, and EDRM, please refer to our WWW 2020 paper, Selective Weak Supervision for Neural Information Retrieval. Thank you for your attention. I will close this issue.
Hi,
I have noticed that a padding value is hard-coded for test documents: https://github.com/thunlp/EntityDuetNeuralRanking/blob/master/baselines/DataLoader.py#L131
When I remove it, the code crashes. Do you have an idea why? It seems related to the squeeze op. A way to work around this bug is to guarantee a minimum length for the query (which should be >= 4; don't ask me why, I just ran some tests).
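To make the squeeze symptom concrete, here is a small NumPy illustration (an assumption about the failure mode, not taken from the repo): when per-batch padding yields max_len == 1, a blanket squeeze silently drops the length axis, so downstream code that expects it crashes.

```python
import numpy as np

ok = np.zeros((2, 5, 8))   # [batch, max_len, embed] with max_len > 1
bad = np.zeros((2, 1, 8))  # max_len == 1 after per-batch padding

# squeeze leaves `ok` alone but removes the length axis from `bad`.
print(np.squeeze(ok).shape)   # (2, 5, 8) - unchanged
print(np.squeeze(bad).shape)  # (2, 8) - the length axis silently disappears
```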