Hi! Thanks so much for providing this implementation of the MTB training strategy.
I noticed that the paper's training scheme uses noise contrastive estimation, and in this implementation NCE appears to be enabled through an internal batching flag within the dataloader in preprocessing_funcs.
See here: https://github.com/plkmo/BERT-Relation-Extraction/blob/06075620fccb044785f5fd319e8d06df9af15b50/src/preprocessing_funcs.py#L287
Is there a reason for this decision? Has anyone tried using standard batching rather than NCE?
I'll also try standard batching myself and update this thread if I get any meaningful results.
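For anyone else comparing the two setups, here is a toy sketch of the distinction as I understand it. This is not code from this repo, and the function names are mine: standard batching scores the positive against every other example in the batch with a softmax cross-entropy, while NCE reduces it to a binary classification of the positive against a handful of sampled negatives.

```python
import math

def dot(u, v):
    """Dot product between two embedding vectors (plain lists of floats)."""
    return sum(a * b for a, b in zip(u, v))

def softmax_loss(anchor, candidates, pos_idx):
    """Standard batching: cross-entropy over all candidates in the batch.

    The positive must compete against every other candidate, so the
    normalizer sums over the whole batch."""
    scores = [dot(anchor, c) for c in candidates]
    m = max(scores)  # subtract max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[pos_idx]

def nce_loss(anchor, positive, negatives):
    """NCE-style objective: binary logistic loss on the positive plus
    a small set of sampled negatives, avoiding the full softmax."""
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))
    loss = -math.log(sigmoid(dot(anchor, positive)))
    for n in negatives:
        loss -= math.log(1.0 - sigmoid(dot(anchor, n)))
    return loss

# Toy usage with made-up embeddings
anchor = [0.5, -0.2, 0.1]
positive = [0.4, -0.1, 0.2]
negatives = [[-0.3, 0.8, 0.0], [0.1, 0.1, -0.5]]
print(softmax_loss(anchor, [positive] + negatives, 0))
print(nce_loss(anchor, positive, negatives))
```

The practical trade-off is that the softmax normalizer grows with batch size, whereas NCE's cost grows only with the number of sampled negatives, which is presumably why the paper uses it for large-scale MTB pretraining.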