Hi! Thanks so much for providing this implementation of the MTB training strategy.
I noticed that the paper's training scheme uses noise contrastive estimation, and in this implementation NCE appears to be enabled through an internal batching flag within the dataloader in preprocessing_funcs.
See here: https://github.com/plkmo/BERT-Relation-Extraction/blob/06075620fccb044785f5fd319e8d06df9af15b50/src/preprocessing_funcs.py#L287
Is there a reason for this decision? Has anyone tried using standard batching rather than NCE?
I'll also try standard batching myself and update this thread if I get any meaningful results.
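For anyone else comparing the two setups, here is a toy sketch of the distinction as I understand it. This is not code from this repo, and the function names are mine: standard batching scores the positive against every other example in the batch with a softmax cross-entropy, while NCE reduces it to a binary classification of the positive against a handful of sampled negatives.

```python
import math

def dot(u, v):
    """Dot product between two embedding vectors (plain lists of floats)."""
    return sum(a * b for a, b in zip(u, v))

def softmax_loss(anchor, candidates, pos_idx):
    """Standard batching: cross-entropy over all candidates in the batch.

    The positive must compete against every other candidate, so the
    normalizer sums over the whole batch."""
    scores = [dot(anchor, c) for c in candidates]
    m = max(scores)  # subtract max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[pos_idx]

def nce_loss(anchor, positive, negatives):
    """NCE-style objective: binary logistic loss on the positive plus
    a small set of sampled negatives, avoiding the full softmax."""
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))
    loss = -math.log(sigmoid(dot(anchor, positive)))
    for n in negatives:
        loss -= math.log(1.0 - sigmoid(dot(anchor, n)))
    return loss

# Toy usage with made-up embeddings
anchor = [0.5, -0.2, 0.1]
positive = [0.4, -0.1, 0.2]
negatives = [[-0.3, 0.8, 0.0], [0.1, 0.1, -0.5]]
print(softmax_loss(anchor, [positive] + negatives, 0))
print(nce_loss(anchor, positive, negatives))
```

The practical trade-off is that the softmax normalizer grows with batch size, whereas NCE's cost grows only with the number of sampled negatives, which is presumably why the paper uses it for large-scale MTB pretraining.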