stanford-futuredata / ColBERT

ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
MIT License

Fail when no training triplets are present #312

Open JoshuaPurtell opened 4 months ago

JoshuaPurtell commented 4 months ago

Presently, when there are no triplets, the loop

for batch_idx, BatchSteps in zip(range(start_batch_idx, config.maxsteps), reader):

is never entered, so batch_idx is unbound here:

ckpt_path = manage_checkpoints(config, colbert, optimizer, batch_idx+1, savepath=None, consumed_all_triples=True)

which raises this error:

UnboundLocalError: cannot access local variable 'batch_idx' where it is not associated with a value

To address this, I've added assertions that check that training data is actually present and alert the user if it is not. This should produce a more informative error message than the UnboundLocalError above.
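For illustration, here is a minimal, self-contained sketch of the failure mode and the kind of guard described above. It is not the actual ColBERT code: `run_training`, its parameters, and the assertion messages are hypothetical stand-ins, and the reader is assumed to be small enough to materialise into a list.

```python
def run_training(reader, start_batch_idx=0, maxsteps=10):
    """Hypothetical stand-in for the training loop (not ColBERT's real function)."""
    # Assumption: materialising the reader is acceptable for this sketch;
    # a real reader may be a lazy iterator over a large triples file.
    batches = list(reader)

    # Fail early with a clear message instead of an UnboundLocalError later.
    assert len(batches) > 0, "No training triples found; check the triples file."
    # The loop is also skipped if the step range is empty, so guard that too.
    assert start_batch_idx < maxsteps, "start_batch_idx must be below maxsteps."

    for batch_idx, batch_steps in zip(range(start_batch_idx, maxsteps), batches):
        pass  # a real training step would go here

    # batch_idx is bound here because the loop body ran at least once.
    return batch_idx + 1
```

Without the two assertions, calling `run_training([])` would reach `batch_idx + 1` with `batch_idx` never assigned, which is the error reported above. Note that `assert` statements are skipped when Python runs with `-O`, so an explicit `raise ValueError(...)` may be a sturdier choice if that matters.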