stanford-futuredata / ColBERT

ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
MIT License

Allocate all the memory at once when coalescing residuals #271

Closed. s-jse closed this 11 months ago

s-jse commented 11 months ago

@santhnm2 proposed this change to fix a memory issue I ran into:

I've tested it on a 26-million-passage Wikipedia index, which is about 120 GB on disk. Before this change, the coalescing script would get killed, most likely because it ran out of memory on a VM with 220 GB of RAM. After this change, coalescing finishes in 5 minutes instead of 50 minutes, and RAM utilization stays at around 38%.
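For readers who want the general idea behind the title, here is a minimal sketch of the allocate-once pattern. It is not the code from this PR. The function names `coalesce_preallocated` and `coalesce_by_concatenation` are hypothetical, and it uses plain `bytes` chunks and a standard-library `bytearray` as stand-ins for the residual tensors. The assumption is that the total size is known before copying starts, so the output can be allocated in one step and each chunk written into its own slice.

```python
from typing import List


def coalesce_preallocated(chunks: List[bytes]) -> bytearray:
    """Copy every chunk into one buffer that is allocated up front."""
    total = sum(len(chunk) for chunk in chunks)
    out = bytearray(total)  # single allocation for the whole result
    offset = 0
    with memoryview(out) as view:
        for chunk in chunks:
            end = offset + len(chunk)
            view[offset:end] = chunk  # in-place copy into the reserved slice
            offset = end
    return out


def coalesce_by_concatenation(chunks: List[bytes]) -> bytes:
    """Baseline for comparison: grow the result one chunk at a time."""
    out = b""
    for chunk in chunks:
        # Each step builds a new object holding everything copied so far,
        # so the old and new results briefly coexist in memory.
        out = out + chunk
    return out


if __name__ == "__main__":
    # Hypothetical stand-in data: three small "residual" chunks.
    chunks = [bytes([1, 2, 3]), bytes([4, 5]), bytes([6])]
    merged = coalesce_preallocated(chunks)
    assert merged == coalesce_by_concatenation(chunks)
    print(len(merged))
```

Both functions return the same bytes. The difference is that the preallocated version holds one output buffer for the entire run, while the concatenating version repeatedly re-copies the data accumulated so far, which plausibly costs both time and peak memory on a large index.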