cramraj8 opened this issue 1 year ago
Good question @cramraj8! (cc @seanmacavaney)
Our setup is driven by Docker environments with primarily single-GPU access. I know the underlying ColBERT codebase can do distributed indexing, but we haven't integrated that.
Can you print the value of `DEVICE` (`from colbert.parameters import DEVICE`) and tell us which device is associated with `indexer.colbert`?
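A minimal sketch of the requested check. The names `DEVICE` (a module-level constant in the ColBERT codebase) and `indexer.colbert` (the encoder model) come from this thread and are not verified here; the generic device check below works for any PyTorch module.

```python
import torch

# In a pyterrier_colbert session (names assumed from the thread):
# from colbert.parameters import DEVICE
# print(DEVICE)                                     # e.g. cuda or cpu
# print(next(indexer.colbert.parameters()).device)  # device of the encoder

# Equivalent generic check for any torch module:
model = torch.nn.Linear(4, 2)
print(next(model.parameters()).device)  # freshly created modules default to cpu
```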
The actual model-encoding line is https://github.com/cmacdonald/ColBERT/blob/v0.2/colbert/modeling/inference.py#L30
I wonder if you can wrap `indexer.colbert` in `torch.nn.DataParallel`, as per https://www.run.ai/guides/multi-gpu/pytorch-multi-gpu-4-techniques-explained#Technique-1 and https://stackoverflow.com/a/64825728. If you do, you can probably also increase the value of `indexer.args.bsize`.
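A hedged sketch of that suggestion, using a toy module in place of `indexer.colbert` (which we can't instantiate here). `DataParallel` replicates the wrapped module across all visible GPUs and splits each input batch along dimension 0, which is why a larger `bsize` should then be safe; with no GPUs available it simply runs on a single device.

```python
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Stand-in for indexer.colbert: any nn.Module with a batched forward."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 4)

    def forward(self, x):
        return self.linear(x)

model = ToyEncoder()
# Replicate across all visible GPUs; each forward splits the batch between them.
# For the indexer, the equivalent would be (assumed, untested):
#   indexer.colbert = torch.nn.DataParallel(indexer.colbert)
#   indexer.args.bsize = indexer.args.bsize * torch.cuda.device_count()
model = nn.DataParallel(model)

out = model(torch.randn(16, 8))
print(out.shape)  # torch.Size([16, 4])
```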
Please let us know how you get on.
Craig
Any update, @cramraj8?
I tried to run ColBERT indexing on trec-deep-learning-passages, and my environment has 4 GPUs available. But when I call the indexing APIs such as below, only 1 GPU is utilized.
How can I leverage all 4 GPUs with the PyTerrier API?