jyaacoub closed this issue 2 months ago
Other than dependency (`ModuleNotFound`) errors and issues with AutoTP (Automatic Tensor Parallelism wrapping biases for attention heads), the only other thing that needs to be adjusted for DeepSpeed to work is the following error:

```
AttributeError: module 'deepspeed.utils' has no attribute 'is_initialized'. Did you mean: 'initialize'?
```

The fix is to call `deepspeed.comm.comm.is_initialized()` instead, in `openfold/model/primitives.py` (see the OpenFold issue page and commit hotfix).

It is not sustainable to hot-fix each instance of shape mismatching, so I think I should switch gears and look at adjusting AutoTP to properly recognize which modules are safe to wrap and distribute across GPUs.
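A minimal sketch of a version-tolerant guard for that call site (the exact surrounding code in `primitives.py` may differ; the helper name here is hypothetical, and taking the module as a parameter is just for illustration):

```python
# Hedged sketch: prefer the current comm API and fall back to the old
# deepspeed.utils attribute on versions that still have it.
def deepspeed_is_initialized(deepspeed) -> bool:
    try:
        # Current API path (the one the hotfix switches to).
        return deepspeed.comm.comm.is_initialized()
    except AttributeError:
        # Pre-refactor API, removed in newer DeepSpeed releases.
        return deepspeed.utils.is_initialized()
```

This keeps the code working on both sides of the DeepSpeed refactor instead of hard-coding one path.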
In the DeepSpeed code, AutoTP is used in `deepspeed/inference/engine.py` (`InferenceEngine.__init__`). Passing `replace_with_kernel_inject=True` will avoid using AutoTP and instead do kernel injection.
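A minimal sketch of how that choice would be made at engine setup, assuming the standard `deepspeed.init_inference` entry point (the helper function and its defaults are illustrative, not the actual driver code):

```python
# Hedged sketch: selecting kernel injection over AutoTP when building the
# DeepSpeed inference engine.
def inference_kwargs(world_size: int, kernel_inject: bool = True) -> dict:
    return {
        "mp_size": world_size,  # tensor-parallel degree, e.g. 2 for two V100s
        # True -> inject fused kernels; False -> fall back to AutoTP sharding
        "replace_with_kernel_inject": kernel_inject,
    }

# In the real script (model is a loaded torch module):
# engine = deepspeed.init_inference(model, **inference_kwargs(2))
```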
Recall that `input_19.csv` for Platinum failed due to memory issues at a sequence length of 392. Now, testing with 2 V100s (also with only 16GB of VRAM each), we can run sequences 624 residues long! The sequence in `input_19.csv` is 624.
Compared to optimal performance (a 2x speedup on two GPUs), the `replace_with_kernel_inject=True` argument gives 1.6x, which is 80% of optimal.
ChatGPT also gave an answer regarding the `replace_with_kernel_inject` option.

Max memory for an A100 is 40960 MiB:
| seqLen | peakMem |
|---|---|
| 872 | 26906 MiB |
Assuming linear scaling, the max sequence length at which peak memory reaches the A100's 40960 MiB is 872 × 40960 / 26906 ≈ 1327, and this would only be for 2 A100s.
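The back-of-the-envelope extrapolation behind that 1327 figure, scaling the single measured point from the table linearly up to the A100's capacity:

```python
# Linear extrapolation of max sequence length from one (seqLen, peakMem)
# measurement. Note attention memory is not truly linear in sequence
# length, so this is an optimistic estimate.
A100_MEM_MIB = 40960
measured_len, measured_mem_mib = 872, 26906  # row from the table above

mib_per_residue = measured_mem_mib / measured_len  # ~30.9 MiB per residue
max_seq_len = int(A100_MEM_MIB / mib_per_residue)
print(max_seq_len)  # -> 1327
```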
Using low-memory attention with `chunk_size` can help trade off compute for memory. See https://github.com/aqlaboratory/openfold?tab=readme-ov-file#monomer-inference.
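A toy illustration of the idea behind chunked ("low-memory") attention, not OpenFold's actual implementation: queries are processed `chunk_size` rows at a time, so only a `chunk_size × key_len` block of attention scores is alive at once instead of the full `query_len × key_len` matrix, trading loop overhead for lower peak memory.

```python
import math

def _softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(q_rows, k_rows, v_rows):
    # Plain scaled dot-product attention over lists of vectors.
    d = len(q_rows[0])
    out = []
    for q in q_rows:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in k_rows]
        w = _softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, v_rows))
                    for j in range(len(v_rows[0]))])
    return out

def chunked_attention(q_rows, k_rows, v_rows, chunk_size):
    # Identical output to attention(); only chunk_size query rows are
    # processed per pass, bounding the live score block.
    out = []
    for i in range(0, len(q_rows), chunk_size):
        out.extend(attention(q_rows[i:i + chunk_size], k_rows, v_rows))
    return out
```

Smaller chunks lower peak memory further at the cost of more passes, which is exactly the compute-for-memory trade-off the OpenFold README describes.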
Relevant links:
The main issues with this are dependency problems with mpi4py on Narval... We might need to create a container, since it requires a specific version of `openmpi` not available on Narval.
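A hedged sketch of what such a container definition could look like (Apptainer/Singularity is what Compute Canada clusters support; the base image, OpenMPI packaging, and package list below are assumptions, untested):

```
Bootstrap: docker
From: nvidia/cuda:11.7.1-devel-ubuntu20.04

%post
    apt-get update && apt-get install -y python3-pip build-essential
    # The specific OpenMPI version mpi4py needs would be installed (or
    # built from source) here; the distro packages are just a placeholder.
    apt-get install -y openmpi-bin libopenmpi-dev
    pip3 install mpi4py deepspeed
```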