jyaacoub / MutDTA

Improving the precision oncology pipeline by providing binding affinity perturbation predictions for a priori identified cancer driver genes.

Look into how to use DeepSpeed for inference instead of FSDP #93

Closed: jyaacoub closed this issue 2 months ago

jyaacoub commented 2 months ago

Relevant links:

The main issue here is dependency problems with mpi4py on Narval... We might need to create a container, since mpi4py requires a specific version of Open MPI that is not available on Narval.
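As a quick diagnostic (a minimal sketch, assuming mpi4py imports at all), the version of the MPI library that mpi4py was built against can be printed directly, which helps confirm an Open MPI mismatch:

```python
# Sanity check of the MPI stack behind mpi4py; useful for confirming
# an Open MPI version mismatch on a cluster like Narval.
from mpi4py import MPI

print(MPI.Get_library_version())  # e.g. "Open MPI v4.x.y, ..."
print(MPI.Get_version())          # MPI standard version, e.g. (3, 1)
```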

jyaacoub commented 2 months ago

Other than the dependency (ModuleNotFoundError) errors AND the issues with AutoTP (Automatic Tensor Parallelism incorrectly wrapping biases for attention heads), the only other thing that needs to be adjusted for DeepSpeed to work is the following error:

AttributeError: module 'deepspeed.utils' has no attribute 'is_initialized'. Did you mean: 'initialize'?
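One possible stopgap (an untested sketch, assuming the calling code only wants torch.distributed-style "is the backend up?" semantics) is to alias the missing attribute before running inference:

```python
# Hypothetical monkey-patch: alias the missing deepspeed.utils.is_initialized
# to torch.distributed.is_initialized before the offending call. Assumes the
# caller only needs torch.distributed-style semantics; untested.
import deepspeed
import torch.distributed as dist

if not hasattr(deepspeed.utils, "is_initialized"):
    deepspeed.utils.is_initialized = dist.is_initialized
```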

jyaacoub commented 2 months ago

DeepSpeed AutoTP

It is not sustainable to hot-fix each instance of shape mismatching; I think I should switch gears and look at adjusting AutoTP so it properly recognizes which modules are safe to wrap and distribute across GPUs.

In the DeepSpeed code, AutoTP is invoked in deepspeed/inference/engine.py, inside InferenceEngine.__init__.
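For reference, a minimal sketch of the entry point that triggers AutoTP (exact kwargs vary across DeepSpeed versions; `load_model` is a hypothetical stand-in for however we load the model):

```python
# Minimal DeepSpeed inference init sketch. With
# replace_with_kernel_inject=False, DeepSpeed takes the AutoTP path that
# auto-wraps modules and shards them across GPUs.
import torch
import deepspeed

model = load_model()  # hypothetical loader for our protein model

engine = deepspeed.init_inference(
    model,
    mp_size=2,                        # shard across 2 GPUs
    dtype=torch.float16,
    replace_with_kernel_inject=False, # False -> AutoTP wrapping path
)
model = engine.module
```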

replace_with_kernel_inject

Now testing with 2 V100s, each with only 16 GB of VRAM, we can run sequences up to 624 in length!

Compared to the optimal speedup (2x for two GPUs), replace_with_kernel_inject=True achieves 1.6x, which is 80% of optimal.

What is kernel injection?

Summarizing a ChatGPT answer: kernel injection replaces a model's standard PyTorch modules with DeepSpeed's fused, optimized CUDA inference kernels, reducing kernel-launch overhead and memory traffic during inference.
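Enabling it is just a flag flip on the same call (same sketch and caveats as above):

```python
# Same init as before, but with kernel injection enabled so DeepSpeed
# swaps in its fused CUDA inference kernels.
engine = deepspeed.init_inference(
    model,
    mp_size=2,
    dtype=torch.float16,
    replace_with_kernel_inject=True,  # inject optimized kernels
)
```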

Peak memory usage with kernel injection on 2x A100s

Max memory for an A100: 40960 MiB

| seqLen | peakMem   |
| ------ | --------- |
| 872    | 26906 MiB |

Assuming memory scales linearly with sequence length, the maximum sequence length at which peak memory equals the A100's capacity is 872 × 40960 / 26906 ≈ 1327, and this would only be for 2 A100s.
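The extrapolation above, spelled out (linear scaling is an assumption, not a measurement):

```python
# Back-of-envelope: assume peak memory grows linearly with sequence length.
max_mem  = 40960   # MiB, A100 capacity
peak_mem = 26906   # MiB measured at seq_len = 872
seq_len  = 872

max_seq_len = seq_len * max_mem / peak_mem
print(int(max_seq_len))  # -> 1327
```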

jyaacoub commented 2 months ago

Using low-memory attention with chunk_size can help trade extra compute for lower memory. See https://github.com/aqlaboratory/openfold?tab=readme-ov-file#monomer-inference.
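Not OpenFold's actual implementation, but a minimal sketch of the chunked-attention idea behind chunk_size: process queries in chunks so the full L x L attention matrix is never materialized at once.

```python
# Chunked ("low-memory") attention sketch: iterate over query chunks,
# trading extra compute/launches for a smaller peak memory footprint.
import torch

def chunked_attention(q, k, v, chunk_size=256):
    # q, k, v: (batch, seq_len, dim)
    scale = q.shape[-1] ** -0.5
    out = []
    for i in range(0, q.shape[1], chunk_size):
        q_chunk = q[:, i:i + chunk_size]                          # (B, c, D)
        attn = torch.softmax((q_chunk @ k.transpose(1, 2)) * scale, dim=-1)
        out.append(attn @ v)                                      # (B, c, D)
    return torch.cat(out, dim=1)
```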