weaviate / t2v-transformers-models

This is the repo for the container that holds the models for the text2vec-transformers module.
BSD 3-Clause "New" or "Revised" License

Support PyTorch `set_per_process_memory_fraction` #39

Open kcm opened 1 year ago

kcm commented 1 year ago

Summary

PyTorch allows setting a per-process limit on GPU memory. This is useful, for example, when a GPU resource is shared between processes.

set_per_process_memory_fraction(fraction, device=None): Sets the memory fraction for a process. The fraction is used to limit the caching allocator to a portion of the memory on a CUDA device. The allowed value equals the total visible memory multiplied by the fraction. Attempting to allocate more than the allowed value in a process raises an out-of-memory error in the allocator.
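The "total visible memory multiplied by the fraction" rule can be sketched in plain Python (function and variable names here are illustrative, not part of the PyTorch API):

```python
def allowed_bytes(total_visible_memory: int, fraction: float) -> int:
    """Upper bound that the CUDA caching allocator would enforce for
    this process, per the set_per_process_memory_fraction semantics."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0.0, 1.0]")
    return int(total_visible_memory * fraction)

# Example: a 16 GiB GPU limited to 25% leaves 4 GiB for this process.
print(allowed_bytes(16 * 1024**3, 0.25))  # 4294967296
```

Allocations beyond this bound would fail with an out-of-memory error rather than compete with other processes sharing the device.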

Proposal

This setting takes a fraction in [0, 1] and an optional device. Alongside ENABLE_CUDA, introduce an environment variable CUDA_MEMORY_FRACTION whose value (0.0-1.0) is passed as fraction. Additionally, if set, check and prefer per-device CUDA_MEMORY_FRACTION_... variable(s), where the value has the same format and the ... suffix is passed as device for each variable found.
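A minimal sketch of the proposed variable handling (the helper name and precedence choice are assumptions, not an agreed design; only the CUDA_MEMORY_FRACTION naming comes from the proposal):

```python
import os

def cuda_memory_fractions(environ=os.environ) -> dict:
    """Collect proposed CUDA_MEMORY_FRACTION* settings.

    Returns {device_index_or_None: fraction}; a None key means
    "default device". Per-device variables (CUDA_MEMORY_FRACTION_<n>)
    take precedence over the global CUDA_MEMORY_FRACTION.
    """
    fractions = {}
    base = environ.get("CUDA_MEMORY_FRACTION")
    if base is not None:
        fractions[None] = float(base)
    prefix = "CUDA_MEMORY_FRACTION_"
    for key, value in environ.items():
        if key.startswith(prefix):
            fractions[int(key[len(prefix):])] = float(value)
    for fraction in fractions.values():
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("CUDA_MEMORY_FRACTION* values must be in [0.0, 1.0]")
    return fractions

# At startup (when ENABLE_CUDA is set), each entry would then be applied via
# torch.cuda.set_per_process_memory_fraction(fraction, device=device).
print(cuda_memory_fractions({"CUDA_MEMORY_FRACTION": "0.5",
                             "CUDA_MEMORY_FRACTION_1": "0.25"}))
```

Parsing is kept separate from the torch call so the validation can run even on CPU-only hosts.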

Questions

kcm commented 1 year ago

One use case is AWS vGPU support, so that multiple consumers of the vGPU device(s) don't assume they have exclusive rights to the full resource.