weaviate / t2v-transformers-models

This is the repo for the container that holds the models for the text2vec-transformers module.
BSD 3-Clause "New" or "Revised" License

Support PyTorch `set_per_process_memory_fraction` #39

Open kcm opened 1 year ago

kcm commented 1 year ago

Summary

PyTorch allows setting a per-process limit on GPU memory. This is useful, for example, when a GPU resource is shared between processes.

set_per_process_memory_fraction(fraction, device=None): Sets the memory fraction for a process. The fraction is used to limit the caching allocator to a portion of the memory on a CUDA device. The allowed value equals the total visible memory multiplied by the fraction. Attempting to allocate more than the allowed value in a process raises an out-of-memory error in the allocator.
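The "total visible memory multiplied by the fraction" rule can be sketched in plain Python (function and variable names here are illustrative, not part of the PyTorch API):

```python
def allowed_bytes(total_visible_memory: int, fraction: float) -> int:
    """Upper bound that the CUDA caching allocator would enforce for
    this process, per the set_per_process_memory_fraction semantics."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0.0, 1.0]")
    return int(total_visible_memory * fraction)

# Example: a 16 GiB GPU limited to 25% leaves 4 GiB for this process.
print(allowed_bytes(16 * 1024**3, 0.25))  # 4294967296
```

Allocations beyond this bound would fail with an out-of-memory error rather than compete with other processes sharing the device.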

Proposal

This setting takes a fraction in [0, 1] and an optional device. Alongside ENABLE_CUDA, introduce an environment variable CUDA_MEMORY_FRACTION whose value (0.0-1.0) is passed as fraction. Additionally, if set, check and prefer per-device CUDA_MEMORY_FRACTION_... variable(s), where the value has the same format and the ... suffix is passed as device for each variable found.
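A minimal sketch of the proposed variable handling (the helper name and precedence choice are assumptions, not an agreed design; only the CUDA_MEMORY_FRACTION naming comes from the proposal):

```python
import os

def cuda_memory_fractions(environ=os.environ) -> dict:
    """Collect proposed CUDA_MEMORY_FRACTION* settings.

    Returns {device_index_or_None: fraction}; a None key means
    "default device". Per-device variables (CUDA_MEMORY_FRACTION_<n>)
    take precedence over the global CUDA_MEMORY_FRACTION.
    """
    fractions = {}
    base = environ.get("CUDA_MEMORY_FRACTION")
    if base is not None:
        fractions[None] = float(base)
    prefix = "CUDA_MEMORY_FRACTION_"
    for key, value in environ.items():
        if key.startswith(prefix):
            fractions[int(key[len(prefix):])] = float(value)
    for fraction in fractions.values():
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("CUDA_MEMORY_FRACTION* values must be in [0.0, 1.0]")
    return fractions

# At startup (when ENABLE_CUDA is set), each entry would then be applied via
# torch.cuda.set_per_process_memory_fraction(fraction, device=device).
print(cuda_memory_fractions({"CUDA_MEMORY_FRACTION": "0.5",
                             "CUDA_MEMORY_FRACTION_1": "0.25"}))
```

Parsing is kept separate from the torch call so the validation can run even on CPU-only hosts.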

Questions

kcm commented 1 year ago

One use case is AWS vGPU support, so that multiple consumers of the vGPU device(s) don't assume they have exclusive rights to the full resource.