```
Every 2.0s: nvidia-smi                                d26b4303cee2: Tue Jul 16 21:03:36 2024

Tue Jul 16 21:03:36 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-40GB         Off  | 00000000:00:04.0 Off |                    0 |
| N/A  43C    P0            149W / 400W   |  20439MiB / 40960MiB |     52%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
```
I'm running LLaMA 2 on an A100 GPU (on Google Colab, so the environment may not be ideal) and only see about 50% GPU utilization.

I can try to implement batching myself, but I'd appreciate some advice on what to avoid and what I shouldn't break.
Batching is now supported for the HF wrapper.
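For reference, here is a minimal sketch of what batched generation looks like with the plain Hugging Face transformers API. The checkpoint name, prompts, and generation parameters are illustrative assumptions, not taken from this issue or from the wrapper's actual implementation.

```python
# Minimal sketch of batched generation with Hugging Face transformers.
# Checkpoint and prompts below are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Decoder-only models should be left-padded so generation continues
# directly from the end of each prompt, not after padding tokens.
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompts = [
    "Explain GPU utilization in one sentence.",
    "What is batching in LLM inference?",
]

# Tokenize the whole batch at once; padding aligns sequence lengths
# and produces an attention mask that generate() respects.
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)

for text in tokenizer.batch_decode(outputs, skip_special_tokens=True):
    print(text)
```

The left padding and the attention mask are the usual things to get right when rolling your own batching: with right padding or a missing mask, the model conditions on pad tokens and output quality silently degrades.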