```
Every 2.0s: nvidia-smi                                d26b4303cee2: Tue Jul 16 21:03:36 2024

Tue Jul 16 21:03:36 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-40GB         Off  | 00000000:00:04.0 Off |                    0 |
| N/A  43C    P0            149W / 400W   |  20439MiB / 40960MiB |     52%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
```
I'm running LLaMA 2 on an A100 GPU (on Google Colab, so the environment may not be ideal) and only see about 50% GPU utilization.

I can try to implement batching myself, but I'd appreciate some advice on what to avoid and what I shouldn't break.
Batching is now supported for the HF wrapper.
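For reference, here is a minimal sketch of what batched generation looks like with the plain Hugging Face transformers API. The checkpoint name, prompts, and generation parameters are illustrative assumptions, not taken from this issue or from the wrapper's actual implementation.

```python
# Minimal sketch of batched generation with Hugging Face transformers.
# Checkpoint and prompts below are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Decoder-only models should be left-padded so generation continues
# directly from the end of each prompt, not after padding tokens.
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompts = [
    "Explain GPU utilization in one sentence.",
    "What is batching in LLM inference?",
]

# Tokenize the whole batch at once; padding aligns sequence lengths
# and produces an attention mask that generate() respects.
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)

for text in tokenizer.batch_decode(outputs, skip_special_tokens=True):
    print(text)
```

The left padding and the attention mask are the usual things to get right when rolling your own batching: with right padding or a missing mask, the model conditions on pad tokens and output quality silently degrades.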