ljleb closed this 1 year ago
Is the idea to keep the model in RAM and move only one block at a time to VRAM?
Yes. Move the keys only on the work device when we are about to do work (i.e. merge stuff together)
On my system this allows loading more models into memory while still getting a speedup with some merge methods.
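A minimal sketch of the idea (hypothetical code, not the PR's actual implementation; merge_key, the lerp merge, and the toy state dicts are illustrative): the full state dicts stay on the storage device, and only the tensors for the key currently being merged are moved to the work device.

```python
import torch

def merge_key(key, models, work_device="cpu", storage_device="cpu", alpha=0.5):
    """Move one key's tensors to the work device, merge, then move the result back."""
    # only this key travels to the work device; the rest of each model stays put
    tensors = [m[key].to(work_device) for m in models]
    merged = torch.lerp(tensors[0], tensors[1], alpha)  # weighted average as a stand-in merge
    return merged.to(storage_device)

# two toy "models" kept entirely on the storage device (CPU here)
model_a = {"w": torch.ones(4)}
model_b = {"w": torch.zeros(4)}
merged = {k: merge_key(k, [model_a, model_b]) for k in model_a}
```

In a real run the work device would be `cuda` while the storage device stays `cpu`, so VRAM only ever holds one key's worth of tensors at a time.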
I see
Is that as fast as loading the entire model into VRAM? Possibly the difference is negligible
Not as fast, but comparable in some cases. For example, I do get a good speedup for methods that use sorting. I'll run another test to compare ties on full GPU vs work-only GPU.
And I guess that setting both to the GPU behaves like it does now: load everything at once and merge
Wait, I made a mistake. I intended the default value for --work-device to be the device specified by --device.
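That fallback can be expressed in a couple of lines; this is a sketch under the assumption that the CLI uses argparse (only the flag names --device and --work-device come from this thread):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--device", default="cpu")
# None means "not given", so we can tell it apart from an explicit value
parser.add_argument("--work-device", default=None)

args = parser.parse_args(["--device", "cuda"])  # simulate passing only --device
# default --work-device to the device specified by --device
work_device = args.work_device if args.work_device is not None else args.device
```

With this, `--device cuda` alone puts both storage and work on the GPU, while adding an explicit `--work-device` overrides only the merge device.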
With ties add difference, it is ~2x faster to use --device cuda:
stage 1: 100%|██████████| 1131/1131 [00:14<00:00, 77.05it/s]
With --work-device cuda:
stage 1: 100%|██████████| 1131/1131 [00:30<00:00, 37.56it/s]
With --device cpu (or no cli flags):
stage 1: 100%|██████████| 1131/1131 [01:18<00:00, 14.44it/s]
Add the possibility to use different devices for storing vs merging keys. I get ~2x speedup for ties_add_difference:

Using --work-device cpu:

Using --work-device cuda: