bghira opened this issue 1 month ago
That's awesome, thanks for sharing @bghira! How fast was inference on your local machine?
It gets slower as the sample size increases, but this test script takes about 10 seconds to run on an M3 Max.
I got this working as well! Inference time seems to increase more than linearly with prompt size
- 3 seconds of audio: 10 seconds of generation
- 8s of audio: ~90 seconds of generation
- 10s of audio: ~3min of generation
I think the reason is that inference itself takes a surprising amount of memory — loading the model takes the expected ~3GB of memory, but then inference takes 15GB on top of that, which is probably what's slowing it down on my machine (16GB M2).
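
If anyone wants to double-check where the memory goes, here is a minimal sketch for watching the process's peak RSS around loading and generation (the `load_model` / `model.generate` calls below are placeholders, not this repo's actual API):

```python
import resource
import sys

def peak_rss_gb():
    """Peak resident set size of this process, in GB."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux
    return rss / 1024**3 if sys.platform == "darwin" else rss / 1024**2

print(f"baseline:       {peak_rss_gb():.2f} GB")

# model = load_model(...)           # placeholder: expect roughly +3GB here
print(f"after load:     {peak_rss_gb():.2f} GB")

# audio = model.generate(prompt)    # placeholder: the extra memory shows up during generation
print(f"after generate: {peak_rss_gb():.2f} GB")
```

Since `ru_maxrss` is a high-water mark, the final reading reflects the peak across both steps rather than the current usage.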
Is swapping activated? I will try on a Mac Mini M2 (24GB). Do we know the performance on CUDA on a similar machine?
On the 128GB M3 Max I can get pretty far into the output window before the time increases to 3 minutes.
It'll take about a minute for 30 seconds of audio.
I am getting 11 seconds for 2s of audio, and 36 seconds for 6s of audio.
My data, on a 64GB M2 Max:

| seconds of audio | cpu (seconds of generation) | mps (seconds of generation) |
|---|---|---|
| 1 | 7 | 10 |
| 3 | 13 | 17 |
| 7 | 30 | 44 |
| 9 | 41 | 194 |
| 18 | 71 | 308 |
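
For anyone collecting similar numbers, here is a rough timing sketch (the `model.generate` call and `prompt` are placeholders for whatever this port actually exposes). One gotcha: MPS kernels are queued asynchronously, so call `torch.mps.synchronize()` before stopping the clock, or the mps column will look artificially fast.

```python
import time
import torch

def time_generation(generate_fn, device: str) -> float:
    """Wall-clock seconds for one generation call on the given device."""
    start = time.perf_counter()
    generate_fn(device)
    if device == "mps":
        torch.mps.synchronize()  # wait for queued MPS work to finish before reading the clock
    return time.perf_counter() - start

# usage sketch (model.generate / prompt are placeholders, not this repo's actual API):
# for device in ("cpu", "mps"):
#     for seconds in (1, 3, 7, 9, 18):
#         elapsed = time_generation(lambda d: model.generate(prompt, duration=seconds, device=d), device)
#         print(f"{seconds}s of audio on {device}: {elapsed:.0f}s of generation")
```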
With newer PyTorch (2.4 nightly) we get bfloat16 support on MPS.
I tested this:
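
For reference, a minimal bfloat16-on-MPS smoke test along those lines (just a sketch assuming a recent nightly, not necessarily the exact script) looks roughly like:

```python
import torch

print(torch.__version__)
assert torch.backends.mps.is_available(), "MPS backend not available"

# on older PyTorch/macOS combinations this allocation typically errors out with a
# "BFloat16 is not supported on MPS" message; on a recent nightly it should run end to end
a = torch.randn(1024, 1024, dtype=torch.bfloat16, device="mps")
b = torch.randn(1024, 1024, dtype=torch.bfloat16, device="mps")
c = a @ b
print(c.dtype, c.device)  # expected: torch.bfloat16 mps:0
```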