Closed (eschmidbauer closed 1 month ago)
1) You can try removing the assertion assert seq_len <= self.max_seq_len and see if the audio will fit into GPU memory.
2) I am planning to add long-form processing, which would be optimized for audio clips such as yours via chunking. Give me a few days if #1 doesn't work.
A separate issue was triggered; it looks related to CUDA memory. Looking forward to the long-form implementation.
@eschmidbauer I've added long-form inference with window chunking. I tested it on an 11-minute audio file and it works well:
https://github.com/skirdey/voicerestore/blob/main/inference_long.py
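For readers unfamiliar with window chunking, here is a minimal sketch of the general idea in plain Python. It is not taken from inference_long.py: the function names window_spans and stitch are hypothetical, and the linear crossfade used to blend overlapping regions is an assumption, since the thread does not say how the script joins its chunks.

```python
def window_spans(total, window, overlap):
    """Return (start, end) sample index pairs that cover [0, total)."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    hop = window - overlap  # how far each new window advances
    spans = []
    start = 0
    while True:
        end = min(start + window, total)
        spans.append((start, end))
        if end >= total:
            break
        start += hop
    return spans


def stitch(chunks, spans, total, overlap):
    """Join processed chunks, crossfading linearly over each overlap (assumed scheme)."""
    out = [0.0] * total
    for idx, (chunk, (start, _end)) in enumerate(zip(chunks, spans)):
        for i, sample in enumerate(chunk):
            pos = start + i
            if idx > 0 and i < overlap:
                # Fade from the previous chunk's output into this chunk's output.
                w = (i + 1) / (overlap + 1)
                out[pos] = out[pos] * (1.0 - w) + sample * w
            else:
                out[pos] = sample
    return out


# Usage: an identity "model" stands in for the real restoration step.
signal = [float(i) for i in range(10)]
spans = window_spans(len(signal), window=4, overlap=2)
chunks = [signal[s:e] for s, e in spans]  # each chunk would go through the model here
restored = stitch(chunks, spans, len(signal), overlap=2)
```

Each chunk fits in memory on its own, which is why this approach sidesteps the sequence-length limit at the cost of running the model once per window.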
Thank you! I have noticed it takes a very long time to run long inference. The VRAM usage is very low (6% in testing); I'm wondering if the model/data is not fully loaded into CUDA.
Thanks for sharing the experiment. I think you can try making the window size larger and reducing the overlap size to get better GPU utilization and have fewer chunks to process. Both are available as CLI parameters in the inference_long script.
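To see why a larger window and a smaller overlap reduce the work, here is a rough sketch of the chunk-count arithmetic. The function name num_chunks and the window and overlap values below are hypothetical illustrations, not the script's defaults, and the thread does not state what units the CLI parameters use.

```python
import math


def num_chunks(total_s, window_s, overlap_s):
    """Number of windows needed to cover total_s seconds of audio."""
    if not 0 <= overlap_s < window_s:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    if total_s <= window_s:
        return 1
    hop = window_s - overlap_s  # new audio covered by each additional window
    return 1 + math.ceil((total_s - window_s) / hop)


# A 3m20s (200 s) file under two hypothetical settings:
small_windows = num_chunks(200, window_s=10, overlap_s=5)
large_windows = num_chunks(200, window_s=30, overlap_s=2)
print(small_windows, large_windows)
```

Fewer, larger chunks mean fewer model invocations and more work per invocation, which is what should raise GPU utilization, provided each window still fits in memory.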
A few notes on future updates: 1) The model is still training; with more training it will require fewer CFM steps, which will dramatically speed up inference. 2) The current model is a raw PyTorch export; I am planning to release a quantized and pruned version for inference once it has trained a bit more.
@eschmidbauer I've updated the inference_long script; it is now 10x faster.
I am testing the HF code (https://huggingface.co/jadechoghari/VoiceRestore) since it has a model, and it is still taking a very long time for a 3m20s file:
>>> model("poor-quality-mono.wav", "test_output.wav", short=False)
It doesn't appear that the model is using many GPU resources either.
Hi Emmanuel! Thank you for reporting. I'll take a look into it and see what I can do for the HF code optimization.
Thank you for sharing this project. I'm trying to run audio_restoration_model.py on a 2m57s call (16 kHz, 1 channel) and I get the following error: