Hi! Great project :)
As far as I know, Demucs supports GPU acceleration. I've noticed StemRoller only uses my CPU. I have an NVIDIA RTX 2070. Would it be possible to have at least a switch within the app to enable GPU acceleration?
Thank you!
That's a great question. I could theoretically implement this feature, but I don't have a GPU that's recent enough to use with PyTorch (Demucs' ML backend), so I couldn't test it myself. If you'd like, I can try to do an experimental build with GPU support and you can let me know if it works for you or not! However, it might take me a few weeks since I'm currently quite busy with other projects and don't have as much time to focus on StemRoller.
Would you like me to make a GPU-compatible build sometime and send you a message when it's ready to test?
Oh yeah, sure! I'll be happy to test a GPU build when it's ready. Just message me and I'll be there.
I've been digging through the code thinking it was just a matter of changing some arguments when Demucs is called, but it seems like it's more than just that.
Cool. Yeah, it's a little more complex than just changing StemRoller's code - actually the demucs-cxfreeze package needs to be refrozen with CUDA PyTorch instead of the CPU one. Shouldn't be too difficult, but it just takes some time. I'll send you a new build when I get a chance! Thanks for looking into it.
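For anyone curious what "refreezing" means in practice: demucs-cxfreeze is frozen with cx_Freeze, so the job is roughly a setup script along these lines, run in an environment where the CUDA build of PyTorch is installed instead of the CPU one. This is just a sketch - the entry script name here is illustrative, and the real one lives in the demucs-cxfreeze repo:

from cx_Freeze import setup, Executable

setup(
    name="demucs-cxfreeze",
    options={
        # Bundle Demucs and its ML stack. GPU support comes entirely from
        # which PyTorch build (CPU vs. CUDA) is installed when this runs.
        "build_exe": {"packages": ["demucs", "torch", "torchaudio", "soundfile"]},
    },
    # "demucs_wrapper.py" is a hypothetical entry script that invokes Demucs' CLI.
    executables=[Executable("demucs_wrapper.py", target_name="demucs-cxfreeze")],
)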
Hey, thank you for taking the time to be specific about it! Now I know what needs to be done. I may be able to refreeze it myself following your demucs-cxfreeze repo and bundling the proper CUDA libraries. Will take a look at it and get back soon.
UPDATE: success!!! Super fast. Really fast. My CPU takes like 4-7 minutes to process a song; I have an 8-core AMD 3700X @ 4.2GHz.
And with the GPU (2070 Super) it takes like 20 seconds 🐣🎉
This is the install command I used for the refreeze:
pip3 install torch torchvision torchaudio demucs SoundFile cx-Freeze --extra-index-url https://download.pytorch.org/whl/cu116
No changes to the app. Just replaced the demucs-cxfreeze 📁 folder. It's very heavy on file size though, adding +3GB uncompressed.
If a StemRoller release is going to be made compatible with NVIDIA GPUs, users should be made aware of this (from the Demucs repo):
If you want to use GPU acceleration, you will need at least 3GB of RAM on your GPU for demucs. However, about 7GB of RAM will be required if you use the default arguments. Add --segment SEGMENT to change size of each split. If you only have 3GB memory, set SEGMENT to 8 (though quality may be worse if this argument is too small). Creating an environment variable PYTORCH_NO_CUDA_MEMORY_CACHING=1 can help users with even smaller RAM such as 2GB (I separated a track that is 4 minutes but only 1.5GB is used), but this would make the separation slower.
If you do not have enough memory on your GPU, simply add -d cpu to the command line to use the CPU. With Demucs, processing time should be roughly equal to 1.5 times the duration of the track.
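To make the quoted advice concrete, on a ~3GB card it amounts to an invocation like this (song.mp3 is a placeholder; -d and --segment are Demucs' documented flags):

PYTORCH_NO_CUDA_MEMORY_CACHING=1 demucs -d cuda --segment 8 song.mp3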
Thank you so much for testing this! I merged your PR into demucs-cxfreeze and will try to upload a new build with CUDA support by the end of the weekend. Once that's done, I'll make a corresponding new version of StemRoller with the updated demucs-cxfreeze bundle.
Unfortunately demucs-cxfreeze with CUDA is 2.2GB compressed to ZIP, and GitHub Releases only allows files up to 2GB, so I'm not sure exactly how I'd be able to distribute this with StemRoller. I'll look into splitting the archive into chunks or finding another place to store it, but it may be quite a while before GPU support is available, as this poses a slight logistical problem.
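For reference, one way to split into chunks: 7-Zip can emit fixed-size volumes at archive creation time, which would keep each GitHub asset under the 2GB cap:

7z a -v1900m demucs-cxfreeze.7z demucs-cxfreeze/

That produces demucs-cxfreeze.7z.001, .002, ... which would need to be downloaded and recombined on the user's side.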
The HD hog is Torch. Maybe write a script for anyone wanting NVIDIA GPU support (and it IS worth it, it's way faster) to pull the version of Torch with CUDA support themselves. The CLI install is
python.exe -m pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
I know jack about Electron and how it deals with Python dependencies, but if you can offload Torch you'll save 2GB compressed (4,502,012,008 bytes on disk).
Maybe you can write a how-to in README.md for anyone interested in running StemRoller on NVIDIA GPUs (like me).
Thanks @aleph23 and @rdeavila for the suggestions. If you think users would be comfortable with manually installing it, I can definitely add a section to the README about how to configure it. I'll see if I can release just the GPU-specific components as sort of a "patch" that could just be unzipped into the main app directory.
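To sketch what that README step might look like - and this is only a guess at the layout; the real directory inside the frozen bundle may differ - the manual install could be a single pip command targeted at the bundled packages:

python -m pip install --upgrade --target "StemRoller/demucs-cxfreeze/lib" torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116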
If any of you have a PyTorch CUDA-compatible GPU and want to test out the latest update to the develop branch, then please let me know if it works for you (and if not, what errors you see as output) and how long it takes to split with GPU enabled. I don't have a recent enough GPU on my device to test it, so it'd be helpful if I could get confirmation from someone else. Maybe @Arecsu, since you seemed to have success doing it yourself earlier?
I can test it, sure. So far, I've cloned the develop branch and built it. Small bug here in ResultCard.svelte:
{#if status === 'processing'}
<Button Icon={LoadingSpinnerIcon} text="Processing" disabled={true} />
{#if status === 'downloading'}
<Button Icon={LoadingSpinnerIcon} text="Downloading" disabled={true} />
{:else if status === 'queued'}
<Button Icon={CollectionIcon} text="Queued" disabled={true} />
Pull request → https://github.com/stemrollerapp/stemroller/pull/31
There are two #if. The second one should be :else if :p
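With that fix applied, the block presumably reads:

{#if status === 'processing'}
<Button Icon={LoadingSpinnerIcon} text="Processing" disabled={true} />
{:else if status === 'downloading'}
<Button Icon={LoadingSpinnerIcon} text="Downloading" disabled={true} />
{:else if status === 'queued'}
<Button Icon={CollectionIcon} text="Queued" disabled={true} />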
I see in your latest commits a GitHub Actions script that should build a CUDA version. At least for me locally, I don't get any automatic PyTorch download during the build process. How can I manage to build one? Or is there a build available to download and test?
Edit: figured out I have to run npm run download-third-party-apps and then build it :)
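For anyone else wanting to reproduce this, the full sequence was roughly the following (npm run build is an assumption for the final step; use whatever build script the repo actually defines):

git clone https://github.com/stemrollerapp/stemroller
cd stemroller
git checkout develop
npm install
npm run download-third-party-apps
npm run build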
Everything works great so far! Is it using the newest version of Demucs? The whole package compressed using 7z level 9 LZMA2 is ~1.6GB. I think it can be uploaded to GitHub Releases.
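For reference, by "7z level 9 LZMA2" I mean an invocation along these lines:

7z a -t7z -mx=9 -m0=lzma2 demucs-cxfreeze.7z demucs-cxfreeze/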
Glad to hear it worked! Thanks so much for testing. Yes, this is using the latest Demucs with the htdemucs_ft model, which should be the best. You can check the console output (when running in dev mode) to make sure it is using the demucs-cxfreeze from this repo instead of Demucs installed on your system (but that should be forced by the way the env vars are set for the child process). And the goal of compressing to 7z SFX is that now it can be uploaded to GitHub Releases, so the plan is for this to be the next release!
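(For the curious: 7-Zip builds the SFX by prepending its self-extract module at archive creation, along the lines of 7z a -sfx -mx=9 demucs-cxfreeze.exe demucs-cxfreeze/ - exact switch and module names per 7-Zip's documentation.)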
Yes. It is using htdemucs_ft indeed. Looking awesome already. Thank you so much!
New version is out now! Download from https://github.com/stemrollerapp/stemroller/releases