tig3rmast3r opened 7 months ago
Hi @tig3rmast3r,
This is insane!! I wanted to say this looks and sounds super cool! Would love to hear a full AI techno set done with this tool if you'd have one to share!
I'm really happy to see vampnet in a full creative interface like this one.
Training from scratch requires a large dataset (50k hours of audio, more or less), and enough GPUs to fit a batch size of ~32 for the duration of audio context you'd like to train for. You can have a look at the settings used to train the model in conf/vampnet.yml.
Hi Hugo, glad you liked it. I played with that a lot but never found the time to do a good recording; I'm planning to make a YouTube video sooner or later.

About the training: I've already carefully selected 10k+ chunks, which should be around 28 hours. It's a very personal model, as most of my discography makes up roughly the first 30% of it, and I've already used that dataset for fine-tuning in this project. I did around 300 epochs for the fine-tuning (batch_size × iterations / number of chunks; see the small example at the end of this comment). With an RTX 4090 it took around 90 hours total (70 coarse + 20 c2f); I stopped at 300 because the learning rate was dropping very quickly.

I've already had a look at conf/vampnet.yml: what is the parameter that defines the model size? Is it just VampNet.embedding_dim: 1280? I mean, if I want to make a double-size .pth, is doubling that value enough, or do I have to adjust something else? I guess that since the model is aimed at just techno/tech-house, 28 hours may be enough... With the RTX 4090 I can't go over batch size 5, because with 6 it goes over 24 GB of VRAM as soon as it saves the first checkpoint.
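For clarity, this is the epoch count I mean (plain arithmetic; the iteration number below is just an example):

```python
# epochs = batch_size * iterations / number_of_chunks
def epochs(batch_size: int, iterations: int, n_chunks: int) -> float:
    return batch_size * iterations / n_chunks

# e.g. with ~10,000 chunks and batch size 5, 600,000 iterations is ~300 epochs
print(epochs(5, 600_000, 10_000))  # 300.0
```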
Yeah, doubling that value could work. You could also try changing the number of layers and heads, though that might require a bit more fine-tuning to get it working.
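Something along these lines could work as a starting point (a sketch only; the n_heads / n_layers key names are guesses, so double-check the actual names in conf/vampnet.yml):

```python
# sketch: write a scaled-up config derived from conf/vampnet.yml
# (assumes pyyaml is installed and you run this from the repo root)
import yaml

with open("conf/vampnet.yml") as f:
    conf = yaml.safe_load(f)

conf["VampNet.embedding_dim"] = 2560   # ~2x the default 1280
conf["VampNet.n_heads"] = 20           # guessed key name -- verify in the conf
conf["VampNet.n_layers"] = 20          # guessed key name -- verify in the conf

# hypothetical output file; point the training script at it instead of vampnet.yml
with open("conf/vampnet-big.yml", "w") as f:
    yaml.safe_dump(conf, f)
```

Keep in mind the embedding dim usually needs to stay divisible by the number of heads.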
Is it normal that training with identical parameters and dataset on Linux gives different results than on Windows? I'm asking because I trained a model for a few days and I'm still getting very bad results. I ran 334,600 iterations with batch size 3, which is 100 epochs since I have 10,038 chunks. I used Linux with torch.compile, PyTorch 2.1.2 and cu11.8, and did the same for c2f. I used embedding 1914, 22 heads and 22 layers, while for c2f I lowered that to 1800, 20, 20.

It's still far from being good, so I'm wondering if there's something wrong with my Linux setup, for example that it's reading the wav files incorrectly. For testing purposes I did a quick training with a few chunks and did the same on Windows, so the models should be identical, and I've discovered that the Linux ones are usually missing the higher frequencies, as if they were pitched down. I've attached 2 wavs generated with the same seed and no mask. I will do more tests on Windows with longer training, but it looks like something is wrong on my Linux setup.

How can I make sure it's reading the files correctly? They are all mono 16-bit PCM wavs at 44100 Hz. I'm sure Windows is fine because I did many fine-tunings there and they sound great. Note that I don't use torch.compile on Windows since it's not available there, and I'm on PyTorch 2.1.0; I also tried PyTorch 2.3.0 on Linux with cu12.1 and the models come out almost identical to 2.1.2 with 11.8. Do you have any clue?
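For reference, a minimal check along these lines (standard library only; the folder path is a placeholder) should confirm that every chunk really is mono 16-bit PCM at 44100 Hz:

```python
import wave
from pathlib import Path

# placeholder path -- point this at the folder containing the training chunks
DATASET_DIR = Path("data/chunks")

for wav_path in sorted(DATASET_DIR.rglob("*.wav")):
    with wave.open(str(wav_path), "rb") as f:
        channels = f.getnchannels()   # expect 1 (mono)
        sampwidth = f.getsampwidth()  # expect 2 bytes = 16-bit PCM
        rate = f.getframerate()       # expect 44100 Hz
    if (channels, sampwidth, rate) != (1, 2, 44100):
        print(f"{wav_path}: {channels} ch, {sampwidth * 8}-bit, {rate} Hz")
```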
Here's the comparison between the Linux and Windows trainings: wavs.zip
thanks
I definitely have a problem on Linux. I did a longer training tonight on Windows with the same values as on Linux; this is only 10 epochs for coarse + 20 for c2f, versus 100 + 100 on Linux, and now I got what I was expecting. I'd like to sort out my Linux issue so I can rent a runpod for the training; any idea would be really appreciated. Thanks. I attached 2 examples: one pair without a mask using the same seed, and another pair with a mask using another seed. testwav.zip
I did a quick test and it looks like the problem is the torch.compile call: removing torch.compile from train.py solved the problem on Linux. Do you have a specific combination of PyTorch and CUDA that you have tested with torch.compile and know to be working? I tested so far:

- 2.1.2 with cu11.8 = bad training
- 2.3.0 with cu12.1 (dev build) = bad training
- 2.2.2 with cu12.1 = error (missing 1 required positional argument: 'dim')

Hope this helps
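For reference, this is roughly the kind of change I mean when I say "removing torch.compile from train.py" (a sketch only; it assumes train.py wraps the model with torch.compile somewhere, and the environment variable name is made up):

```python
import os
import torch

# hypothetical switch so the same train.py can run with or without compilation
USE_COMPILE = os.environ.get("VAMPNET_COMPILE", "0") == "1"

def maybe_compile(model: torch.nn.Module) -> torch.nn.Module:
    # torch.compile only exists on PyTorch >= 2.0; fall back to eager otherwise
    if USE_COMPILE and hasattr(torch, "compile"):
        return torch.compile(model)
    return model

# in train.py:  model = maybe_compile(model)  instead of  model = torch.compile(model)
```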
EDIT: I did more tests and unfortunately the problem is not torch.compile. While I've noticed that I get different results with torch.compile, both results are bad. I've also rented a runpod instance and ran it for an entire day (6 x RTX 4000 ADA, Python 3.9 without torch.compile, PyTorch 2.2.0 cu12.1) and got the same results, so it's not related to my config; it looks like a general issue with Linux. I can post my installed conda and pip packages if you need more info. Thanks
I've finally found a working combination. Honestly, I haven't found the root issue, but I can use Linux now! I tried it both at home and on vast.ai, where I'm currently training on 4 x 4090 with no issues so far (not using torch.compile, but at least multi-GPU is working too). I got a pip list from the PC and tried to make the Linux environment as similar as possible. Here are all the combinations that work; I applied the fix on all configs to avoid bad audio results (more info below).
| | Python 3.11.4, PyTorch 2.0.1, cu11.8 | Python 3.11.4, PyTorch 2.1.2, cu11.8 | Python 3.9.x or 3.11.4, PyTorch 2.2.x, cu11.8 or 12.1 | Python 3.11.4, PyTorch 2.3.0, cu12.1 | Python 3.9.17, PyTorch 2.3.0, cu12.1 | Python 3.10.14, PyTorch 2.3.0, cu12.1 | Python 3.10.14, PyTorch 2.0.1, cu11.8 |
| -- | -- | -- | -- | -- | -- | -- | -- |
| Single GPU | working | working | untested | untested | working | untested | working |
| Single GPU + torch.compile | working | working | error | error | working | untested | working |
| Multi GPU | working | working | untested | untested | working | untested | working |
| Multi GPU + torch.compile | incompatible | error | error | error | stuck at "starting training loop" | working (sometimes) | working |

I've trained several combinations to understand the impact of torch.compile and the Python version on speed and quality. Quality is similar across all tests, as expected (I needed to run 50 epochs to minimize randomness). Speed report:
| | Windows, Python 3.11.4, PyTorch 2.1.1, cu11.8, no torch.compile | Linux, Python 3.11.4, PyTorch 2.1.2, cu11.8, no torch.compile | Linux, Python 3.11.4, PyTorch 2.1.2, cu11.8 | Linux, Python 3.9.17, PyTorch 2.3.0, cu12.1, no torch.compile | Linux, Python 3.9.17, PyTorch 2.3.0, cu12.1 |
| -- | -- | -- | -- | -- | -- |
| Speed | base | 6.6% faster | 16% faster | 6.1% faster | 16.5% faster |

I've finally found a working torch.compile config for multi-GPU, using Python 3.10 and the latest PyTorch!! Tested on 4 x RTX 4090.

EDIT: I sometimes get errors during startup with 2.3.0 cu12.1; no problems with 2.0.1 cu11.8.
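Since the behaviour depends so heavily on the exact Python/PyTorch/CUDA combination, a quick snippet like this (only standard torch attributes) prints the fingerprint of each box so results can be matched against the tables above:

```python
import platform
import torch

# environment fingerprint, to match a run against the compatibility table
print("python       :", platform.python_version())
print("pytorch      :", torch.__version__)
print("cuda (torch) :", torch.version.cuda)
print("gpu count    :", torch.cuda.device_count())
if torch.cuda.is_available():
    print("gpu 0        :", torch.cuda.get_device_name(0))
print("has compile  :", hasattr(torch, "compile"))
```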
About the trick to fix the bad audio: I have attached a zip containing the following files:

- "new": the working combination from the PC (edited)
- 3 examples (before / requirements / after) for a 3.9.17 setup with 2.3.0 cu12.1
- a compare.py that creates the requirements file based on the working Windows file (more info below)
Basically, one of the modules inside the requirements file is causing the bad audio; I still haven't identified which one.
EDIT: I have updated my fork with installation instructions to get this working successfully on Windows and Linux (single and multi GPU). I will no longer update this thread and I've removed the installation instructions from it.
I've used the compare trick on clean conda envs, both locally and on vast.ai containers.
Hopefully you will find the root cause so we can pin the version during pip install -e ./vampnet (or update the code to work with the latest version of whatever it is).
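The idea behind compare.py is roughly this (a simplified sketch, not the exact script in the zip; the file names are placeholders): pin the Linux env to the versions from the known-good Windows pip list, keeping only packages that exist on both sides.

```python
from pathlib import Path

windows_good = Path("requirements_windows_good.txt")  # pip freeze from the working PC
linux_current = Path("requirements_linux.txt")        # pip freeze from the Linux env
output = Path("requirements_pinned.txt")

def parse(path: Path) -> dict[str, str]:
    # map package name -> pinned version, skipping comments and editable installs
    pins = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.lower()] = version
    return pins

good = parse(windows_good)
current = parse(linux_current)

# keep only packages present in both envs, pinned to the Windows versions
pinned = [f"{name}=={good[name]}" for name in sorted(current) if name in good]
output.write_text("\n".join(pinned) + "\n")
print(f"wrote {len(pinned)} pins to {output}")
```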
Here's the zip: files.zip
Lastly, while I was testing I found the time to record some "TimeToTrain" values; they may help finding the right server to train on and save some (or a lot of) $$. Here are all the servers I tested. I used mostly runpod.io, but I've switched to vast.ai now as it's much cheaper in most cases. TimeToTrain is based on the number of epochs, so larger batch sizes run fewer iterations; basically (batch_size x iterations) is the same for all tests.
| model | VRAM (GB) | CUDA cores | tensor cores | clock (MHz) | TDP (W) | GPUs | batch | time to train | FP32 bench, TFLOPS (single GPU) | vast.ai TFLOPS | vast.ai DLPerf min | vast.ai DLPerf max | notes |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| RTX 4090 | 24 | 16384 | 512 | 2235 | 450 | 4 | 16 | 178 | 82.58 | 327.00 | 260.00 | 350.00 | cheap server, may be better |
| H100 80 SXM5 | 80 | 16896 | 528 | 1590 | 700 | 1 | 16 | 224 | 66.91 | 107.00 | 580.00 | 684.00 | |
| RTX 4000 ADA | 20 | 6144 | 192 | 1500 | 130 | 6 | 18 | 241 | 26.73 | | | | |
| RTX 4090 | 24 | 16384 | 512 | 2235 | 450 | 2 | 6 | 264 | 82.58 | 162.00 | 175.00 | 212.00 | |
| L40 | 48 | 18176 | 568 | 735 | 300 | 2 | 12 | 290 | 90.52 | 144.00 | 231.00 | 231.00 | |
| RTX A4000 | 16 | 6144 | 192 | 735 | 140 | 8 | 16 | 295 | 19.17 | | | | |
| RTX 4090 | 24 | 16384 | 512 | 2235 | 450 | 1 | 4 | 379 | 82.58 | | | | my home PC (11700k @ 5 GHz) |
| RTX 6000 ADA | 48 | 18176 | 568 | 915 | 300 | 1 | 6 | 450 | 91.06 | 81.00 | 135.00 | 135.00 | multi-GPU not working |
| RTX 4000 ADA SFF | 20 | 6144 | 192 | 720 | 70 | 4 | 8 | 513 | 19.17 | 41.00 | 31.00 | 42.00 | |
| RTX A5000 (SFF?) | 24 | 6144 | 192 | 900 | 150 | 6 | 18 | 545 | 19.35 | | | | strange model with 150 W TDP and lower performance |
| RTX A5000 | 24 | 8192 | 256 | 1170 | 230 | 2 | 6 | 572 | 27.77 | 55.00 | 55.00 | 69.00 | |
| A100 80 PCIe | 80 | 6192 | 432 | 1065 | 300 | 1 | 16 | 600 | 19.49 | 31.00 | 170.00 | 260.00 | strangely slow, probably CPU bound |
| RTX 3090 | 24 | 10496 | 328 | 1395 | 350 | 2 | 6 | 930 | 35.58 | 71.00 | 75.00 | 90.00 | low-performance node, needs a retest |
| Tesla V100 PCIe | 16 | 5120 | 640 | 937 | 250 | 6 | 12 | 1420 | 16.32 | 25.00 | 34.00 | 40.00 | uses only 20% TDP ?? |

EDIT May 9: updated info and the zipped file; I will add more as soon as I have it.
EDIT May 20: more info, removed the installation instructions (there's a quickinstall.sh script on my fork).

Hope this helps
Hi Hugo, this project rocks, I'm having a lot of fun. This is the best AI-based audio tool available right now, at least for what I'm looking for.
I'd like to share where I'm going with this awesome project. I've made an app in C# that takes care of keeping loops over time and sends generation presets to gradio, like you did with unloop; I'm losing hours just listening to generations, and I'm making tons of brand new audio loops too!! I'm also working on a VST plug-in that sends the generated wavs into the DAW, combined with demucs for separation. I also made a very primitive live set using 3 C# apps simultaneously in real time, sending the demucsed streams to a VST -> Reaktor and mixing them. What a blast!! Not sure if there's interest in what I'm doing; I may share the projects, but I'm nothing special at coding, I just know how to use ChatGPT properly :)
This is the C# app:
This is the live set setup using the VST + Bidule + Reaktor (+ iPad and MIDI controller):
https://github.com/hugofloresgarcia/vampnet/assets/23511425/bbcf54ae-136b-4d60-aafa-3055eff58c2f
Now for the question: is there a way to start a new training run and make a bigger model? Is it an easy task? I mean, I have no idea what I would have to change in the code to set the model size for training, like 2x or 3x bigger; I've only done (huge) fine-tunings so far. I'd like to try training from scratch and making a bigger model too, to see what happens :)
thanks a lot!