Hi, it does seem that the reason it's taking so long is that it's running on the CPU. You might want to confirm that your TensorFlow install can run on the GPU on your machine outside of the script. The very long files, however, are strange. @JCBrouwer do you have any thoughts on what could be going on?
Thanks @jesseengel. The GPU did seem to be available according to tf.test.is_gpu_available, but I'll give it a more thorough check by actually running a model on it.
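For reference, here's a minimal sketch of that kind of check (assuming the TF 1.x API that the magenta nsynth code targets). It goes one step further than is_gpu_available by forcing a small op onto the GPU and logging device placement:

```python
import tensorflow as tf

# Basic availability checks.
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("GPU available:", tf.test.is_gpu_available())

# Force a small op onto the GPU; this fails loudly if no GPU device is usable.
with tf.device("/gpu:0"):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.matmul(a, a)

# log_device_placement prints which device each op actually ran on.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(b))
```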
Maybe relevant: there were some errors with the generate.py code in general. np.linspace kept raising errors about its third argument having to be an integer (it was a float); maybe a numpy/Python version issue (I use numpy 1.18.4 and Python 3.6). I temporarily silenced it by rounding to int with np.int(), but it might indicate that the overly long files have a similar cause. I tried on Ubuntu and Windows and both needed that modification.
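The change I mean looks roughly like this (hypothetical values, not the exact line from generate.py):

```python
import numpy as np

# Hypothetical values, not the actual call in generate.py.
# On newer numpy (>= 1.18) the third argument (`num`) must be an integer,
# so only that argument gets cast; start and stop stay as floats.
num_steps = 156.25
grid = np.linspace(0.0, 5.0, int(num_steps))
```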
I was able to reproduce the issue as well, ~~but also managed to get it working correctly~~.
The issue for me was indeed related to executing on the CPU. To get it working I uninstalled magenta and made sure only magenta-gpu was installed. My tf.test.is_gpu_available() also showed an error because cuDNN wasn't installed; that's easy to fix if you use Anaconda, a simple `conda install cudnn` was enough.
~~The 30-minute wavs are related to the way the files are slowly being filled in, 10000 samples at a time, by nsynth_generate. I think it's just the metadata of the half-finished audio being incorrect. Once they're completely done rendering they should show the correct length.~~
A good way to test whether you're running on the GPU correctly is to run the nsynth_generate command manually (generate.py is running this under the hood for you):
nsynth_generate \
--checkpoint_path=/home/hans//code/magenta/magenta/models/nsynth/wavenet/wavenet-ckpt/model.ckpt-200000 \
--source_path=/home/hans/code/magenta-demos/nsynth/working_dir/embeddings/interp/batch0 \
--save_path=/home/hans/code/magenta-demos/nsynth/working_dir/audio/batch0 \
--sample_length=80000 \
--batch_size=256 \
--log=DEBUG \
--gpu_number=0
(You'll have to edit the paths to match your own.) On my 1080 Ti I'm generating around 100 samples per second, while on the CPU it was taking a LOT longer. While it's running you can take a look in htop and nvtop to see what's being used.
BTW, I also ran into the float error in np.linspace, and just casting the third argument is fine. As long as the second argument isn't cast it shouldn't break anything (I believe it needs to stay a float to prevent some off-by-one errors related to audio file placement in multigrids).
OK, scratch the part about the long files. The run I left on overnight is still generating; for some reason it isn't stopping after it has generated sample_length samples. The first 5 seconds (i.e. 80000 samples) of audio sound correct, but the files are still 43 minutes long.
I can take a better look through generate.py over the next couple days, but I feel like this might be something in nsynth_generate.
Do embeddings of length 156 sound about right for 5 sec of audio @jesseengel ?
@JCBrouwer Thank you, I'll make sure to fix my GPU problem first, and for the time being I'll manually end the generation when I hit 4 seconds of sample length.
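Another option, instead of killing the process, is to trim the overly long wavs afterwards to just the first sample_length samples. A minimal sketch of that (assuming the soundfile package is installed and the save_path layout from the command above; adjust paths and lengths as needed):

```python
import glob

import soundfile as sf  # assumes `pip install soundfile`

SAMPLE_LENGTH = 80000  # 5 s at 16 kHz; use 64000 for 4 s

for path in glob.glob("working_dir/audio/batch0/*.wav"):
    audio, sr = sf.read(path)
    if len(audio) > SAMPLE_LENGTH:
        # Overwrite the file with only the first SAMPLE_LENGTH samples.
        sf.write(path, audio[:SAMPLE_LENGTH], sr)
```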
Thanks for following up on this @JCBrouwer. If I recall correctly, embeddings are computed every 32 ms, so 156 sounds about right to me.
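The arithmetic checks out (assuming NSynth's 16 kHz sample rate and the encoder's 512-sample hop, which is where the 32 ms comes from):

```python
sample_rate = 16000       # NSynth audio is 16 kHz
hop_size = 512            # one embedding frame per 512 samples
print(hop_size / sample_rate)      # 0.032 -> 32 ms per frame
sample_length = 80000              # 5 seconds of audio
print(sample_length // hop_size)   # 156 embedding frames
```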
Hi all!
I'm trying to run generate.py. The soxi output of my wav files looks pretty okay.
But for some reason the generation won't work. I get as far as generating the embedding files, but when interpolating I waited over 12 hours for one batch to generate and then cancelled it. I'm running on an NVIDIA GeForce GTX 1080.
When I check the working directory I see a lot of generated wav files (31 GB) in the batch0 folder, but each of them is ~34 minutes long. That must be wrong, right?
I think the main problem is that I'm getting a lot of TensorFlow warnings saying that most resources are placed on the CPU, and the GPU is not active during generation according to nvidia-smi. I've tried on two computers and both give the same error.
Any ideas about what's going wrong, or how to debug it? Thanks in advance!