Open danicuki opened 4 years ago
That's the wrong command: it loads the generic Miniconda image, https://hub.docker.com/r/continuumio/miniconda/dockerfile
Try starting with this Dockerfile, which is specific to Jukebox: https://github.com/btrude/jukebox-docker
This line, https://github.com/johndpope/jukebox-docker/blob/master/Dockerfile#L198, should cover the soundfile import.
N.B.: check nvidia-smi on your host for your CUDA version. It should match the FROM statement in the Dockerfile, so you may need to bump the image tag (cuda:10.2, cuda:10.0, and so on). Side note: NVIDIA also publishes cudagl Docker images, but they are not applicable here. https://hub.docker.com/r/nvidia/cuda/
FROM nvidia/cuda:10.1-devel-ubuntu18.04
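The version check described above can be sketched in Python. This is only an illustration, not part of either Dockerfile: the function names are hypothetical, and it assumes the nvidia-smi banner contains a "CUDA Version: X.Y" field and that the image tag starts with the CUDA version. It also assumes an image is usable when its CUDA version is no newer than the one the driver reports, which is slightly looser than an exact match.

```python
import re
import subprocess
from typing import Optional, Tuple

Version = Tuple[int, int]


def driver_cuda_version(nvidia_smi_output: str) -> Optional[Version]:
    """Extract the 'CUDA Version: X.Y' field from nvidia-smi's banner text."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def image_cuda_version(image: str) -> Optional[Version]:
    """Read the CUDA version from a tag such as
    'nvidia/cuda:10.1-devel-ubuntu18.04' -> (10, 1)."""
    tag = image.rsplit(":", 1)[-1]
    m = re.match(r"(\d+)\.(\d+)", tag)
    return (int(m.group(1)), int(m.group(2))) if m else None


def image_is_supported(nvidia_smi_output: str, image: str) -> bool:
    """Assumption: the image works if its CUDA version is not newer than
    the version the host driver reports."""
    driver = driver_cuda_version(nvidia_smi_output)
    wanted = image_cuda_version(image)
    if driver is None or wanted is None:
        raise ValueError("could not parse a CUDA version")
    return wanted <= driver


def read_nvidia_smi() -> str:
    """Run nvidia-smi on the host and return its text output."""
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a Linux host with the driver installed, `image_is_supported(read_nvidia_smi(), "nvidia/cuda:10.1-devel-ubuntu18.04")` would tell you whether the FROM line needs changing.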
Thanks for the help. Now I've got this error:
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
root@af4016dfb45c:/opt/jukebox# exit
How do I run my docker host with NVIDIA on a Mac?
You can't; it is currently only supported on Linux.
Thanks for the feedback!
Is there any way to make this project not locked in on NVIDIA dependencies?
I'm a noob, but I was able to get rid of that error by running conda install -c conda-forge libsndfile. I think that's supposed to be covered by one of the install steps, though, so it could be a red flag that the libraries weren't installed properly. That's what happened to me.
I can get the program to run for like 2 minutes and then I get the error below. Anybody have any suggestions? Running on a vast.ai server.
py", line 581, in _load
deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 20664312 more bytes. The file might be corrupted.
terminate called after throwing an instance of 'c10::Error'
what(): owning_ptr == NullType::singleton() || owning_ptr->refcount_.load() > 0 ASSERT FAILED at /opt/conda/conda-bld/pytorch_1556653114079/work/c10/util/intrusive_ptr.h:350, please report a bug to PyTorch. intrusive_ptr: Can only intrusive_ptr::reclaim() owning pointers that were created using intrusive_ptr::release(). (reclaim at /opt/conda/conda-bld/pytorch_1556653114079/work/c10/util/intrusive_ptr.h:350)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fc5a1c8ddc5 in /opt/conda/envs/jukebox/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: THStorage_free + 0xca (0x7fc5a29d120a in /opt/conda/envs/jukebox/lib/python3.7/site-packages/torch/lib/libcaffe2.so)
frame #2: <unknown function> + 0x14872d (0x7fc5d0cb272d in /opt/conda/envs/jukebox/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #26: __libc_start_main + 0xf0 (0x7fc5df535830 in /lib/x86_64-linux-gnu/libc.so.6)
Aborted (core dumped)
Most likely one of the audio files you transferred to vast was corrupted or failed to transfer fully, or you started training before all the files in the directory had finished transferring. My vast servers have been egregiously slow this weekend, so I was on with support yesterday, and they told me to spin up a bunch of servers, determine which one doesn't have extremely slow network speeds, and then destroy all the others. (I have just built a PC specifically for ML at home, so my vast days are thankfully behind me now, and I'm definitely rethinking my recommendation given their slowness.)
I'll also note that if you are looking to train with your own music, it is mostly pointless to do anything other than finetune the 1b model with your own genre/artist tag replacing existing one(s). Even with a local GPU with 24 GB of VRAM, I do not have enough memory to finetune or train from scratch at the depth of the 5b models, and training the small priors/vqvae results in significantly worse quality than just finetuning the 1b.
I was able to get uncanny results finetuning on 1.5 hours of my own music on 1x Tesla M40 for only 8 hours (I am currently continuing that training, so I would expect better results with even more training, properly annealing the learning rate, etc.).
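One way to test the incomplete-transfer theory from this reply is to compare checksums computed on the source machine with the copies on the server. The sketch below is a generic illustration, not something Jukebox or vast.ai provides: the function names and the `expected` mapping (file name to SHA-256 hex digest, produced wherever the originals live) are assumptions made for the example.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large audio or checkpoint files
    do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(local_dir, expected: Dict[str, str]) -> List[str]:
    """Return the names of files that are missing from local_dir or whose
    digest differs from the one recorded on the source machine."""
    bad = []
    for name, want in expected.items():
        candidate = Path(local_dir) / name
        if not candidate.is_file() or sha256_of(candidate) != want:
            bad.append(name)
    return sorted(bad)
```

Any name returned by `find_mismatches` is a file worth re-uploading before starting training again.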
Trying to run on a docker container
$ docker run -i -t continuumio/miniconda /bin/bash
After installing, I get this error
(jukebox) root@182b585df72d:/jukebox# python jukebox/sample.py --model=5b_lyrics --name=sample_5b --levels=3 --sample_length_in_seconds=20 \
> --total_sample_length_in_seconds=180 --sr=44100 --n_samples=6 --hop_fraction=0.5,0.5,0.125
Traceback (most recent call last):
  File "jukebox/sample.py", line 7, in <module>
    from jukebox.utils.audio_utils import save_wav, load_audio
  File "/jukebox/jukebox/utils/audio_utils.py", line 4, in <module>
    import soundfile
  File "/opt/conda/envs/jukebox/lib/python3.7/site-packages/soundfile.py", line 142, in <module>
    raise OSError('sndfile library not found')
OSError: sndfile library not found
How was it solved?
apt-get update && apt-get install -y libsndfile1
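After installing the package, a quick check from Python can confirm that the shared library is now visible before rerunning sample.py. This is a minimal sketch with hypothetical helper names. It assumes soundfile locates libsndfile through a system library lookup similar to `ctypes.util.find_library`, which may not hold for every soundfile version.

```python
from ctypes.util import find_library
from typing import Optional


def locate_libsndfile() -> Optional[str]:
    """Return the name the system library lookup resolves for libsndfile,
    or None when it cannot be found."""
    return find_library("sndfile")


def describe_libsndfile() -> str:
    """Summarize the lookup result in one human-readable line."""
    found = locate_libsndfile()
    if found is None:
        return "libsndfile not found: install it before importing soundfile"
    return f"libsndfile found: {found}"


print(describe_libsndfile())
```

If the "not found" message still appears inside the container, the install most likely ran in a different image or layer than the one in which Python is running.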