Trying to run Jukebox on Google CoLab on a local machine

Maichelanger commented 4 years ago

Before anything else, I should say that I'm not a programmer and I know very little Python (almost none), but I want to experiment with programs like this one. Because of that, my problem-solving skills are limited. I have found some solutions on my own, but others are beyond my abilities, like this one:

With the "notebook" already running on my local machine, I downloaded everything I need for Jukebox. The problem appears when I try to load the model in this cell:

model = "1b_lyrics" # or "5b_lyrics"
hps = Hyperparams()
hps.sr = 44100
hps.n_samples = 3 if model=='5b_lyrics' else 3
hps.name = 'samples'
chunk_size = 16 if model=="5b_lyrics" else 32
max_batch_size = 3 if model=="5b_lyrics" else 3
hps.levels = 3
hps.hop_fraction = [.5,.5,.125]

vqvae, *priors = MODELS[model]
vqvae = make_vqvae(setup_hparams(vqvae, dict(sample_length = 1048576)), device)
top_prior = make_prior(setup_hparams(priors[-1], dict()), vqvae, device)

And the response in the console seems to say that the system was unable to find some files:

Downloading from gce
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-10-cb967a6fa3e2> in <module>
     10 
     11 vqvae, *priors = MODELS[model]
---> 12 vqvae = make_vqvae(setup_hparams(vqvae, dict(sample_length = 1048576)), device)
     13 top_prior = make_prior(setup_hparams(priors[-1], dict()), vqvae, device)
     14 

~\anaconda3\lib\site-packages\jukebox\make_models.py in make_vqvae(hps, device)
     93 
     94     vqvae = vqvae.to(device)
---> 95     restore_model(hps, vqvae, hps.restore_vqvae)
     96     if hps.train and not hps.prior:
     97         print_all(f"Loading vqvae in train mode")

~\anaconda3\lib\site-packages\jukebox\make_models.py in restore_model(hps, model, checkpoint_path)
     53     model.step = 0
     54     if checkpoint_path != '':
---> 55         checkpoint = load_checkpoint(checkpoint_path)
     56         # checkpoint_hps = Hyperparams(**checkpoint['hps'])
     57         # for k in set(checkpoint_hps.keys()).union(set(hps.keys())):

~\anaconda3\lib\site-packages\jukebox\make_models.py in load_checkpoint(path)
     32                 os.makedirs(os.path.dirname(local_path))
     33             if not os.path.exists(local_path):
---> 34                 download(gs_path, local_path)
     35         restore = local_path
     36     dist.barrier()

~\anaconda3\lib\site-packages\jukebox\utils\gcs_utils.py in download(gs_path, local_path, async_download)
     34         subprocess.Popen(args)
     35     else:
---> 36         subprocess.call(args)
     37 
     38 def ls(regex):

~\anaconda3\lib\subprocess.py in call(timeout, *popenargs, **kwargs)
    337     retcode = call(["ls", "-l"])
    338     """
--> 339     with Popen(*popenargs, **kwargs) as p:
    340         try:
    341             return p.wait(timeout=timeout)

~\anaconda3\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors, text)
    798                                 c2pread, c2pwrite,
    799                                 errread, errwrite,
--> 800                                 restore_signals, start_new_session)
    801         except:
    802             # Cleanup if the child failed starting.

~\anaconda3\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_start_new_session)
   1205                                          env,
   1206                                          os.fspath(cwd) if cwd is not None else None,
-> 1207                                          startupinfo)
   1208             finally:
   1209                 # Child is launched. Close the parent's copy of those pipe

FileNotFoundError: [WinError 2] The system cannot find the file specified

I assume the problem is not that complicated, but again, I know very little Python.

Thank you in advance for any help you can offer.

robinsloan commented 4 years ago

I'm not one of the project's programmers/maintainers, but I'll just chime in & say this is the spot where gcs_utils.py is trying to call wget to download the models; do you have wget on your computer? If not, you should try installing it (from here, maybe?) and then give this another shot.
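
A quick way to check for wget from inside the notebook is sketched below. It uses only the Python standard library; the helper name `find_tool` is hypothetical and is not part of Jukebox. When `subprocess` is asked to launch a program that cannot be found, it raises FileNotFoundError (reported as WinError 2 on Windows), which matches the traceback above.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable found on PATH, or None if missing."""
    # shutil.which searches the directories listed in PATH for the program,
    # so a None result means a call such as subprocess.call(["wget", ...])
    # is likely to fail with FileNotFoundError.
    return shutil.which(name)


wget_path = find_tool("wget")
if wget_path is None:
    print("wget was not found on PATH")
else:
    print(f"wget found at {wget_path}")
```

If wget was installed while Jupyter was already running, the notebook may need to be restarted before it sees the updated PATH.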

Maichelanger commented 4 years ago

Yes, that was the problem. Now, with wget properly installed, everything works fine.

Except for this:

Running this cell:

zs = upsample(zs, labels, sampling_kwargs, [*upsamplers, top_prior], hps)

The console comes up with an error:

Sampling level 1
Sampling 8192 tokens for [0,8192]. Conditioning on 0 tokens
Ancestral sampling 3 samples with temp=0.99, top_k=0, top_p=0.0
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-11-8b3a8268585a> in <module>
----> 1 zs = upsample(zs, labels, sampling_kwargs, [*upsamplers, top_prior], hps)

~\anaconda3\lib\site-packages\jukebox\sample.py in upsample(zs, labels, sampling_kwargs, priors, hps)
    137 def upsample(zs, labels, sampling_kwargs, priors, hps):
    138     sample_levels = list(range(len(priors) - 1))
--> 139     zs = _sample(zs, labels, sampling_kwargs, priors, sample_levels, hps)
    140     return zs
    141 

~\anaconda3\lib\site-packages\jukebox\sample.py in _sample(zs, labels, sampling_kwargs, priors, sample_levels, hps)
    100         total_length = hps.sample_length//prior.raw_to_tokens
    101         hop_length = int(hps.hop_fraction[level]*prior.n_ctx)
--> 102         zs = sample_level(zs, labels[level], sampling_kwargs[level], level, prior, total_length, hop_length, hps)
    103 
    104         prior.cpu()

~\anaconda3\lib\site-packages\jukebox\sample.py in sample_level(zs, labels, sampling_kwargs, level, prior, total_length, hop_length, hps)
     83     if total_length >= prior.n_ctx:
     84         for start in get_starts(total_length, prior.n_ctx, hop_length):
---> 85             zs = sample_single_window(zs, labels, sampling_kwargs, level, prior, start, hps)
     86     else:
     87         zs = sample_partial_window(zs, labels, sampling_kwargs, level, prior, total_length, hps)

~\anaconda3\lib\site-packages\jukebox\sample.py in sample_single_window(zs, labels, sampling_kwargs, level, prior, start, hps)
     67     z_samples = []
     68     for z_i, z_conds_i, y_i in zip(z_list, z_conds_list, y_list):
---> 69         z_samples_i = prior.sample(n_samples=z_i.shape[0], z=z_i, z_conds=z_conds_i, y=y_i, **sampling_kwargs)
     70         z_samples.append(z_samples_i)
     71     z = t.cat(z_samples, dim=0)

~\anaconda3\lib\site-packages\jukebox\prior\prior.py in sample(self, n_samples, z, z_conds, y, fp16, temp, top_k, top_p, chunk_size, sample_tokens)
    259         with t.no_grad():
    260             # Currently x_cond only uses immediately above layer
--> 261             x_cond, y_cond, prime = self.get_cond(z_conds, y)
    262             if self.single_enc_dec:
    263                 # assert chunk_size % self.prime_loss_dims == 0. TODO: Check if needed

~\anaconda3\lib\site-packages\jukebox\prior\prior.py in get_cond(self, z_conds, y)
    240             y, prime = None, None
    241         y_cond, y_pos = self.y_emb(y) if self.y_cond else (None, None)
--> 242         x_cond = self.x_emb(z_conds) if self.x_cond else y_pos
    243         return x_cond, y_cond, prime
    244 

~\anaconda3\lib\site-packages\jukebox\prior\prior.py in x_emb(self, z_conds)
    208         x_cond = None
    209         for z_cond, conditioner_block in reversed(list(zip(z_conds, self.conditioner_blocks))):
--> 210             x_cond = conditioner_block(z_cond, x_cond)
    211         return x_cond
    212 

~\anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

~\anaconda3\lib\site-packages\jukebox\prior\conditioners.py in forward(self, x, x_cond)
     43         # Run conditioner
     44         x = self.preprocess(x)
---> 45         x = self.cond(x)
     46         x = self.postprocess(x)
     47         x = self.ln(x)

~\anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

~\anaconda3\lib\site-packages\jukebox\vqvae\encdec.py in forward(self, x)
     44 
     45     def forward(self, x):
---> 46         return self.model(x)
     47 
     48 class Encoder(nn.Module):

~\anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

~\anaconda3\lib\site-packages\torch\nn\modules\container.py in forward(self, input)
     98     def forward(self, input):
     99         for module in self:
--> 100             input = module(input)
    101         return input
    102 

~\anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

~\anaconda3\lib\site-packages\torch\nn\modules\conv.py in forward(self, input)
    210                             _single(0), self.dilation, self.groups)
    211         return F.conv1d(input, self.weight, self.bias, self.stride,
--> 212                         self.padding, self.dilation, self.groups)
    213 
    214 

RuntimeError: cuDNN error: CUDNN_STATUS_ALLOC_FAILED

From what I could find, it seems that my GPU doesn't have enough VRAM. Am I right? If so, is there another way to run the upsampling on my computer without relying on Google CoLab's machines?
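
One way to test the VRAM hypothesis is to ask PyTorch how much memory the card has and how much is already allocated just before the failing cell. This is a minimal sketch, not part of Jukebox: the helper name `gpu_memory_report` is hypothetical, and it assumes the GPU of interest is device 0. It returns None when PyTorch or CUDA is unavailable.

```python
def gpu_memory_report(device_index=0):
    """Return (total_gib, allocated_gib) for a CUDA device, or None without CUDA."""
    try:
        import torch
    except ImportError:
        return None  # PyTorch is not installed in this environment
    if not torch.cuda.is_available():
        return None  # no usable CUDA device
    gib = 1024 ** 3
    # Total memory on the card, as reported by the driver.
    total = torch.cuda.get_device_properties(device_index).total_memory / gib
    # Memory currently held by tensors created in this process.
    allocated = torch.cuda.memory_allocated(device_index) / gib
    return total, allocated


report = gpu_memory_report()
if report is None:
    print("No CUDA device visible to PyTorch")
else:
    total, allocated = report
    print(f"total: {total:.2f} GiB, allocated by tensors: {allocated:.2f} GiB")
```

If the allocated figure is already close to the total before upsampling starts, an allocation failure inside cuDNN is consistent with the card running out of memory.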

robinsloan commented 4 years ago

To check this hypothesis, you could try changing the model from 5b_lyrics to 1b_lyrics, which requires less GPU memory, and see if you're able to upsample.

Maichelanger commented 4 years ago

Unfortunately, I had already changed the model to 1b_lyrics and lowered the max batch size to 3 before sampling. It's clear that my card (an RTX 2060 with 6 GB of VRAM) is not enough for AI work, but I thought there might be some workaround.

Any other idea?

BTW, thanks for your answers.

robinsloan commented 4 years ago

The docs say the 1b_lyrics model takes 3.8 GB, with another few hundred megabytes on top for other parts of the model, so maybe it would be worth trying a batch size of just 1? If that doesn't work, then yeah, it might simply not be possible with a 6GB GPU.
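
The arithmetic behind this estimate can be written out as a rough sketch. The 3.8 GB and 6 GB figures come from the thread; the 0.5 GB overhead is only an assumed stand-in for "a few hundred megabytes", and the function name `vram_headroom_gb` is hypothetical.

```python
def vram_headroom_gb(total_gb, model_gb, overhead_gb):
    """Rough VRAM left over after the model weights and fixed overhead."""
    return total_gb - model_gb - overhead_gb


# 6 GB card, 3.8 GB for 1b_lyrics, 0.5 GB overhead (assumed value)
headroom = vram_headroom_gb(total_gb=6.0, model_gb=3.8, overhead_gb=0.5)
print(f"Approximate headroom: {headroom:.1f} GB")
```

The remainder has to hold everything else that is allocated during sampling, such as activations, cuDNN workspaces, and the CUDA context, so a positive number here does not guarantee that the run fits.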

No problem; happy to help, even if it's only slightly.

Maichelanger commented 4 years ago

Nope... Same issue. I think the other things that load into VRAM take up much more space than a few hundred MB.

Well, it was nice to try, at least.

openai / jukebox

Trying to run Jukebox on Google CoLab on a local machine #128