openai / jukebox

Code for the paper "Jukebox: A Generative Model for Music"
https://openai.com/blog/jukebox/

ValueError: invalid literal for int() with base 10: '\\' #223

Open everydaydigital opened 3 years ago

everydaydigital commented 3 years ago

Hi there, I'm trying to get OpenAI Jukebox running on Windows 10 with an NVIDIA RTX 3070 8GB GPU (after finally getting it installed/compiled by placing wget into my System32 folder, using the nightly build of torch, and using Git Bash to install tensorboardX).

It has yet to run successfully though, and now there's a weird error when running the script:

Traceback (most recent call last):
  File "jukebox/sample.py", line 279, in <module>
    fire.Fire(run)
  File "D:\OpenAIJukebox\envs\lib\site-packages\fire\core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "D:\OpenAIJukebox\envs\lib\site-packages\fire\core.py", line 366, in _Fire
    component, remaining_args)
  File "D:\OpenAIJukebox\envs\lib\site-packages\fire\core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "jukebox/sample.py", line 271, in run
    rank, local_rank, device = setup_dist_from_mpi(port=port)
  File "d:\openaijukebox\jukebox\jukebox\utils\dist_utils.py", line 46, in setup_dist_from_mpi
    return _setup_dist_from_mpi(master_addr, backend, port, n_attempts, verbose)
  File "d:\openaijukebox\jukebox\jukebox\utils\dist_utils.py", line 86, in _setup_dist_from_mpi
    dist.init_process_group(backend=backend, init_method=f"env://")
  File "d:\openaijukebox\jukebox\jukebox\utils\dist_adapter.py", line 61, in init_process_group
    return _init_process_group(backend, init_method)
  File "d:\openaijukebox\jukebox\jukebox\utils\dist_adapter.py", line 86, in _init_process_group
    return dist.init_process_group(backend, init_method)
  File "D:\OpenAIJukebox\envs\lib\site-packages\torch\distributed\distributed_c10d.py", line 500, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "D:\OpenAIJukebox\envs\lib\site-packages\torch\distributed\rendezvous.py", line 186, in _env_rendezvous_handler
    master_port = int(master_port)
**ValueError: invalid literal for int() with base 10: '\\'**
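
As far as I can tell from the last frame, torch's env:// rendezvous just reads MASTER_ADDR and MASTER_PORT from the environment and converts the port with int(), roughly like this (a simplified sketch, not the actual torch source):

    # simplified sketch of what the failing frame in torch/distributed/rendezvous.py does
    # (illustrative only, not the actual torch source)
    import os

    master_addr = os.environ["MASTER_ADDR"]
    master_port = os.environ["MASTER_PORT"]  # here this apparently ends up as a single backslash
    master_port = int(master_port)           # ValueError: invalid literal for int() with base 10: '\\'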

My clean install process is:

Open miniconda terminal

conda config --add pkgs_dirs D:\.pkgs
D:
cd D:\OpenAIJukebox
conda create --prefix ./envs python=3.7.5 -y
conda activate ./envs
conda install mpi4py=3.0.3 -y
conda update -n base -c defaults conda -y
pip3 install numpy
pip3 install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu110/torch_nightly.html
git clone https://github.com/openai/jukebox.git
cd jukebox
pip3 install -r requirements.txt
pip3 install -e .
conda install av=7.0.01 -c conda-forge -y

Close miniconda

open new git bash terminal from D:\OpenAIJukebox

conda activate ./envs
cd ./jukebox
pip install ./tensorboardX

close git bash terminal

open Miniconda terminal

D:
cd D:\OpenAIJukebox\jukebox
conda activate ./envs

python jukebox/sample.py --model=1b_lyrics --name=sample_5b_prompted --levels=3 --mode=primed \ --audio_file=D:\OpenAIJukebox\jukebox\prompts\home.wav,D:\OpenAIJukebox\jukebox\prompts\Parallax.wav,D:\OpenAIJukebox\jukebox\prompts\Skankin.wav,D:\OpenAIJukebox\jukebox\prompts\Vitals.wav --prompt_length_in_seconds=6 \ --sample_length_in_seconds=10 --total_sample_length_in_seconds=60 --sr=44100 --n_samples=1 --hop_fraction=0.5,0.5,0.125

RTX 30xx cards use the Ampere architecture, which requires CUDA 11.x, so I've had to modify some of the original install instructions to build with the latest version of torch and the CUDA toolkit, and I'm wondering if that might be part of the issue? I have read other reports of people getting this working on 3090s though, so I'm not really sure where I might have gone wrong.

Any advice appreciated - I'd really love to be able to get this running on my system or at least understand where the issue is coming from. Cheers,

CarlChaaya commented 3 years ago

Hello there, did you manage to get it running? If yes how? I am getting the same issue.

everydaydigital commented 3 years ago

Hello there, did you manage to get it running? If yes how? I am getting the same issue.

Hey, I haven't been able to progress past this error yet - but will make sure to update here if I do.

Good to hear that it's not just me with this issue though, so there's still a chance that someone with a bit more experience will be able to help us out.

Randy-H0 commented 3 years ago

Please use Colab. An RTX 3070, 3080 or 3090 isn't designed to run this, and on Windows especially this is an issue.

CarlChaaya commented 3 years ago

Please use Colab. An RTX 3070, 3080 or 3090 isn't designed to run this, and on Windows especially this is an issue.

Hey Randy, I have been using Colab to train VQ-VAE but I'm still getting the same error. Any idea what the issue might be?

everydaydigital commented 3 years ago

Please use Colab. An RTX 3070, 3080 or 3090 isn't designed to run this, and on Windows especially this is an issue.

Thanks for the reply - are you able to explain in more detail how this specific issue would be caused by an RTX 30xx series GPU?

This error appears to show that the master port is being passed an invalid value rather than a port number. Do you think that could be coming from a change in the GPU architecture?

I've managed to get this running on another Windows PC with an RTX 2070 with 6GB RAM, and while it doesn't make it all the way to the final upsampled stage (it displays a clear error that it has run out of RAM), it still manages to run through a few files and generate some new audio output along the way.

I can accept that more RAM is required for a successful run, but for my particular usage and interest it would be great to lock down the source of this issue.
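
One way I could try to narrow it down is printing what the rendezvous actually receives, e.g. by adding something like this just before the dist.init_process_group call in jukebox/utils/dist_utils.py (a hypothetical diagnostic, not part of the repo):

    # hypothetical diagnostic, added just before the dist.init_process_group call
    import os
    print("MASTER_ADDR =", repr(os.environ.get("MASTER_ADDR")),
          "MASTER_PORT =", repr(os.environ.get("MASTER_PORT")))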

If every GitHub issue could be solved with 'just use Colab', the open source nature of these projects would never progress very far.

By sharing our bug reports with the community, everyone gets the opportunity for a greater all-round understanding, and ideally the issue can be resolved together!

Cheers

cicinwad commented 2 years ago

It works on Windows 8.1, right?

EKGD commented 1 year ago

I have the same issue. I'm using an RTX 3060 12GB on Ubuntu 22.04. Any update please? :(((

EKGD commented 1 year ago

Have you found a solution?

Treeed commented 5 months ago

For me the issue was that I copied the recommended command-line arguments from the README, which contains newlines escaped with backslashes. Removing those backslashes when running the command on an actual command line fixed the issue.
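
For anyone hitting the same thing: the primed-mode command from the original post, collapsed onto a single line with the stray backslashes removed (paths exactly as in the original post), would look like this:

    python jukebox/sample.py --model=1b_lyrics --name=sample_5b_prompted --levels=3 --mode=primed --audio_file=D:\OpenAIJukebox\jukebox\prompts\home.wav,D:\OpenAIJukebox\jukebox\prompts\Parallax.wav,D:\OpenAIJukebox\jukebox\prompts\Skankin.wav,D:\OpenAIJukebox\jukebox\prompts\Vitals.wav --prompt_length_in_seconds=6 --sample_length_in_seconds=10 --total_sample_length_in_seconds=60 --sr=44100 --n_samples=1 --hop_fraction=0.5,0.5,0.125

If you do want to split the command across lines on Windows, note that cmd uses ^ as its line-continuation character and PowerShell uses a backtick, not the \ shown in the README.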