wandb / server

W&B Server is the self hosted version of Weights & Biases
MIT License

Got "wandb: Network error (ConnectionError), entering retry loop." when I try to start the server on another port. #88

Closed frostime closed 2 years ago

frostime commented 2 years ago

Hello, I was wondering if I could change the default port 8080 to another one. However, I failed and got some network errors.

I started the server by:

wandb server start --port=18080

Docker showed that the container was running correctly.

❯ docker ps -a
CONTAINER ID   IMAGE         COMMAND           CREATED          STATUS                   PORTS                                         NAMES      
2f739865626a   wandb/local   "/sbin/my_init"   11 minutes ago   Up 11 minutes            0.0.0.0:18080->8080/tcp, :::18080->8080/tcp   wandb-local
343ac6b33ad5   hello-world   "/hello"          5 hours ago      Exited (0) 5 hours ago                                                 priceless_matsumoto

And when I executed the wandb login command in the CLI, it also worked:

❯ wandb login --relogin
wandb: You can find your API key in your browser here: http://localhost:8080/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

But when I tried to run my Python script, an error occurred:

❯ python train_model.py run.max_epochs=2
wandb: Network error (ConnectionError), entering retry loop.
wandb: W&B API key is configured. Use `wandb login --relogin` to force relogin

I guess it might be stuck at init:

run = wandb.init(
    project="Model",
    entity="frostime",
    config=OmegaConf.to_container(cfg, resolve=True, throw_on_missing=True),
    job_type=cfg.wandb.job_type,
    group=cfg.wandb.group,
    name=cfg.wandb.name,
    tags=cfg.wandb.tags,
    notes=cfg.wandb.notes,
)

But when I ran wandb init in the CLI, it also worked, so I'm quite confused.

I wonder whether it is actually supported to bind the server to a port other than 8080.

MBakirWB commented 2 years ago

Hi @frostime, happy to help with your question.

W&B does support running the server on a different port. What I believe is happening here is that your project's WANDB_BASE_URL points to a port other than 18080. That would cause the network error you are seeing, because the client is trying to log to an instance that does not exist. From your terminal, check your current settings with wandb status and verify that "base_url" is set correctly. If it isn't, set it via export WANDB_BASE_URL=http://localhost:<port>, more on this here. Please let me know if this fixed your issue and/or if you have any questions.
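As a rough diagnostic for the mismatch described above, the following sketch checks whether anything is listening at the address stored in WANDB_BASE_URL. It uses only the Python standard library; the helper name base_url_reachable is hypothetical and not part of wandb, and a successful TCP connection only shows that some process accepts connections on that port, not that it is a W&B server. Running wandb status is still the way to see what the client itself has configured.

```python
import os
import socket
from urllib.parse import urlparse


def base_url_reachable(base_url, timeout=2.0):
    """Return True if a TCP connection to the host/port in base_url succeeds.

    Hypothetical helper for illustration; it is not part of the wandb package.
    """
    parsed = urlparse(base_url)
    host = parsed.hostname
    if host is None:
        # Values such as "18080" or "localhost:18080" are not full URLs.
        return False
    # Fall back to the scheme's default port when none is given.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    url = os.environ.get("WANDB_BASE_URL")
    if url is None:
        print("WANDB_BASE_URL is not set in this environment")
    else:
        print(url, "reachable:", base_url_reachable(url))
```

Note that a bare port number is rejected here because it has no host part, which is why the variable needs the full http://localhost:<port> form.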

frostime commented 2 years ago

Hi @MBakirWB , thanks so much for your help! It really works.

Here is a summary of the whole procedure, in case anyone else wants to run the local wandb server on another port, denoted <port>.

  1. Start the server

    wandb server start --port=<port>
  2. Modify the environment variable WANDB_BASE_URL

    export WANDB_BASE_URL=http://localhost:<port>

    Run it in your shell.

    If everything is set up correctly, you will see the right base_url when you run wandb status:

    ❯ wandb status
    Current Settings
    {
        "api_key": null,
        "base_url": "http://localhost:<port>",
        "entity": "frostime",
        "git_remote": "origin",
        "ignore_globs": [],
        "project": "EEGGAN",
        "root_dir": null,
        "section": "default"
    }
  3. Execute your code.
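
If exporting the variable in the shell is inconvenient, the same setting can probably be applied from inside the training script, since the wandb client reads WANDB_BASE_URL from the process environment. The sketch below is an assumption-laden illustration rather than an official recipe: the helper name use_local_wandb_server is hypothetical, and the host is assumed to be localhost as in the steps above.

```python
import os


def use_local_wandb_server(port, host="localhost"):
    """Set WANDB_BASE_URL for this process and return the URL that was set.

    Hypothetical helper for illustration; it is not part of the wandb package.
    """
    base_url = f"http://{host}:{port}"
    os.environ["WANDB_BASE_URL"] = base_url
    return base_url


if __name__ == "__main__":
    # 18080 is the port used in the original report; substitute your own.
    print(use_local_wandb_server(18080))
    # Call wandb.init(...) only after this point, so the client picks up the URL.
```

The variable only affects the current process and its children, so this has to run before wandb.init is called in the same script.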

MBakirWB commented 2 years ago

Thanks for the update @frostime , glad it worked and thank you for providing a thorough example for others to follow. Please do reach back out again when you have any questions.