wandb / server

W&B Server is the self hosted version of Weights & Biases
MIT License

How to mount /vol on an arbitrary folder when starting wandb via docker? #125

Closed Master-cai closed 1 year ago

Master-cai commented 1 year ago

I have a folder called /mnt/sdf/wandb_data. I want to start a new wandb server via docker and mount wandb_data at /vol, so I ran the following command:

docker run -d -p 28287:8080 -v /mnt/sdf/wandb_data:/vol --name wandb_f wandb/local

When I open the web page, it shows this error:

An error was encountered while starting the container:
2023/08/07 07:06:18 Error adding user: open /vol/env/users.htpasswd: no such file or directory
panic: Can't create default user

goroutine 1 [running]:
main.ensureDefaults({{0x162072e, 0x4}, {0x0, 0x0}, {0xc000044016, 0x16}, {0xc000042172, 0xa}, {0x163e5de, 0x1a}, ...})
    /mnt/ramdisk/core/services/local/cmd/local/main.go:612 +0x75d
main.commands({{0x162072e, 0x4}, {0x0, 0x0}, {0xc000044016, 0x16}, {0xc000042172, 0xa}, {0x163e5de, 0x1a}, ...})
    /mnt/ramdisk/core/services/local/cmd/local/main.go:377 +0x1818
main.main()
    /mnt/ramdisk/core/services/local/cmd/local/main.go:722 +0x98

I guessed it was a file permission issue, so I went into the container and confirmed it: in the /vol folder, I cannot create any file ("Permission denied").

So I tried changing the permissions of the wandb_data folder to 777 and changing its owner and group to root, but neither worked.

How can I mount wandb_data at /vol? Thanks!

ArtsiomWB commented 1 year ago

Hi @Master-cai! Thank you for writing in. You are right, it looks like a permission error. You need to ensure the filesystem you mount gives write permission to the root group:

chgrp -R 0 $(pwd)/wandb
chmod -R g+rwX $(pwd)/wandb
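
To confirm the change took effect before restarting the container, a check along these lines may help. This is a minimal sketch and not part of W&B: check_group_writable is a hypothetical helper, and it assumes GNU stat (the -c option), as found on most Linux hosts.

```shell
#!/bin/sh
# Hypothetical helper (not part of W&B): report whether DIR belongs to
# group GID and grants that group read, write and execute permission.
# Assumes GNU stat, which supports the -c format option.
check_group_writable() {
    dir=$1
    want_gid=$2
    gid=$(stat -c '%g' "$dir") || return 2
    perm=$(stat -c '%a' "$dir") || return 2   # octal mode, e.g. 770
    # Shift the octal mode right by 3 bits and mask to get the group bits.
    group_bits=$(( (0$perm >> 3) & 7 ))
    if [ "$gid" = "$want_gid" ] && [ "$group_bits" -eq 7 ]; then
        echo "ok"
    else
        echo "not ok: gid=$gid group_bits=$group_bits"
        return 1
    fi
}
```

For the folder in this thread, the call would be check_group_writable /mnt/sdf/wandb_data 0, where 0 is the root group targeted by the chgrp command above. Note that the commands above use $(pwd)/wandb, so the path needs to be replaced with the folder actually being mounted.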

Master-cai commented 1 year ago

@ArtsiomWB Thanks! I can mount the folder now, but I hit another error when I open the web page. I used the docker logs command to check the log:

*** Running /etc/my_init.d/01_enable-services.sh...
*** Enabling production mode
*** Running /etc/my_init.d/02_load-settings.sh...
*** Loading settings...
2023/08/08 05:39:00 Created default user
2023/08/08 05:39:00 Generating new session key for auth...
2023/08/08 05:39:00 Generating new certificate and key for auth...
*** Booting runit daemon...
*** Runit started as PID 52
*** Setting up mysql database...
*** Starting wandb servers...
*** Configuring minio...
Bucket created successfully `local/local-files`.
Successfully added arn:minio:sqs:wandb-local:_:redis
*** Migrating database...
Initialized random go rng seed: to fix set env var TEST_GO_RNG_SEED=4699501396654252937
Invoking megabinary "migrate": GOMAXPROCS=40
2023/08/08 05:40:02 Running squash migration from version 0 to 189...
panic: context deadline exceeded

goroutine 1 [running]:
github.com/wandb/core/services/gorilla/cmd.(*migrateCommander).MainCmd(0xc00128e030, {0xc0011d7920, 0x2, 0x2})
    /mnt/ramdisk/core/services/gorilla/cmd/migrate.go:263 +0xea5
main.main()
    /mnt/ramdisk/core/services/gorilla/cmd/megabinary/main.go:74 +0x452
*** All services started
*** Access W&B at http://localhost:8080

I see a similar issue here, and I'm sure the folder I mounted is new.

vanpelt commented 1 year ago

@Master-cai the error "panic: context deadline exceeded" occurs while setting up the database. The underlying cause is likely a very slow or poorly performing filesystem. Can you share more details about your OS and the version of docker you're running? One thing to try would be mounting the volume in async mode:

docker run -d -p 28287:8080 -v /mnt/sdf/wandb_data:/vol:delegated --name wandb_f wandb/local

If the filesystem at /mnt/sdf is a network filesystem like NFS, it's always going to be pretty slow. You should instead mount a filesystem from a fast underlying disk, ideally an SSD.
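
To check what kind of filesystem backs a host path, something like the following could be used. It is a rough sketch: classify_fstype and fstype_of are hypothetical helper names, the list of network filesystem types is illustrative rather than exhaustive, and findmnt is assumed to be available (it ships with util-linux on most Linux distributions).

```shell
#!/bin/sh
# Hypothetical helper: classify a filesystem type name as "network" or "local".
# The list of network types below is illustrative, not exhaustive.
classify_fstype() {
    case $1 in
        nfs|nfs4|cifs|smb3|fuse.sshfs|ceph|glusterfs|fuse.glusterfs|9p)
            echo "network" ;;
        *)
            echo "local" ;;
    esac
}

# Hypothetical helper: print the type of the filesystem that contains DIR.
# Assumes findmnt from util-linux.
fstype_of() {
    findmnt -n -o FSTYPE --target "$1"
}
```

For example, classify_fstype "$(fstype_of /mnt/sdf)" prints "network" or "local" for the path discussed here. To tell a spinning disk from an SSD, lsblk -d -o NAME,ROTA lists each block device with a ROTA value of 1 for rotational disks and 0 otherwise.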

Master-cai commented 1 year ago

@vanpelt OS version: Ubuntu 18.04.2 LTS; Docker version 20.10.21, build baeda1f; the wandb docker image is the latest.

/mnt/sdf is not a network filesystem but an HDD.

docker run -d -p 28287:8080 -v /mnt/sdf/wandb_data:/vol:delegated --name wandb_f wandb/local

I have tried this command, but it doesn't work; the error still occurs.
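
One way to test the slow-filesystem hypothesis raised earlier in the thread is to time small synchronous writes in the mounted folder. The sketch below was not suggested by anyone in the thread, and it rests on the assumption that database setup is dominated by small synchronous writes. sync_write_test is a hypothetical helper, and it relies on GNU dd (oflag=dsync).

```shell
#!/bin/sh
# Hypothetical helper: time small synchronous writes in DIR.
# Writes COUNT blocks of 4 KiB (default 1000), each flushed to disk before
# the next one starts, then deletes the scratch file.
# Assumes GNU dd, which supports oflag=dsync and reports throughput on stderr.
sync_write_test() {
    dir=$1
    count=${2:-1000}
    scratch="$dir/.sync_write_test.$$"
    status=0
    dd if=/dev/zero of="$scratch" bs=4k count="$count" oflag=dsync || status=$?
    rm -f "$scratch"
    return "$status"
}
```

Running sync_write_test /mnt/sdf/wandb_data and then the same test on a folder on a known-fast disk gives two throughput figures to compare. A large gap would be consistent with the disk being the bottleneck, though it would not prove it.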
