wandb / server

W&B Server is the self-hosted version of Weights & Biases
MIT License

Loading your local environment... #123

Open Master-cai opened 1 year ago

Master-cai commented 1 year ago

I have been using Docker (20.10.21) to run wandb on an Ubuntu 18.04 server. After I restarted the server and the wandb container, it no longer works properly:

Loading your local environment...
Loading is taking longer than expected. Check stdout, or the system logs at /var/log for error messages. You can restart your server with the environment variable LOCAL_RESTORE=true to regain access if you're unable to login.

I checked some related issues and collected some logs:

docker logs wandb-local:

goroutine 1 [running]:
github.com/wandb/core/services/gorilla/cmd.(*migrateCommander).MainCmd(0xc000dfe9b0, {0xc000e42280, 0x2, 0x2})
    /mnt/ramdisk/core/services/gorilla/cmd/migrate.go:88 +0x9b4
main.main()
    /mnt/ramdisk/core/services/gorilla/cmd/megabinary/main.go:57 +0x227
*** Migrating database...
panic: dial tcp 127.0.0.1:3306: connect: connection refused

goroutine 1 [running]:
github.com/wandb/core/services/gorilla/cmd.(*migrateCommander).MainCmd(0xc000632330, {0xc00064e200, 0x2, 0x2})
    /mnt/ramdisk/core/services/gorilla/cmd/migrate.go:88 +0x9b4
main.main()
    /mnt/ramdisk/core/services/gorilla/cmd/megabinary/main.go:57 +0x227

After running docker exec -it wandb-local bash and then cat /var/log/mysql.log inside the container, I got:

./run: line 59: fg: job has terminated
2023-07-03T12:02:21.935888Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2023-07-03T12:02:21.937709Z 0 [Note] mysqld (mysqld 5.7.39) starting as process 12709 ...
2023-07-03T12:02:21.940791Z 0 [Note] InnoDB: PUNCH HOLE support available
2023-07-03T12:02:21.940810Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2023-07-03T12:02:21.940815Z 0 [Note] InnoDB: Uses event mutexes
2023-07-03T12:02:21.940820Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2023-07-03T12:02:21.940825Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.12
2023-07-03T12:02:21.940830Z 0 [Note] InnoDB: Using Linux native AIO
2023-07-03T12:02:21.941240Z 0 [Note] InnoDB: Number of pools: 1
2023-07-03T12:02:21.941346Z 0 [Note] InnoDB: Using CPU crc32 instructions
2023-07-03T12:02:21.943173Z 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2023-07-03T12:02:21.954000Z 0 [Note] InnoDB: Completed initialization of buffer pool
2023-07-03T12:02:21.956323Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2023-07-03T12:02:21.968675Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2023-07-03T12:02:22.002158Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2023-07-03T12:02:22.002217Z 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2023-07-03T12:02:22.161402Z 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2023-07-03T12:02:22.162918Z 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
2023-07-03T12:02:22.162950Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
2023-07-03T12:02:22.163620Z 0 [Note] InnoDB: Waiting for purge to start
2023-07-03T12:02:22.213838Z 0 [Note] InnoDB: 5.7.39 started; log sequence number 5066781119
2023-07-03T12:02:22.214177Z 0 [Note] InnoDB: Loading buffer pool(s) from /vol/mysql/ib_buffer_pool
2023-07-03T12:02:22.214470Z 0 [Note] Plugin 'FEDERATED' is disabled.
2023-07-03T12:02:22.215143Z 0 [Note] InnoDB: Buffer pool(s) load completed at 230703 12:02:22
2023-07-03T12:02:22.223473Z 0 [Note] Found ca.pem, server-cert.pem and server-key.pem in data directory. Trying to enable SSL support using them.
2023-07-03T12:02:22.223504Z 0 [Note] Skipping generation of SSL certificates as certificate files are present in data directory.
2023-07-03T12:02:22.223513Z 0 [Warning] A deprecated TLS version TLSv1 is enabled. Please use TLSv1.2 or higher.
2023-07-03T12:02:22.223518Z 0 [Warning] A deprecated TLS version TLSv1.1 is enabled. Please use TLSv1.2 or higher.
2023-07-03T12:02:22.224305Z 0 [Warning] CA certificate ca.pem is self signed.
2023-07-03T12:02:22.224359Z 0 [Note] Skipping generation of RSA key pair as key files are present in data directory.
2023-07-03T12:02:22.225078Z 0 [Note] Server hostname (bind-address): '*'; port: 3306
2023-07-03T12:02:22.225123Z 0 [Note] IPv6 is available.
2023-07-03T12:02:22.225133Z 0 [Note]   - '::' resolves to '::';
2023-07-03T12:02:22.225156Z 0 [Note] Server socket created on IP: '::'.
2023-07-03T12:02:22.225250Z 0 [ERROR] Another process with pid 94 is using unix socket file.
2023-07-03T12:02:22.225261Z 0 [ERROR] Unable to setup unix socket lock file.
2023-07-03T12:02:22.225267Z 0 [ERROR] Aborting

2023-07-03T12:02:22.225278Z 0 [Note] Binlog end
2023-07-03T12:02:22.225337Z 0 [Note] Shutting down plugin 'ngram'
2023-07-03T12:02:22.225348Z 0 [Note] Shutting down plugin 'partition'
2023-07-03T12:02:22.225354Z 0 [Note] Shutting down plugin 'BLACKHOLE'
2023-07-03T12:02:22.225359Z 0 [Note] Shutting down plugin 'ARCHIVE'
2023-07-03T12:02:22.225364Z 0 [Note] Shutting down plugin 'PERFORMANCE_SCHEMA'
2023-07-03T12:02:22.225443Z 0 [Note] Shutting down plugin 'MRG_MYISAM'
2023-07-03T12:02:22.225449Z 0 [Note] Shutting down plugin 'MyISAM'
2023-07-03T12:02:22.225463Z 0 [Note] Shutting down plugin 'INNODB_SYS_VIRTUAL'
2023-07-03T12:02:22.225468Z 0 [Note] Shutting down plugin 'INNODB_SYS_DATAFILES'
2023-07-03T12:02:22.225473Z 0 [Note] Shutting down plugin 'INNODB_SYS_TABLESPACES'
2023-07-03T12:02:22.225478Z 0 [Note] Shutting down plugin 'INNODB_SYS_FOREIGN_COLS'
2023-07-03T12:02:22.225483Z 0 [Note] Shutting down plugin 'INNODB_SYS_FOREIGN'
2023-07-03T12:02:22.225489Z 0 [Note] Shutting down plugin 'INNODB_SYS_FIELDS'
2023-07-03T12:02:22.225496Z 0 [Note] Shutting down plugin 'INNODB_SYS_COLUMNS'
2023-07-03T12:02:22.225503Z 0 [Note] Shutting down plugin 'INNODB_SYS_INDEXES'
2023-07-03T12:02:22.225509Z 0 [Note] Shutting down plugin 'INNODB_SYS_TABLESTATS'
2023-07-03T12:02:22.225514Z 0 [Note] Shutting down plugin 'INNODB_SYS_TABLES'
2023-07-03T12:02:22.225519Z 0 [Note] Shutting down plugin 'INNODB_FT_INDEX_TABLE'
2023-07-03T12:02:22.225523Z 0 [Note] Shutting down plugin 'INNODB_FT_INDEX_CACHE'
2023-07-03T12:02:22.225528Z 0 [Note] Shutting down plugin 'INNODB_FT_CONFIG'
2023-07-03T12:02:22.225556Z 0 [Note] Shutting down plugin 'INNODB_FT_BEING_DELETED'
2023-07-03T12:02:22.225563Z 0 [Note] Shutting down plugin 'INNODB_FT_DELETED'
2023-07-03T12:02:22.225568Z 0 [Note] Shutting down plugin 'INNODB_FT_DEFAULT_STOPWORD'
2023-07-03T12:02:22.225572Z 0 [Note] Shutting down plugin 'INNODB_METRICS'
2023-07-03T12:02:22.225577Z 0 [Note] Shutting down plugin 'INNODB_TEMP_TABLE_INFO'
2023-07-03T12:02:22.225581Z 0 [Note] Shutting down plugin 'INNODB_BUFFER_POOL_STATS'
2023-07-03T12:02:22.225586Z 0 [Note] Shutting down plugin 'INNODB_BUFFER_PAGE_LRU'
2023-07-03T12:02:22.225605Z 0 [Note] Shutting down plugin 'INNODB_BUFFER_PAGE'
2023-07-03T12:02:22.225611Z 0 [Note] Shutting down plugin 'INNODB_CMP_PER_INDEX_RESET'
2023-07-03T12:02:22.225617Z 0 [Note] Shutting down plugin 'INNODB_CMP_PER_INDEX'
2023-07-03T12:02:22.225623Z 0 [Note] Shutting down plugin 'INNODB_CMPMEM_RESET'
2023-07-03T12:02:22.225629Z 0 [Note] Shutting down plugin 'INNODB_CMPMEM'
2023-07-03T12:02:22.225635Z 0 [Note] Shutting down plugin 'INNODB_CMP_RESET'
2023-07-03T12:02:22.225642Z 0 [Note] Shutting down plugin 'INNODB_CMP'
2023-07-03T12:02:22.225648Z 0 [Note] Shutting down plugin 'INNODB_LOCK_WAITS'
2023-07-03T12:02:22.225654Z 0 [Note] Shutting down plugin 'INNODB_LOCKS'
2023-07-03T12:02:22.225660Z 0 [Note] Shutting down plugin 'INNODB_TRX'
2023-07-03T12:02:22.225666Z 0 [Note] Shutting down plugin 'InnoDB'
2023-07-03T12:02:22.225766Z 0 [Note] InnoDB: FTS optimize thread exiting.
2023-07-03T12:02:22.225889Z 0 [Note] InnoDB: Starting shutdown...
2023-07-03T12:02:22.326148Z 0 [Note] InnoDB: Dumping buffer pool(s) to /vol/mysql/ib_buffer_pool
2023-07-03T12:02:22.326531Z 0 [Note] InnoDB: Buffer pool(s) dump completed at 230703 12:02:22
2023-07-03T12:02:23.850147Z 0 [Note] InnoDB: Shutdown completed; log sequence number 5066781138
2023-07-03T12:02:23.854131Z 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2023-07-03T12:02:23.854166Z 0 [Note] Shutting down plugin 'MEMORY'
2023-07-03T12:02:23.854181Z 0 [Note] Shutting down plugin 'CSV'
2023-07-03T12:02:23.854193Z 0 [Note] Shutting down plugin 'sha256_password'
2023-07-03T12:02:23.854202Z 0 [Note] Shutting down plugin 'mysql_native_password'
2023-07-03T12:02:23.854477Z 0 [Note] Shutting down plugin 'binlog'
2023-07-03T12:02:23.856311Z 0 [Note] mysqld: Shutdown complete

What happened? The instance holds important data, so I don't dare to experiment with it casually. Can anyone help me fix it? Thanks very much!
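For readers skimming the long MySQL log above, the relevant entries are the few [ERROR] lines just before the plugin shutdown messages. A minimal Python sketch for pulling them out is shown below. The function name find_errors and the regular expression are illustrative assumptions based only on the line format visible in this log, not part of any wandb or MySQL tooling.

```python
import re

# Line format as it appears in the log above, for example:
#   2023-07-03T12:02:22.225250Z 0 [ERROR] Another process with pid 94 is using unix socket file.
LOG_LINE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<thread>\d+)\s+\[(?P<level>\w+)\]\s+(?P<msg>.*)$"
)


def find_errors(log_text):
    """Return (timestamp, message) pairs for every [ERROR] line in log_text."""
    errors = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            errors.append((match.group("ts"), match.group("msg")))
    return errors


# A few lines copied from the log in this issue.
sample = """\
2023-07-03T12:02:22.225156Z 0 [Note] Server socket created on IP: '::'.
2023-07-03T12:02:22.225250Z 0 [ERROR] Another process with pid 94 is using unix socket file.
2023-07-03T12:02:22.225261Z 0 [ERROR] Unable to setup unix socket lock file.
2023-07-03T12:02:22.225267Z 0 [ERROR] Aborting
2023-07-03T12:02:22.225278Z 0 [Note] Binlog end
"""

for ts, msg in find_errors(sample):
    print(ts, msg)
```

Lines that do not follow the timestamped format, such as the shell message on the first line of the log, are simply skipped.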

Master-cai commented 1 year ago

UPDATE

I used docker inspect to find that the wandb container's volume is mounted at /var/lib/docker/volumes/7906fcb0.../_data. I copied it to my own directory, e.g. wandb_data, and ran chmod -R 777 wandb_data. The wandb_data directory contains the env, minio and mysql directories, the same as _data. Then I started a new wandb container and used the -v option to mount wandb_data at /vol.
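The copy-and-remount procedure described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the image name wandb/local and port 8080 are assumed defaults that the thread does not confirm, the helper names are hypothetical, and the script builds the docker run command without executing it.

```python
from pathlib import Path

# Subdirectories the comment above reports finding in the copied volume.
EXPECTED_SUBDIRS = ("env", "minio", "mysql")


def missing_subdirs(data_dir):
    """Return the expected subdirectories that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


def docker_run_args(data_dir, image="wandb/local", name="wandb-local", port=8080):
    """Build (but do not run) a docker command that bind-mounts data_dir at /vol.

    The image name and port are assumptions; adjust them to your deployment.
    """
    host_path = Path(data_dir).resolve()
    return [
        "docker", "run", "-d",
        "-v", f"{host_path}:/vol",   # host directory : container path
        "-p", f"{port}:8080",
        "--name", name,
        image,
    ]


if __name__ == "__main__":
    print("missing:", missing_subdirs("wandb_data"))
    print(" ".join(docker_run_args("wandb_data")))
```

Checking for the three subdirectories before starting the container is a cheap way to catch an incomplete copy. It does not verify that the MySQL files themselves are consistent.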

Now I can open the wandb web page and log in. However, I can only see my project name. When I click into a project, it shows "There was a problem rendering these panels."

When I click into a specific run, the web page crashes with: InternalServerError (500): Internal Server Error (original: %!s(<nil>))

What should I do?

Master-cai commented 1 year ago

WandB Internal User commented: Master-cai commented: UPDATE

Have you commented anything? I cannot see it.

ArtsiomWB commented 1 year ago

Hi @Master-cai! Apologies that you are running into this. What version of the server are you running? Also, could you send me minimal reproduction code for this issue?

Also, could you send me your browser logs after refreshing your page and running into those errors again?

Master-cai commented 1 year ago

Thanks for your reply! How do I check the version of the server? I ran docker inspect and got "Image": "sha256:03b0c120304a4ee2857e24ed09a3e91d33306ceff00b4ac761b59a8d27a70c67". Maybe this can help you determine the version.
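As a side note on reading that output: docker inspect prints a JSON array, and besides the top-level "Image" digest it normally also carries the image reference the container was started from under "Config". A minimal Python sketch for extracting both is shown below. The function name image_info is hypothetical, and the tag in the sample is a placeholder rather than the version actually used in this issue.

```python
import json


def image_info(inspect_json):
    """Extract the image digest and image reference from docker inspect output."""
    containers = json.loads(inspect_json)  # docker inspect prints a JSON array
    first = containers[0]
    return {
        "digest": first.get("Image"),                       # "sha256:..." image ID
        "reference": first.get("Config", {}).get("Image"),  # name[:tag] used at start
    }


# Trimmed, hypothetical inspect output. Only the digest comes from this issue;
# the "wandb/local:latest" reference is a placeholder.
sample = (
    '[{"Image": "sha256:03b0c120304a4ee2857e24ed09a3e91d33306ceff00b4ac761b59a8d27a70c67",'
    ' "Config": {"Image": "wandb/local:latest"}}]'
)

print(image_info(sample))
```

If the container was started from an untagged or "latest" image, the reference alone may not identify the release, which is why the digest is still worth reporting.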

There is no code involved here; it just happens when I open the web page. I haven't tried to see whether it can still record new data.

Also, I do not know how to get the browser logs. Can you give me some directions?

Thanks!

ArtsiomWB commented 1 year ago

Thank you for your timely response as well!

Based on the logs you sent me, it looks like you are running into issues with your instance's SQL database:

2023-07-03T12:02:22.225261Z 0 [ERROR] Unable to setup unix socket lock file.
2023-07-03T12:02:23.856311Z 0 [Note] mysqld: Shutdown complete

Seems like a file was improperly closed/corrupted.
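One possible reading of the "Another process with pid 94 is using unix socket file" line, offered here as an assumption rather than a confirmed diagnosis, is that a socket lock file left over from before the restart names a PID that now belongs to some other process inside the container. The Python sketch below shows how such a check could look. The lock file location is not given anywhere in this thread, so the path is left as a parameter, and the sketch assumes the file contains just the PID as text.

```python
import os


def pid_is_alive(pid):
    """Return True if a process with this PID exists in the current PID namespace."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False     # no such process
    except PermissionError:
        return True      # process exists but is owned by another user
    return True


def lock_file_status(lock_path):
    """Read a PID from lock_path and report whether that PID is currently running.

    Assumes the lock file holds only the PID as text (an assumption, see above).
    """
    with open(lock_path) as handle:
        pid = int(handle.read().strip())
    return pid, pid_is_alive(pid)
```

It would need to run inside the container, since PIDs are only meaningful within the container's own namespace. A live PID that turns out not to be mysqld would support the stale-lock reading, but deciding what to do with the file is best left to the maintainers, given that the data matters.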

Could you please try restarting your machine, as well as the docker container you are running wandb on?

To get your browser logs, please right-click anywhere on the wandb web page and select "Inspect". In the panel that opens (usually on the right-hand side of your screen), select the Console tab, then refresh the page and reproduce the error one more time.

sydholl commented 1 year ago

Artsiom Skarakhod commented: Hi there, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.

sydholl commented 1 year ago

Artsiom Skarakhod commented: Hi, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!