wandb / server

W&B Server is the self-hosted version of Weights & Biases.
MIT License

Log Artifact Error #137

Open wangyiran33 opened 5 months ago

wangyiran33 commented 5 months ago

I am using Docker to deploy a local wandb server without any external data storage. However, every 'log artifact' operation fails with an 'upload failure' error. Could you please advise on the possible cause of this issue and how to solve it? Thank you!

image version: wandb/local:latest sha256:a5a06d78d0e92397b48090490742ae86640c6fbeeddedac51e0cd5438a162812

client ERROR MESSAGE:

wandb: ERROR Error uploading "wandb-metadata.json": CommError, <Response [400]>
wandb: ERROR Error uploading "/Users/xingwen/Library/Application Support/wandb/artifacts/staging/tmpor1_h4l1": CommError, <Response [400]>
wandb: ERROR Uploading artifact file failed. Artifact won't be committed.
wandb: ERROR Error uploading "media/table/Table Name_0_44c6fc6ffebdb301e2f2.table.json": CommError, <Response [400]>
wandb: ERROR Error uploading "media/table/Table Name_0_44c6fc6ffebdb301e2f2.table.json": CommError, <Response [400]>

When I run wandb verify, the output is:

Default host selected: http://192.168.36.67:8080/
Find detailed logs for this test at: /var/folders/6t/8tvgy8_x5fd8thjl30xvz4_m0000gn/T/tmp3uvsxje9/wandb
Checking if logged in...................................................✅
Checking signed URL upload..............................................Traceback (most recent call last):
  File "/Users/xingwen/miniconda3/envs/py3.10/bin/wandb", line 8, in <module>
    sys.exit(cli())
  File "/Users/xingwen/miniconda3/envs/py3.10/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/xingwen/miniconda3/envs/py3.10/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/xingwen/miniconda3/envs/py3.10/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/xingwen/miniconda3/envs/py3.10/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/xingwen/miniconda3/envs/py3.10/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/xingwen/miniconda3/envs/py3.10/lib/python3.10/site-packages/wandb/cli/cli.py", line 2864, in verify
    url_success, url = wandb_verify.check_graphql_put(api, host)
  File "/Users/xingwen/miniconda3/envs/py3.10/lib/python3.10/site-packages/wandb/sdk/verify/verify.py", line 399, in check_graphql_put
    contents = read_file.read()
AttributeError: 'NoneType' object has no attribute 'read'
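
The AttributeError at the end of this traceback indicates that read_file was None when .read() was called, presumably because an earlier step in the check did not yield a file. The snippet below is only an illustrative sketch of that failure pattern and of a defensive guard. It is not the code in wandb/sdk/verify/verify.py, and fetch_uploaded_file and read_back are hypothetical names introduced for the example.

```python
import io
from typing import IO, Optional


def fetch_uploaded_file(upload_succeeded: bool) -> Optional[IO[str]]:
    """Hypothetical helper: returns a file handle, or None if the upload failed."""
    if not upload_succeeded:
        return None
    return io.StringIO("test contents")


def read_back(upload_succeeded: bool) -> Optional[str]:
    """Read the file back, returning None instead of crashing when it is missing."""
    read_file = fetch_uploaded_file(upload_succeeded)
    if read_file is None:
        # Without this guard, read_file.read() raises
        # AttributeError: 'NoneType' object has no attribute 'read'
        return None
    return read_file.read()
```

Read this way, the crash is likely a symptom of the failed signed URL upload rather than a separate problem.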

server ERROR MESSAGE in /var/log/mysql.log:

2024-04-01T07:08:04.262017Z 43 [Note] Aborted connection 43 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
2024-04-01T07:08:04.262063Z 42 [Note] Aborted connection 42 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
2024-04-01T07:08:04.262097Z 38 [Note] Aborted connection 38 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
2024-04-01T07:08:04.262105Z 40 [Note] Aborted connection 40 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
2024-04-01T07:08:04.262059Z 41 [Note] Aborted connection 41 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
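
To see whether aborted connections cluster around the moment an upload fails, the log lines can be grouped by second and by reason. This is a minimal sketch that assumes the log format shown above; summarize_aborted is a hypothetical helper name, and the regular expression has only been matched against these sample lines.

```python
import re
from collections import Counter

# Matches lines such as:
# 2024-04-01T07:08:04.262017Z 43 [Note] Aborted connection 43 to db: '...' user: '...' host: '...' (reason)
ABORTED = re.compile(
    r"^(?P<ts>\S+)\s+\d+\s+\[Note\]\s+Aborted connection (?P<conn>\d+) "
    r"to db: '(?P<db>[^']*)' user: '(?P<user>[^']*)' host: '(?P<host>[^']*)' "
    r"\((?P<reason>.*)\)\s*$"
)


def summarize_aborted(lines):
    """Count aborted connections per (timestamp truncated to the second, reason)."""
    counts = Counter()
    for line in lines:
        match = ABORTED.match(line.strip())
        if match:
            second = match.group("ts").split(".")[0]
            counts[(second, match.group("reason"))] += 1
    return counts


if __name__ == "__main__":
    with open("/var/log/mysql.log", encoding="utf-8", errors="replace") as log:
        for (second, reason), n in sorted(summarize_aborted(log).items()):
            print(second, n, reason)
```

For the five entries above, all of them fall within the same second (2024-04-01T07:08:04) and share the same reason.
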
umakrishnaswamy commented 5 months ago

hey @wangyiran33 - this could be due to several reasons:

Docker Version Compatibility: Ensure that the Docker version you are using is compatible with the wandb/local:latest image. Docker compatibility issues can sometimes lead to unexpected behavior.

W&B Local Setup: Verify that your W&B local server is set up correctly, especially regarding its connection to its backend database. The MySQL error logs suggest there might be issues with database connections ("Got an error reading communication packets"). This could indicate network problems, MySQL server limits, or configuration issues within the MySQL server (e.g., max_allowed_packet size).

wangyiran33 commented 5 months ago

> hey @wangyiran33 - this could be due to several reasons:
>
> Docker Version Compatibility: Ensure that the Docker version you are using is compatible with the wandb/local:latest image. Docker compatibility issues can sometimes lead to unexpected behavior.
>
> W&B Local Setup: Verify that your W&B local server is set up correctly, especially regarding its connection to its backend database. The MySQL error logs suggest there might be issues with database connections ("Got an error reading communication packets"). This could indicate network problems, MySQL server limits, or configuration issues within the MySQL server (e.g., max_allowed_packet size).
>
>   • do you have any firewall / proxy setup? sometimes this can interfere with wandb's ability to upload/download artifacts

Thanks for your reply! I didn't change any default MySQL server settings (e.g., max_allowed_packet), and I don't think a firewall is the cause, because I can create a run and log its output. The only thing that fails is logging a table.

umakrishnaswamy commented 4 months ago

hey @wangyiran33 - are you able to download artifacts without error?

additionally, the error you're seeing in the mysql logs indicates that database connections were terminated abruptly. this could be due to a network interruption, server overload (I recommend checking the server's resource usage), or a connection timeout. could you also try increasing the wait_timeout and interactive_timeout settings in your MySQL configuration file?
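
As a rough sketch of what such an override could look like, the snippet below renders a [mysqld] option-file fragment with the two timeout settings and the max_allowed_packet setting mentioned earlier in the thread. The function name render_mysqld_overrides is hypothetical, and the values are placeholders rather than recommendations (MySQL's documented default for both timeouts is 28800 seconds). Where the wandb/local image reads its MySQL options from is not stated in this thread, so the target file is left to the reader.

```python
import configparser
import io


def render_mysqld_overrides(
    wait_timeout_s: int, interactive_timeout_s: int, max_allowed_packet: str
) -> str:
    """Render a [mysqld] option-file fragment with the given settings."""
    parser = configparser.ConfigParser()
    parser["mysqld"] = {
        "wait_timeout": str(wait_timeout_s),
        "interactive_timeout": str(interactive_timeout_s),
        "max_allowed_packet": max_allowed_packet,
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(render_mysqld_overrides(57600, 57600, "128M"))
```

MySQL needs to be restarted (or the variables set at runtime) before option-file changes take effect.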

also, would it be possible to check the docker logs to see if there are any errors there? and what local server version are you running?

thank you for trying the above and providing this info!

umakrishnaswamy commented 4 months ago

@wangyiran33 - since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!