JiarunLiu closed this issue 1 year ago.
Hi @JiarunLiu, it looks like the server is not able to write anything to the attached bucket. I imagine you would see the same error if you tried to log an Artifact, correct?
Can you run wandb verify in the CLI (not in the Docker container, but on a machine logged into your local server) and let me know the output?
Thank you, Nate
Here is the output.
(base) fgldlb@fgldlb-Precision-Tower-7910:~$ wandb verify
Default host selected: http://localhost:8080
Find detailed logs for this test at: /tmp/tmpnkgd22wl/wandb
Checking if logged in...................................................✅
Checking signed URL upload..............................................❌
Unable to read file successfully saved through a put request. Check SQS configurations, bucket permissions and topic configs.
Checking ability to send large payloads through proxy...................✅
Checking requests to base url...........................................❌
Connections are not made over https. SSL required for secure communications.
Checking wandb package version is up to date............................✅
Checking logged metrics, saving and downloading a file..................❌
Unable to download file. Check SQS configuration, topic configuration and bucket permissions.
Checking artifact save and download workflows...........................wandb: ERROR Error while calling W&B API: InternalServerError (500): Internal Server Error (original: %!s(<nil>)) (<Response [500]>)
wandb: ERROR Error while calling W&B API: InternalServerError (500): Internal Server Error (original: %!s(<nil>)) (<Response [500]>)
wandb: ERROR Error while calling W&B API: InternalServerError (500): Internal Server Error (original: %!s(<nil>)) (<Response [500]>)
wandb: ERROR Error while calling W&B API: InternalServerError (500): Internal Server Error (original: %!s(<nil>)) (<Response [500]>)
wandb: ERROR Error while calling W&B API: InternalServerError (500): Internal Server Error (original: %!s(<nil>)) (<Response [500]>)
wandb: ERROR Error while calling W&B API: InternalServerError (500): Internal Server Error (original: %!s(<nil>)) (<Response [500]>)
wandb: ERROR Error while calling W&B API: InternalServerError (500): Internal Server Error (original: %!s(<nil>)) (<Response [500]>)
wandb: ERROR Error while calling W&B API: InternalServerError (500): Internal Server Error (original: %!s(<nil>)) (<Response [500]>)
I then tried changing the file permissions inside the Docker container with chmod 777 -R, chown -R wandb, and chgrp -R wandb. After that change, I can see the charts in the web interface, but the output of wandb verify still contains some error messages:
(base) fgldlb@fgldlb-Precision-Tower-7910:~$ wandb verify
Default host selected: http://localhost:8080
Find detailed logs for this test at: /tmp/tmp02nr8c42/wandb
Checking if logged in...................................................✅
Checking signed URL upload..............................................❌
Unable to read file successfully saved through a put request. Check SQS configurations, bucket permissions and topic configs.
Checking ability to send large payloads through proxy...................✅
Checking requests to base url...........................................❌
Connections are not made over https. SSL required for secure communications.
Checking wandb package version is up to date............................✅
Checking logged metrics, saving and downloading a file..................❌
Unable to download file. Check SQS configuration, topic configuration and bucket permissions.
Checking artifact save and download workflows...........................✅
In addition, when I try to upload my offline run via wandb sync, I get the following error:
(base) fgldlb@fgldlb-Precision-Tower-7910:~/Documents/stDLNN/wandb$ wandb sync offline-run-20230214_014825-3vihn9xr
Syncing: http://localhost:8080/jiarunliu/stDLNN/runs/3vihn9xr ...Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/fgldlb/.local/lib/python3.10/site-packages/wandb/sync/sync.py", line 265, in run
    sm.send(pb)
  File "/home/fgldlb/.local/lib/python3.10/site-packages/wandb/sdk/internal/sender.py", line 231, in send
    assert record_type
AssertionError
Thank you @JiarunLiu! It looks like the issue is with the SQS configuration for the bucket. You can ignore the failed https test, since you are using http. Can you go through the steps here for your external bucket and make sure it is set up correctly?
Thank you, Nate
Thank you @nate-wandb! I couldn't find any steps in that document that apply to me, because I'm not using any external file storage (such as Azure or Google Cloud). I upgraded my wandb server to the newly released version 0.28.0 without setting LOCAL_RESTORE=true, and now everything is back to normal. Thank you again!
Hi, I upgraded my wandb server to the latest version yesterday (Docker image ID 06e5925be5dc ==> b46f78b7ffa8), but something went wrong after the upgrade. First, the web interface cannot render my projects:
The other problem is that I see some error messages at the beginning and end of my training:
Here are some messages filtered by "ERROR" from my_run/logs/debug-internal.log: