Kenny-Ch opened this issue 2 months ago. Status: Open.
Paulo Sabile commented: Request #74598 "[SDK] got some problem when uplo..." was closed and merged into this request. Last comment in request #74598:
While training, I found that wandb suddenly could not upload wandb-metadata.json. After training, I tried to upload the files with wandb sync and got these errors:
wandb sync wandb/run-20240826_190835-g7b6iqjc/
Find logs at: /home/JIng/kenny/Project/personal_copilot/training/wandb/debug-cli.JIng.log
Syncing: http://localhost:8080/charly/personal-code-copilot/runs/g7b6iqjc ...
wandb: ERROR Error uploading "code/train.py": CommError, <Response [507]>
wandb: ERROR Error uploading "wandb-metadata.json": CommError, <Response [507]>
wandb: ERROR Error uploading "wandb-summary.json": CommError, <Response [507]>
wandb: ERROR Error uploading "conda-environment.yaml": CommError, <Response [507]>
wandb: ERROR Error uploading "output.log": CommError, <Response [507]>
wandb: ERROR Error uploading "requirements.txt": CommError, <Response [507]>
wandb: ERROR Error uploading "config.yaml": CommError, <Response [507]>
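Every upload above failed with status 507. In HTTP, 507 means "Insufficient Storage" (defined in RFC 4918), which is a condition reported by the server rather than a client or network problem. As a rough triage aid, the sketch below extracts the file name and status code from wandb sync output lines like these and decodes the code with Python's standard http.HTTPStatus. The regular expression and the function name summarize_upload_errors are assumptions written for the log format shown in this thread; they are not part of wandb.

```python
import re
from http import HTTPStatus

# Assumed line format, copied from the wandb sync output above:
# wandb: ERROR Error uploading "config.yaml": CommError, <Response [507]>
UPLOAD_ERROR = re.compile(
    r'Error uploading "(?P<file>[^"]+)": \w+, <Response \[(?P<code>\d{3})\]>'
)


def summarize_upload_errors(lines):
    """Map each failed file to (status code, standard HTTP reason phrase)."""
    failures = {}
    for line in lines:
        match = UPLOAD_ERROR.search(line)
        if match is None:
            continue  # not an upload error line
        code = int(match.group("code"))
        try:
            phrase = HTTPStatus(code).phrase
        except ValueError:  # status code unknown to the http module
            phrase = "Unknown"
        failures[match.group("file")] = (code, phrase)
    return failures


if __name__ == "__main__":
    sample = [
        'wandb: ERROR Error uploading "wandb-metadata.json": CommError, <Response [507]>',
        'wandb: ERROR Error uploading "config.yaml": CommError, <Response [507]>',
    ]
    for name, (code, phrase) in summarize_upload_errors(sample).items():
        print(f"{name}: {code} {phrase}")
```

Decoding the code this way makes it clear that every file failed for the same server-side reason, rather than for file-specific problems.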
I also got an error when running wandb verify:
Default host selected: http://localhost:8080
Find detailed logs for this test at: /tmp/tmp5033o82e/wandb
Checking if logged in...................................................✅
Checking signed URL upload..............................................Traceback (most recent call last):
File "/home/JIng/miniconda3/envs/starcode-3b/bin/wandb", line 8, in <module>
sys.exit(cli())
^^^^^
File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/wandb/cli/cli.py", line 2960, in verify
url_success, url = wandb_verify.check_graphql_put(api, host)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/wandb/sdk/verify/verify.py", line 400, in check_graphql_put
contents = read_file.read()
^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'read'
Here are some error logs I found in /var/log:
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:34:12.313204066Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"no known task \"PUBLISHCUSTOMMETRICS\""}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"no known task \"PUBLISHCUSTOMMETRICS\""}
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:35:00.058451284Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"task 24:garbage_collect_runs_v2 paused due to repeated failures"}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"task 24:garbage_collect_runs_v2 paused due to repeated failures"}
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:35:00.058625177Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"task 33:FlatRunsMigrator paused due to repeated failures"}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"task 33:FlatRunsMigrator paused due to repeated failures"}
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:35:12.317314097Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"no known task \"PUBLISHCUSTOMMETRICS\""}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"no known task \"PUBLISHCUSTOMMETRICS\""}
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:36:12.317093934Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"no known task \"PUBLISHCUSTOMMETRICS\""}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"no known task \"PUBLISHCUSTOMMETRICS\""}
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:37:12.316296925Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"no known task \"PUBLISHCUSTOMMETRICS\""}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"no known task \"PUBLISHCUSTOMMETRICS\""}
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:38:12.315714385Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"no known task \"PUBLISHCUSTOMMETRICS\""}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"no known task \"PUBLISHCUSTOMMETRICS\""}
./gorilla.log:{"level":"ERROR","time":"2024-08-29T05:32:51.071134593Z","info":{"program":"gorilla","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":59},"data":{"dd.service":"gorilla","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b","http":{"url":"http://192.168.104.9/oidc/auth","method":"GET","headers":{"Host":"192.168.104.9","Connection":"close","User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36 Edg/128.0.0.0","Accept-Encoding":"gzip, deflate","Accept-Language":"zh,en-US;q=0.9,en;q=0.8","X-Original-Uri":"/system-admin/static/css/main.c9951160.css.map","X-Forwarded-For":"192.168.104.9"}}},"message":"Not logged in","dd.trace_id":"10464612527120353434","error":{"kind":"*errors.errorString","message":"Not logged in"}}
./mysql.log:2024-08-29T05:33:11.670654Z 27 [Note] Aborted connection 27 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670658Z 22 [Note] Aborted connection 22 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670773Z 25 [Note] Aborted connection 25 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670709Z 23 [Note] Aborted connection 23 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670743Z 21 [Note] Aborted connection 21 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670767Z 24 [Note] Aborted connection 24 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670656Z 28 [Note] Aborted connection 28 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670788Z 17 [Note] Aborted connection 17 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670797Z 26 [Note] Aborted connection 26 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670889Z 20 [Note] Aborted connection 20 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670895Z 19 [Note] Aborted connection 19 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670958Z 18 [Note] Aborted connection 18 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.768660Z 7 [Note] Aborted connection 7 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:12.194361Z 15 [Note] Aborted connection 15 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:12.194462Z 8 [Note] Aborted connection 8 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:12.194516Z 11 [Note] Aborted connection 11 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:12.194523Z 9 [Note] Aborted connection 9 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:12.194478Z 13 [Note] Aborted connection 13 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
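The server-side lines above are grep output of the form path:entry. A minimal sketch for tallying them, assuming exactly that prefix format: JSON entries (the gorilla logs) are counted by their message field when level is ERROR, and mysql.log lines are counted only when they contain "Aborted connection". The function name tally_server_errors is hypothetical, and the sample lines in the usage block are shortened copies of entries from this thread.

```python
import json
from collections import Counter


def tally_server_errors(lines):
    """Count (log file, message) pairs from 'path:entry' grep output."""
    counts = Counter()
    for line in lines:
        path, sep, entry = line.partition(":")
        if not sep:
            continue  # no 'path:' prefix, so not grep output
        if entry.startswith("{"):
            # gorilla*.log entries are JSON objects with "level" and "message"
            try:
                record = json.loads(entry)
            except json.JSONDecodeError:
                continue
            if record.get("level") == "ERROR":
                counts[(path, record.get("message", ""))] += 1
        elif "Aborted connection" in entry:
            # mysql.log notes differ only by connection id, so group them
            counts[(path, "Aborted connection")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        './gorilla-glue.log:{"level":"ERROR","message":"task 33:FlatRunsMigrator paused due to repeated failures"}',
        "./mysql.log:2024-08-29T05:33:11.670654Z 27 [Note] Aborted connection 27 to db: 'wandb_local'",
    ]
    for (path, message), n in tally_server_errors(sample).most_common():
        print(f"{n:4d}  {path}  {message}")
```

Grouping by message turns a long paste into a short list, which makes recurring failures such as the paused garbage_collect_runs_v2 and FlatRunsMigrator tasks easier to spot.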
Here is the debug bundle: debug.zip
Hi @Kenny-Ch, good day, and thank you for reaching out to us. Happy to help you with this!
Let me help you troubleshoot. Do you know whether you are running into storage issues in wandb? Could you check your team settings to see whether you are reaching the storage limit?
Also, what is your current SDK version? You can find it by running wandb --version. Thank you!
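Because the 507 comes from the server, the free space that matters is on the filesystem holding the self-hosted server's data, which is not necessarily the directory the training script runs in. A minimal sketch using the standard shutil.disk_usage. The default path /vol is an assumption: it is the data volume mount point in the wandb/local Docker examples, so it only exists inside that container. On the host, pass the path of the mounted volume as the first argument instead. The 1 GiB low-space threshold is arbitrary. If your deployment writes files to external object storage, check that storage instead.

```python
import os
import shutil
import sys

# Hypothetical default: data volume mount point in the wandb/local Docker examples.
DEFAULT_PATH = "/vol"


def free_space_report(path, min_free_gib=1.0):
    """Return (free GiB, percent used, is_low) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    free_gib = usage.free / 2**30
    percent_used = 100.0 * usage.used / usage.total if usage.total else 0.0
    return free_gib, percent_used, free_gib < min_free_gib


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_PATH
    if not os.path.exists(target):
        print(f"{target} not found; pass the server's data directory as an argument.")
    else:
        free_gib, percent_used, is_low = free_space_report(target)
        print(f"{target}: {free_gib:.1f} GiB free, {percent_used:.1f}% used")
        if is_low:
            print("Low free space: this could explain HTTP 507 (Insufficient Storage).")
```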
Hi @paulosabile-wb, glad to hear from you.
I have checked that my local disk has enough space to upload wandb-metadata.json. However, I have no idea where to find the storage limit on my page. Here is the team page on my self-hosted server:
My wandb version is 0.17.6.
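The installed SDK version can also be read from Python, which helps when several conda environments are involved. A minimal sketch using the standard importlib.metadata module. The comparison against 0.17.8 (the release suggested in the next reply) uses a simple numeric parse that is an assumption of this sketch: it keeps only the leading X.Y.Z part, so pre-release suffixes such as rc1 are ignored.

```python
import re
from importlib import metadata

SUGGESTED = "0.17.8"  # release suggested in this thread


def parse_release(version):
    """Turn the leading 'X.Y.Z' part of a version string into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def needs_upgrade(installed, suggested=SUGGESTED):
    """True if the installed release is older than the suggested one."""
    return parse_release(installed) < parse_release(suggested)


if __name__ == "__main__":
    try:
        installed = metadata.version("wandb")
    except metadata.PackageNotFoundError:
        print("wandb is not installed in this environment")
    else:
        verdict = "upgrade suggested" if needs_upgrade(installed) else "at or above the suggested release"
        print(f"wandb {installed}: {verdict}")
```

With the version reported here, needs_upgrade("0.17.6") returns True.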
Thank you for confirming this, @Kenny-Ch. Could you please try the latest version, 0.17.8, and let us know whether the errors are still the same?
When was the last time you were able to run an experiment? Do you know what changed before you encountered this error?
If the error still persists on the latest version, could you please share the debug-internal.log and debug.log files for the affected run? They are in the local folder wandb/run-<timestamp>-<run_id>/logs, inside the directory where you run your code. These files will give us more details about this error.
Thank you!
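The two log files requested above can be collected with a short script. A minimal sketch, assuming the local layout wandb/run-<timestamp>-<run_id>/logs/ seen in this thread (for example wandb/run-20240826_190835-g7b6iqjc); the function name find_run_logs is hypothetical.

```python
from pathlib import Path

LOG_NAMES = ("debug.log", "debug-internal.log")


def find_run_logs(wandb_dir, run_id):
    """Return the debug log files of every local run directory ending in run_id."""
    found = []
    for run_dir in sorted(Path(wandb_dir).glob(f"run-*-{run_id}")):
        for name in LOG_NAMES:
            log_file = run_dir / "logs" / name
            if log_file.is_file():
                found.append(log_file)
    return found


if __name__ == "__main__":
    # Run ID taken from the wandb sync command earlier in this thread.
    for path in find_run_logs("wandb", "g7b6iqjc"):
        print(path)
```

Run it from the directory that contains the wandb folder. If nothing is printed, the run directory is elsewhere or the run ID differs.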