wandb / wandb

🔥 A tool for visualizing and tracking your machine learning experiments. This repo contains the CLI and Python API.
https://wandb.ai
MIT License

[CLI]: wandb.errors.UsageError: Agent user not valid #7870

Closed dc250601 closed 20 hours ago

dc250601 commented 1 week ago

Describe the bug

This error occurs quite frequently when many agents are launched for a single sweep.


wandb.agent(sweep_id=sweep_id, function=runner, entity=<entity name>, project=<project_name>)
{"errors":[{"message":"Agent user not valid","path":["createAgent"]}],"data":{"createAgent":null}}
wandb: ERROR Error while calling W&B API: Agent user not valid (<Response [400]>)
wandb: ERROR Agent user not valid
Traceback (most recent call last):
  File "/global/homes/d/diptarko/miniconda3/envs/work/lib/python3.9/site-packages/wandb/sdk/lib/retry.py", line 131, in __call__
    result = self._call_fn(*args, **kwargs)
  File "/global/homes/d/diptarko/miniconda3/envs/work/lib/python3.9/site-packages/wandb/sdk/internal/internal_api.py", line 369, in execute
    return self.client.execute(*args, **kwargs)  # type: ignore
  File "/global/homes/d/diptarko/miniconda3/envs/work/lib/python3.9/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 52, in execute
    result = self._get_result(document, *args, **kwargs)
  File "/global/homes/d/diptarko/miniconda3/envs/work/lib/python3.9/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 60, in _get_result
    return self.transport.execute(document, *args, **kwargs)
  File "/global/homes/d/diptarko/miniconda3/envs/work/lib/python3.9/site-packages/wandb/sdk/lib/gql_request.py", line 59, in execute
    request.raise_for_status()
  File "/global/homes/d/diptarko/miniconda3/envs/work/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://api.wandb.ai/graphql

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/pscratch/sd/d/diptarko/460/ELSA_ImageNet/SSALD2/master.py", line 18, in <module>
    wandb.agent(sweep_id,runner,entity = "dc250601", project="ssald_main")
  File "/global/homes/d/diptarko/miniconda3/envs/work/lib/python3.9/site-packages/wandb/wandb_agent.py", line 581, in agent
    return pyagent(sweep_id, function, entity, project, count)
  File "/global/homes/d/diptarko/miniconda3/envs/work/lib/python3.9/site-packages/wandb/agents/pyagent.py", line 348, in pyagent
    agent.run()
  File "/global/homes/d/diptarko/miniconda3/envs/work/lib/python3.9/site-packages/wandb/agents/pyagent.py", line 319, in run
    self._setup()
  File "/global/homes/d/diptarko/miniconda3/envs/work/lib/python3.9/site-packages/wandb/agents/pyagent.py", line 136, in _setup
    self._register()
  File "/global/homes/d/diptarko/miniconda3/envs/work/lib/python3.9/site-packages/wandb/agents/pyagent.py", line 113, in _register
    agent = self._api.register_agent(socket.gethostname(), sweep_id=self._sweep_id)
  File "/global/homes/d/diptarko/miniconda3/envs/work/lib/python3.9/site-packages/wandb/apis/internal.py", line 150, in register_agent
    return self.api.register_agent(*args, **kwargs)
  File "/global/homes/d/diptarko/miniconda3/envs/work/lib/python3.9/site-packages/wandb/apis/normalize.py", line 73, in wrapper
    raise err
  File "/global/homes/d/diptarko/miniconda3/envs/work/lib/python3.9/site-packages/wandb/apis/normalize.py", line 41, in wrapper
    return func(*args, **kwargs)
  File "/global/homes/d/diptarko/miniconda3/envs/work/lib/python3.9/site-packages/wandb/sdk/internal/internal_api.py", line 2936, in register_agent
    response = self.gql(
  File "/global/homes/d/diptarko/miniconda3/envs/work/lib/python3.9/site-packages/wandb/sdk/internal/internal_api.py", line 341, in gql
    ret = self._retry_gql(
  File "/global/homes/d/diptarko/miniconda3/envs/work/lib/python3.9/site-packages/wandb/sdk/lib/retry.py", line 147, in __call__
    retry_timedelta_triggered = check_retry_fn(e)
  File "/global/homes/d/diptarko/miniconda3/envs/work/lib/python3.9/site-packages/wandb/util.py", line 878, in no_retry_4xx
    raise UsageError(body["errors"][0]["message"])
wandb.errors.UsageError: Agent user not valid

Additional Files

I get this error when launching new agents. It occurs randomly when a lot of agents are launched. Is there a limit on the number of agents we can launch that might cause this? I have tried deleting the .netrc file, but that did not resolve the issue.
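
Since the failure appears randomly rather than on every launch, one possible stopgap is to retry the agent start with a jittered delay. The following is only a sketch under the unconfirmed assumption that the error is transient; nothing in this thread establishes the cause. The helper name `call_with_backoff` and its parameters are hypothetical and not part of wandb.

```python
import random
import time


def call_with_backoff(fn, attempts=5, base_delay=1.0, max_delay=30.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on a listed exception, wait with jittered backoff and retry.

    Hypothetical helper, not a wandb API. It assumes the failure is transient.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error unchanged
            # Exponential backoff capped at max_delay, with full jitter so that
            # many agents started together do not all retry at the same moment.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))


# Possible usage with the call from this report (not executed here):
#
#   import wandb
#   call_with_backoff(
#       lambda: wandb.agent(sweep_id, function=runner,
#                           entity="<entity name>", project="<project_name>"),
#       retry_on=(wandb.errors.UsageError,),
#   )
```

Note that this wrapper would also retry a `UsageError` raised for a genuine misconfiguration, so it is best kept to a small number of attempts.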

Environment

WandB version: 0.16.2

Python version: 3.9.18

Additional Context

No response

ArtsiomWB commented 1 week ago

Hi @dc250601! Thank you for writing in!

Could you please try upgrading to the latest version of wandb and see if you are still running into this behavior? How many agents are you calling for your sweep?
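
To confirm which version is active in the environment that launches the agents (for example after running `pip install --upgrade wandb`), a small check like the one below may help. It is a sketch that uses only the standard library; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Run this with the same interpreter that starts the sweep agents.
print("wandb version:", installed_version("wandb"))
```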

ArtsiomWB commented 20 hours ago

Hi, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!