Closed: mpvginde closed this issue 7 months ago
Hi, `wandb.init` should be called under the hood by the `WandbLogger` (https://github.com/joeloskarsson/neural-lam/blob/6377d447d41e9828d4c9ee4c1fd13964f1c22d20/train_model.py#L127-L128), so there should not be any need to do this manually (Lightning docs: https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.loggers.wandb.html#lightning.pytorch.loggers.wandb.WandbLogger).
I remember having this issue earlier when I was working out the details around multi-GPU training, but sorted it out then (that is why the `if trainer.global_rank == 0:` check is there). I can't seem to get this error message even if I log out of wandb. Are you on the latest commit? What kind of hardware are you running on (CPU/single-GPU/multi-GPU)?
Hi Joel, thanks for your reply. I'm running on a single GPU (an interactive PBS job on a GPU cluster where I only ask for 1 GPU), using commit 6377d44.
Some detective work later, I think I have figured out the issue. The `WandbLogger` used to call `wandb.init` when created (`wandb.init` is called the first time the `experiment` property of the logger object is accessed). But this was changed here: https://github.com/Lightning-AI/lightning/commit/71559b6768653212750dd0c653dc64f259e1bbd1 (a small change for Lightning, but it does break things here). I think this change was included in Lightning 2.1.1, but my environment was still on 2.0.9.
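To illustrate the behaviour described above, here is a minimal, dependency-free sketch of the lazy-initialisation pattern. `FakeLogger` and `init_calls` are hypothetical stand-ins made up for this example, not Lightning's actual implementation; the only point is that nothing is initialised until `experiment` is first accessed.

```python
class FakeLogger:
    """Hypothetical stand-in for a logger that creates its run lazily."""

    def __init__(self, project):
        self.project = project
        self._experiment = None  # no run exists yet
        self.init_calls = 0      # counts how often the (fake) init ran

    @property
    def experiment(self):
        # Create the run on first access only; later accesses reuse it.
        if self._experiment is None:
            self.init_calls += 1
            self._experiment = {"project": self.project}
        return self._experiment


logger = FakeLogger("neural-lam")
calls_after_construction = logger.init_calls  # nothing initialised yet
_ = logger.experiment                         # first access triggers init
_ = logger.experiment                         # second access reuses the run
calls_after_access = logger.init_calls
```

With a logger that behaves like this, any code that needs an active run before the first access to `experiment` will fail, which matches the error reported in this issue.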
So this does indeed need fixing. Thanks for raising this issue, as it will likely affect anyone making a new install. I think that letting the logger do the `wandb.init` is still a better idea than an explicit `wandb.init` call. A nice way to do this would be to use a `logger.experiment` call to set up the metrics (as that will then make sure `wandb.init` is called). Will take a look at this in a bit!
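A rough sketch of what that could look like, assuming the metrics are defined through the run object returned by `logger.experiment`. The function signature and the metric name `val_mean_loss` are assumptions for illustration and may not match what ends up in the repository; `define_metric(name, summary=...)` is the method wandb run objects provide.

```python
def init_wandb_metrics(wandb_logger, metric_names=("val_mean_loss",)):
    """Define summary metrics via the logger's own run object.

    The signature and default metric name are hypothetical.
    """
    # Accessing .experiment makes the logger create its run if needed,
    # so no explicit wandb.init() call is required here.
    experiment = wandb_logger.experiment
    for name in metric_names:
        # Track the minimum value of each metric in the run summary.
        experiment.define_metric(name, summary="min")
```

This keeps project name and run configuration in one place (the logger), instead of duplicating them in a separate `wandb.init` call.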
Indeed I'm using Lightning 2.1.0. You might specify pytorch-lightning>=2.0.3,<=2.0.9 in the readme as a temporary fix. I will check if downgrading fixes the error. Thanks for the detective work.
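For anyone who wants to check whether their environment falls inside that suggested range, here is a small sketch. The helper names `parse_version` and `in_suggested_range` are made up for this example, and the parsing is deliberately simplistic (it only looks at leading numeric components), so treat it as an illustration and not a full version-specifier implementation.

```python
import re


def parse_version(version):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. "post0"
        parts.append(int(match.group()))
    return tuple(parts)


def in_suggested_range(version, low="2.0.3", high="2.0.9"):
    """Check low <= version <= high using tuple comparison."""
    return parse_version(low) <= parse_version(version) <= parse_version(high)
```

The installed version string can be obtained with `importlib.metadata.version("pytorch-lightning")`, assuming the package was installed under that distribution name.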
Should be fixed in https://github.com/joeloskarsson/neural-lam/commit/9912ece7f54a14b3cdfbad1735e460d2bd392dfc now, so a git pull should be enough :smile: Let me know if not.
Hi,
I'm currently doing some first tests with the train_model.py script. I'm quite new to wandb so I might have missed something, but it seems that `wandb.init('neural-lam')` is never called, which leads to the following error: `wandb.errors.Error: You must call wandb.init() before wandb.define_metric()`, which traces back to `/neural_lam/utils.py", line 203, in init_wandb_metrics`. Adding `wandb.init('neural-lam')` here seems to work, but I guess the name of the project should be read from `constants.py`.
Kind regards, Michiel