YacineDeghaies closed this issue 1 month ago.
I have the same issue. After disabling wandb, more incompatibilities show up further down the line. For example, the script tries to load "trainer.ckpt" from the checkpoint folder, but that file isn't there. I think "resume_run" expects a raw checkpoint folder produced by the training script, not the "cleaned" checkpoint that the authors provided.
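One way to fail early instead of hitting a missing-file error mid-startup is a pre-flight check on the folder passed as "resume_run". The sketch below is hypothetical (the helper names are not part of the Marigold code) and assumes, based only on the comment above, that a resumable folder written by the training script contains "trainer.ckpt" while the released weights do not.

```python
from pathlib import Path

# Assumption taken from this thread: a raw checkpoint folder written by the
# training script contains "trainer.ckpt"; the released weights do not.
REQUIRED_RESUME_FILES = ("trainer.ckpt",)


def missing_resume_files(ckpt_dir):
    """Return the required resume files that are absent from ckpt_dir.

    Hypothetical helper, not part of the Marigold code base.
    """
    ckpt_dir = Path(ckpt_dir)
    return [name for name in REQUIRED_RESUME_FILES if not (ckpt_dir / name).is_file()]


def is_resumable_checkpoint(ckpt_dir):
    """True if ckpt_dir looks like a raw training checkpoint, not released weights."""
    return Path(ckpt_dir).is_dir() and not missing_resume_files(ckpt_dir)
```

Calling `is_resumable_checkpoint(path)` before starting the run, and printing `missing_resume_files(path)` when it returns False, would make it clear that the downloaded weights are not a resumable training state.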
> I think "resume_run" expects a raw checkpoint folder produced by the training script, not the "cleaned" checkpoint that the authors provided.
Yes, I noticed it's meant for resuming my own training run. Why do you want to disable wandb?
> Why do you want to disable wandb?
Just an ad hoc fix to bypass the error caused by the absence of the WANDB_ID file.
I've been studying the train.py code and there's one thing I can't figure out. If we resume training from a checkpoint, we need to retrieve wandb_id from the WANDB_ID file in the checkpoint's parent directory, as this code shows:
https://github.com/prs-eth/Marigold/blob/f74115261b67b59fb536994d0413f64d69af65c5/src/util/logging_util.py#L85C1-L88C20
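A more tolerant version of that lookup could return no id when the WANDB_ID file is absent, so a fresh wandb run is started instead of the script crashing (the ad hoc workaround discussed above). This is a hypothetical sketch, not the repository's implementation; the linked lines in logging_util.py remain the authoritative reference for the actual file location and name.

```python
from pathlib import Path
from typing import Optional

# File name as mentioned in this thread; its location (the checkpoint's
# parent directory) follows the description above.
WANDB_ID_FILENAME = "WANDB_ID"


def read_wandb_id(ckpt_dir) -> Optional[str]:
    """Read the W&B run id from the checkpoint's parent directory.

    Returns None when the file is missing or empty (e.g. for downloaded
    release weights), so the caller can start a new run instead of failing.
    Hypothetical helper, not part of the Marigold code base.
    """
    id_file = Path(ckpt_dir).resolve().parent / WANDB_ID_FILENAME
    if not id_file.is_file():
        return None
    run_id = id_file.read_text().strip()
    return run_id or None
```

For example, a non-None id could be passed to `wandb.init(id=run_id, resume="must")`, while `None` would fall back to a plain `wandb.init()` that creates a new run. Note that this only sidesteps the logging step; it does not make the released weights a resumable training state.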
After downloading the marigold-v1-0 checkpoint with:
```bash
bash script/download_weights.sh marigold-v1-0
```
I can find neither the config.yaml specific to resume_run nor the WANDB_ID file.
Am I missing something? Thanks!