prs-eth / Marigold

[CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
https://marigoldmonodepth.github.io
Apache License 2.0

noob questions about code understanding. #104

Closed YacineDeghaies closed 1 month ago

YacineDeghaies commented 1 month ago

I've been studying the train.py code and can't figure this out: if we resume training from a checkpoint, we need to retrieve wandb_id from the WANDB_ID file in the checkpoint's parent directory, as this code does:

https://github.com/prs-eth/Marigold/blob/f74115261b67b59fb536994d0413f64d69af65c5/src/util/logging_util.py#L85C1-L88C20
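As I read it, the lookup amounts to something like the sketch below. This is my own paraphrase, not the repo's code: the function name `read_wandb_id` is hypothetical, and I'm assuming the WANDB_ID file is plain text containing only the run ID.

```python
from pathlib import Path
from typing import Optional


def read_wandb_id(checkpoint_dir: str) -> Optional[str]:
    """Return the run ID stored in <checkpoint parent>/WANDB_ID, or None if absent.

    Hypothetical helper: assumes the file holds only the run ID as plain text.
    """
    id_file = Path(checkpoint_dir).resolve().parent / "WANDB_ID"
    if not id_file.is_file():
        return None
    # Strip the trailing newline that a plain-text write typically leaves behind.
    return id_file.read_text().strip()
```

Returning None instead of raising makes it easy to tell "no WANDB_ID file next to this checkpoint" apart from other failures.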

After I downloaded the marigold-v1-0 checkpoint with:

bash script/download_weights.sh marigold-v1-0

I can find neither the config.yaml specific to the resume_run nor the WANDB_ID file.

Am I missing something? Thanks!

HarryWang355 commented 1 month ago

I have the same issue. After disabling wandb, there are more incompatibilities down the line. For example, the script will attempt to load "trainer.ckpt" from the checkpoint folder, which is not there. I think "resume_run" expects a raw checkpoint folder produced by the training script, not the "cleaned" checkpoint that the authors provided.
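A quick way to check whether a folder looks resumable before launching training is sketched below. It only tests for the two files mentioned in this thread (trainer.ckpt inside the checkpoint folder, WANDB_ID in its parent). The function name `missing_resume_files` is hypothetical, and the training script may well expect other files that I haven't run into yet.

```python
from pathlib import Path
from typing import List


def missing_resume_files(checkpoint_dir: str) -> List[str]:
    """List the resume-related files that are absent for a checkpoint folder.

    Only checks the two files discussed in this thread; the real training
    script may require more.
    """
    ckpt = Path(checkpoint_dir).resolve()
    expected = [
        ckpt / "trainer.ckpt",     # trainer state inside the checkpoint folder
        ckpt.parent / "WANDB_ID",  # wandb run ID in the parent directory
    ]
    return [str(p) for p in expected if not p.is_file()]
```

An empty list means both files are present. For the downloaded marigold-v1-0 weights I would expect both to be reported missing, which matches the errors described above.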

YacineDeghaies commented 1 month ago

> I think "resume_run" expects a raw checkpoint folder produced by the training script, not the "cleaned" checkpoint that the authors provided.

Yes, I noticed it's meant for resuming my own training run. Why do you want to disable wandb?

HarryWang355 commented 1 month ago

> Why do you want to disable wandb?

Just an ad hoc fix to bypass the error caused by the absence of the WANDB_ID file.