LuLing06 opened this issue 1 year ago
The models are typically trained for fewer iterations; e.g., nerfacto is only set to train for 30k iterations. There are some instabilities that emerge when training for a long time. We have tried to look into them but have so far been unsuccessful.
Thanks for your explanation. I have found a possible reason: it might be an issue with the resume process. I found that when I resumed training, the learning rate went back to the default (0.01); it does not load the final learning rate from the last step of the previous run. Here is the picture:
I used this command to resume training:

```
ns-train nerfacto --experiment-name $exp_name --timestamp $timestamp --data $data --load-dir $resume_dir --output-dir $output_dir --max-num-iterations $iterations --vis $vis
```

Note: `resume_dir=$output_dir/$exp_name/$exp_name/$timestamp/nerfstudio_models`
How can I resume the learning rate from the latest checkpoint?
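For context, resuming the learning rate generally requires restoring the LR scheduler's state, not just the model weights. Below is a minimal, generic PyTorch sketch of that pattern; it is not nerfstudio's actual checkpoint code, and the names (`model`, `checkpoint.pt`) are placeholders:

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import ExponentialLR

model = torch.nn.Linear(3, 3)  # stand-in for the NeRF field
optimizer = Adam(model.parameters(), lr=0.01)
scheduler = ExponentialLR(optimizer, gamma=0.999)

# Saving: persist scheduler and optimizer state alongside the model.
torch.save(
    {
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "scheduler": scheduler.state_dict(),  # if this is missing, the lr restarts at 0.01
    },
    "checkpoint.pt",
)

# Resuming: restore all three; if the scheduler state is not loaded,
# it restarts from step 0 and overwrites the lr with the initial value.
ckpt = torch.load("checkpoint.pt")
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])
scheduler.load_state_dict(ckpt["scheduler"])
print(scheduler.get_last_lr())  # should match the lr at save time
```

If the resume path in nerfstudio only restores model and optimizer state but re-creates the scheduler, that would produce exactly the reset-to-0.01 behavior shown in the picture.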
Hi,
I am seeing a similar issue: after reloading a checkpoint, the model performance drops (p1). I checked that the learning rates were correctly loaded (p2), but there seem to be other issues with the loading, as the training losses `camera_opt_regularizer` and `rgb_loss` dropped a lot (p3).
p1: [screenshot: metrics drop after reloading the checkpoint]
p2: [screenshot: learning-rate curves, loaded correctly]
p3: [screenshot: camera_opt_regularizer and rgb_loss dropping after reload]
The loading command is:

```
ns-train nerfacto --load-dir outputs/processed/nerfacto/2024-02-29_175948/nerfstudio_models --data test/multiview_train_data/32/processed --vis wandb --max-num-iterations 60000
```
Is there any solution to this issue?
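One way to narrow this down is to inspect what the checkpoint file actually contains, since a missing optimizer or scheduler entry would explain the jump after reload. A small sketch, assuming the `.ckpt` file is a plain `torch.save`d dict (the path and key names below are illustrative and may differ in your install):

```python
import torch

# Hypothetical path; substitute your own nerfstudio_models directory and step file.
ckpt_path = "outputs/processed/nerfacto/2024-02-29_175948/nerfstudio_models/step-000029999.ckpt"
ckpt = torch.load(ckpt_path, map_location="cpu")

# Top-level keys show which training states were persisted.
print(list(ckpt.keys()))

# If optimizer/scheduler entries exist, list what they cover;
# the exact key names are an assumption here.
for key in ("optimizers", "schedulers"):
    if key in ckpt and isinstance(ckpt[key], dict):
        print(key, "->", list(ckpt[key].keys()))
```

If the per-parameter-group optimizer state (e.g. Adam moments) or the camera-optimizer state is absent, those losses would be expected to behave differently right after a reload.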
Describe the bug
I have trained three models (nerfacto, nerfacto-big, and instant-ngp) on my dataset and found that the training process was unstable; it looks like this (see the attached screenshot). Is there any implementation issue? Does nerfstudio implement early stopping?
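To my knowledge, nerfstudio trains for a fixed number of iterations (`--max-num-iterations`) rather than stopping early. If you want a crude guard against late-training instability, something like the following could watch a smoothed loss and stop when it stalls. This is a generic sketch, not part of the nerfstudio API; `smoothed_rgb_loss` and `save_checkpoint` are hypothetical names:

```python
# Generic early-stopping sketch (not a nerfstudio API): stop when the
# smoothed loss has not improved for `patience` consecutive checks.
class EarlyStopper:
    def __init__(self, patience: int = 10, min_delta: float = 1e-4):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_checks = 0

    def should_stop(self, loss: float) -> bool:
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience

# Usage inside a training loop (illustrative):
# stopper = EarlyStopper(patience=20)
# if stopper.should_stop(smoothed_rgb_loss):
#     save_checkpoint()
#     break
```

Note this only caps wasted compute; it does not address the underlying instability the maintainers mention above.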