isl-org / ZoeDepth

Metric depth estimation from a single image
MIT License

Is there a reason why the training does not start? #126

YacineDeghaies closed this issue 1 month ago

YacineDeghaies commented 1 month ago

I'm trying to train on my custom dataset. To do that, I modified only the 'nyu' dictionary in the config file, following this tutorial: https://medium.com/@bhaskarbose1998/monocular-depth-estimation-using-zoedepth-our-experience-42fa5974cb59

When I run: python train_mono.py -m zoedepth --pretrained_resource=""

I get this:

0 tcp://127.0.0.1:15008 Config: {'attractor_alpha': 1000, 'attractor_gamma': 2, 'attractor_kind': 'mean', 'attractor_type': 'inv', 'aug': True, 'avoid_boundary': False, 'batch_size': 1, 'bin_centers_type': 'softplus', 'bin_embedding_dim': 128, 'bs': 1, 'clip_grad': 0.1, 'cycle_momentum': True, 'data_path': '/vol/fob-vol3/mi20/deghaisa/code/shot_0003/1_source_sequence', 'data_path_eval': '/vol/fob-vol3/mi20/deghaisa/code/shot_0003/1_source_sequence', 'dataset': 'nyu', 'degree': 1.0, 'dist_backend': 'nccl', 'dist_url': 'tcp://127.0.0.1:15008', 'distributed': True, 'div_factor': 1, 'do_kb_crop': False, 'do_random_rotate': True, 'eigen_crop': True, 'encoder_lr_factor': 10, 'epochs': 5, 'filenames_file': './train_test_inputs/nyudepthv2_train_files_with_gt.txt', 'filenames_file_eval': './train_test_inputs/nyudepthv2_test_files_with_gt.txt', 'final_div_factor': 10000, 'freeze_midas_bn': True, 'garg_crop': False, 'gpu': None, 'gt_path': '/vol/fob-vol3/mi20/deghaisa/code/shot_0003/2_gt_depth', 'gt_path_eval': '/vol/fob-vol3/mi20/deghaisa/code/shot_0003/2_gt_depth', 'img_size': [384, 512], 'input_height': 480, 'input_width': 640, 'inverse_midas': False, 'log_images_every': 0.1, 'lr': 0.000161, 'max_depth': 10, 'max_depth_diff': 10, 'max_depth_eval': 10, 'max_temp': 50.0, 'max_translation': 100, 'memory_efficient': True, 'midas_lr_factor': 1, 'midas_model_type': 'DPT_BEiT_L_384', 'min_depth': 0.001, 'min_depth_diff': -10, 'min_depth_eval': 0.001, 'min_temp': 0.0212, 'mode': 'train', 'model': 'zoedepth', 'n_attractors': [16, 8, 4, 1], 'n_bins': 64, 'name': 'ZoeDepth', 'ngpus_per_node': 0, 'notes': '', 'num_workers': 16, 'output_distribution': 'logbinomial', 'pct_start': 0.7, 'pos_enc_lr_factor': 10, 'prefetch': False, 'pretrained_resource': '', 'print_losses': False, 'project': 'MonoDepth3-nyu', 'random_crop': False, 'random_translate': False, 'rank': 0, 'root': '.', 'same_lr': False, 'save_dir': '/vol/fob-vol3/mi20/deghaisa/shortcuts/monodepth3_checkpoints', 'shared_dict': None, 'tags': '', 'three_phase': False, 'train_midas': True, 'trainer': 'zoedepth', 'translate_prob': 0.2, 'uid': None, 'use_amp': False, 'use_pretrained_midas': True, 'use_shared_dict': False, 'validate_every': 0.25, 'version_name': 'v1', 'w_domain': 0.2, 'w_grad': 0, 'w_reg': 0, 'w_si': 1, 'wd': 0.01, 'workers': 16, 'world_size': 1}

Training does not start after this, and train_mono.py simply terminates. Does anyone have advice on how to solve this?
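A few values in the printed config stand out: 'distributed' is True while 'ngpus_per_node' is 0 and 'gpu' is None, and 'filenames_file' still points at the stock NYU file list. One possible reading, which is a guess and not something confirmed in this thread, is that no GPU was visible when the script ran. The following is a minimal sketch that parses such a printed config and flags these values for a manual check. The helper names `parse_config_dump` and `suspicious_settings` are hypothetical and not part of ZoeDepth, and the checks are heuristics only.

```python
import ast


def parse_config_dump(log_text):
    """Extract the dict literal that follows 'Config:' in the printed output."""
    start = log_text.index("Config:") + len("Config:")
    # literal_eval safely parses Python literals (dicts, lists, None, True, ...).
    return ast.literal_eval(log_text[start:].strip())


def suspicious_settings(config):
    """Return warnings for values worth double-checking (heuristics, not proof)."""
    warnings = []
    no_gpus = config.get("ngpus_per_node") == 0
    if no_gpus:
        warnings.append("ngpus_per_node is 0: no GPU appears to have been detected")
    if no_gpus and config.get("distributed"):
        warnings.append("distributed is True but there are no GPUs to run workers on")
    if "nyudepthv2" in str(config.get("filenames_file", "")):
        warnings.append("filenames_file still points at the stock NYU file list")
    return warnings


# Shortened excerpt of the output from the post above (most keys omitted).
log = (
    "0 tcp://127.0.0.1:15008 Config: {'batch_size': 1, 'dataset': 'nyu', "
    "'distributed': True, 'gpu': None, 'ngpus_per_node': 0, "
    "'filenames_file': './train_test_inputs/nyudepthv2_train_files_with_gt.txt'}"
)

for warning in suspicious_settings(parse_config_dump(log)):
    print("check:", warning)
```

If the first two warnings appear, it may be worth confirming that the machine has a working GPU setup before changing anything else in the config.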

michaeltan53 commented 1 month ago

I ran into the same problem: after I entered the training command, training did not start and the whole program terminated on its own after a while. I think the cause was that my computer's graphics card did not have enough computing power. After switching to the server in our laboratory, I was able to train the ZoeDepth model.

YacineDeghaies commented 1 month ago

@michaeltan53 You have to open zoedepth/models/zoedepth/config_zoedepth.json and change the batch size to 3.
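For anyone who prefers to script that change, here is a minimal sketch of editing the batch size in a JSON config file. It assumes the batch size is stored under a "bs" key inside a "train" section; that layout is an assumption, so check the actual file first. The helper `set_batch_size` is hypothetical, and the demo runs on a throwaway copy instead of the real config.

```python
import json
import os
import tempfile


def set_batch_size(config_path, batch_size, section="train", key="bs"):
    """Rewrite one value in a JSON config; section and key names are assumptions."""
    with open(config_path) as f:
        config = json.load(f)
    # Create the section if it is missing, then set the batch size.
    config.setdefault(section, {})[key] = batch_size
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config


# Demo on a temporary file with a made-up, minimal config.
demo_config = {"model": {"name": "ZoeDepth"}, "train": {"bs": 1, "epochs": 5}}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "config_zoedepth.json")
    with open(path, "w") as f:
        json.dump(demo_config, f)
    updated = set_batch_size(path, 3)

print(updated["train"]["bs"])  # prints 3
```

To apply it to the real file, pass the path zoedepth/models/zoedepth/config_zoedepth.json, ideally after making a backup copy.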