zhulf0804 / GCNet

Leveraging Inlier Correspondences Proportion for Point Cloud Registration. https://arxiv.org/abs/2201.12094.
MIT License
103 stars 12 forks source link

train the net #7

Closed ptxaxx closed 2 years ago

ptxaxx commented 2 years ago

Hi! Thank you so much for sharing!

1. How many epochs is this trained for? I didn't find any instructions in the source code:

```python
for epoch in range(config.max_epoch):
    print('=' * 20, epoch, '=' * 20)
    train_step, val_step = 0, 0
    for inputs in tqdm(train_dataloader):
        for k, v in inputs.items():
            if isinstance(v, list):
                for i in range(len(v)):
                    inputs[k][i] = inputs[k][i].cuda()
            else:
                inputs[k] = inputs[k].cuda()
```

2. Why is training so slow? My graphics card is a Titan.

zhulf0804 commented 2 years ago

Hi, thanks for your interest.

  1. Different numbers of epochs are set for different datasets; the configurations are in configs/*.yaml . For 3DMatch/3DLoMatch, max_epoch is set to 40, as can be seen in https://github.com/zhulf0804/NgeNet/blob/d4917f22e55195132ec6fc602554102d321ce4b5/configs/threedmatch.yaml#L46
  2. The low training speed is due to the correspondence construction operation (linked below), which runs on the CPU. For example, training on 3DMatch took about 40 hours on one RTX 3090 card. https://github.com/zhulf0804/NgeNet/blob/d4917f22e55195132ec6fc602554102d321ce4b5/utils/o3d.py#L75-L88
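For intuition, here is a minimal sketch of what CPU-side radius-based correspondence search looks like. This is not the repository's exact implementation (see the `utils/o3d.py` link above); it uses `scipy.spatial.cKDTree` and a made-up radius just to show why an inner Python loop over all source points is expensive at training time:

```python
import numpy as np
from scipy.spatial import cKDTree

def build_correspondences(src_points, tgt_points, radius=0.05):
    """Return (src_idx, tgt_idx) pairs whose points lie within `radius`.

    Running this on the CPU for every training pair is a typical
    throughput bottleneck, independent of GPU speed.
    """
    tree = cKDTree(tgt_points)
    pairs = []
    for i, p in enumerate(src_points):          # Python-level loop: slow
        for j in tree.query_ball_point(p, r=radius):
            pairs.append((i, j))
    return np.array(pairs, dtype=np.int64).reshape(-1, 2)

# Toy example: only the first source point has a neighbor within radius
src = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
tgt = np.array([[0.01, 0.0, 0.0], [5.0, 5.0, 5.0]])
corr = build_correspondences(src, tgt, radius=0.05)
```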

Best regards.

ptxaxx commented 2 years ago

Thanks for such a quick answer!

ptxaxx commented 2 years ago

Hello, when I evaluate and visualize, the process is killed at around 14%. It may be an out-of-memory problem. Is there any good solution? My graphics card is a Titan.

zhulf0804 commented 2 years ago

Hello,

What's your GPU memory size? I didn't record the runtime memory usage before, but a GTX 1080Ti is enough for me.

One thing you can try is calling torch.cuda.empty_cache() after each iteration.
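As a sketch of where that call might go, here is a hypothetical evaluation loop (the `model`, `dataloader`, and `evaluate` names are placeholders, not the repo's code). Note that `empty_cache()` only returns unused cached blocks to the driver; dropping references to GPU tensors first is what actually frees memory:

```python
import torch

def evaluate(model, dataloader, device="cpu"):
    """Toy eval loop that releases cached GPU memory each iteration."""
    model.eval()
    results = []
    with torch.no_grad():
        for batch in dataloader:
            out = model(batch.to(device))
            results.append(out.cpu())     # keep results off the GPU
            del out, batch                # drop GPU references first ...
            if torch.cuda.is_available():
                torch.cuda.empty_cache()  # ... then release the cache
    return results

# Toy usage on CPU (empty_cache() is a no-op without a GPU)
model = torch.nn.Linear(3, 1)
loader = [torch.randn(4, 3) for _ in range(3)]
outs = evaluate(model, loader)
```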

Best.

ptxaxx commented 2 years ago

Hi, sorry to trouble you again! Since I have just started learning this area, there is a lot I don't understand yet. I still can't solve the killed-process problem, and I don't know where to add torch.cuda.empty_cache().

I executed this command; does this not use CUDA?

python eval_3dmatch.py --benchmark 3DMatch --data_root your_path/indoor --checkpoint your_path/3dmatch.pth --saved_path work_dirs/3dmatch [--vis] [--no_cuda]

If I want to use CUDA for evaluation and visualization, how do I need to modify the code? Hope to get your help, thank you very much!

zhulf0804 commented 2 years ago

Hi,

  1. python eval_3dmatch.py --benchmark 3DMatch --data_root your_path/indoor --checkpoint your_path/3dmatch.pth --saved_path work_dirs/3dmatch means evaluating with CUDA.
  2. python eval_3dmatch.py --benchmark 3DMatch --data_root your_path/indoor --checkpoint your_path/3dmatch.pth --saved_path work_dirs/3dmatch --vis means visualizing with CUDA.
  3. python eval_3dmatch.py --benchmark 3DMatch --data_root your_path/indoor --checkpoint your_path/3dmatch.pth --saved_path work_dirs/3dmatch --vis --no_cuda means visualizing on the CPU.

Besides, you can add torch.cuda.empty_cache() at line 199, i.e., right after the following lines: https://github.com/zhulf0804/NgeNet/blob/d4917f22e55195132ec6fc602554102d321ce4b5/eval_3dmatch.py#L194-L198

Best.