Closed: chenzyhust closed this issue 1 year ago
Hello, have you managed to get this code running? While preparing the dataset, running gen_seg_gt_from_lidarseg.py reports that "occ_infos_temporal_val.pkl" is missing. I could not find a script in the GitHub repo that generates this pkl, so I am stuck and cannot finish generating this part of the ground truth.
Have you measured how long the 12-epoch training takes?
Hello, here are the results on my machines:
~18 hours on 16xA100, ~28 hours on 8xA100, and ~36 hours on 8x3090 (total batch_size=16, workers_per_gpu=2).
I eventually found that the CPU configuration has a large impact on training speed. Could you share the CPU specs of your machine? Also, the training memory usage and speed in the provided log differ somewhat from what I get with the current repo.
What exactly do you mean by CPU configuration parameters? The provided log was trained on 16 A100s (with samples_per_gpu changed to 1 accordingly).
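For context, a minimal sketch of how those two batch settings might look in an MMDetection-style data config; samples_per_gpu and workers_per_gpu are the standard MMDetection dataloader keys, and the exact values in this repo's config file may differ:

```python
# Minimal sketch of the dataloader settings discussed above (MMDetection-style).
# Effective batch size = num_gpus * samples_per_gpu, so both setups total 16.

# 8xA100 / 8x3090 run (total batch size 16):
data = dict(
    samples_per_gpu=2,   # 8 GPUs x 2 samples = 16
    workers_per_gpu=2,   # CPU dataloader workers per GPU; this is where CPU specs matter
)

# 16xA100 run used for the provided log (samples_per_gpu lowered to 1):
# data = dict(
#     samples_per_gpu=1,  # 16 GPUs x 1 sample = 16
#     workers_per_gpu=2,
# )
```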
CPU configuration: 128 cores
root@perception: ~# free -h
total used free shared buff/cache available
Mem: 1.0T 20G 958G 403M 28G 981G
Swap: 0B 0B 0B
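(For anyone asked to report the same numbers on another machine, a minimal sketch using only the Python standard library on a Linux host; `lscpu` and the `free -h` output above give the same information from the shell.)

```python
import os

# Logical CPU count (what the "128 cores" figure above corresponds to;
# the physical core count may be lower if hyper-threading is enabled).
print(f"logical CPUs: {os.cpu_count()}")

# Total system memory, read from /proc/meminfo (Linux only).
with open("/proc/meminfo") as f:
    for line in f:
        if line.startswith("MemTotal"):
            mem_kb = int(line.split()[1])
            print(f"MemTotal: {mem_kb / 1024 ** 2:.1f} GiB")
            break
```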
Training log snippet from 8xA100:
2023-10-20 17:50:43,834 - mmdet - INFO - Epoch [1][50/1759] lr: 2.458e-05, eta: 1 day, 7:05:06, time: 5.314, data_time: 0.225, memory: 18270, loss_render_depth: 3.0824, loss_render_semantic: 2.8571, loss_sdf_entropy: 0.0005, loss_sdf_distortion: 0.0000, loss_lss_depth: 0.2749, loss: 6.2148, grad_norm: 4.5719
2023-10-20 17:54:29,421 - mmdet - INFO - Epoch [1][100/1759] lr: 4.955e-05, eta: 1 day, 4:40:11, time: 4.512, data_time: 0.094, memory: 18270, loss_render_depth: 2.2072, loss_render_semantic: 2.6976, loss_sdf_entropy: 0.0033, loss_sdf_distortion: 0.0000, loss_lss_depth: 0.2657, loss: 5.1738, grad_norm: 9.6986
2023-10-20 17:58:09,742 - mmdet - INFO - Epoch [1][150/1759] lr: 7.453e-05, eta: 1 day, 3:37:03, time: 4.406, data_time: 0.094, memory: 18270, loss_render_depth: 1.3030, loss_render_semantic: 1.7922, loss_sdf_entropy: 0.0037, loss_sdf_distortion: 0.0001, loss_lss_depth: 0.2620, loss: 3.3609, grad_norm: 11.0911
2023-10-20 18:01:48,743 - mmdet - INFO - Epoch [1][200/1759] lr: 9.950e-05, eta: 1 day, 3:01:26, time: 4.381, data_time: 0.096, memory: 18270, loss_render_depth: 0.8335, loss_render_semantic: 1.2445, loss_sdf_entropy: 0.0013, loss_sdf_distortion: 0.0001, loss_lss_depth: 0.2517, loss: 2.3312, grad_norm: 6.0370
2023-10-20 18:05:29,722 - mmdet - INFO - Epoch [1][250/1759] lr: 1.000e-04, eta: 1 day, 2:41:22, time: 4.420, data_time: 0.098, memory: 18270, loss_render_depth: 0.6090, loss_render_semantic: 0.9972, loss_sdf_entropy: 0.0008, loss_sdf_distortion: 0.0002, loss_lss_depth: 0.2357, loss: 1.8429, grad_norm: 4.0712
2023-10-20 18:09:11,086 - mmdet - INFO - Epoch [1][300/1759] lr: 1.000e-04, eta: 1 day, 2:27:08, time: 4.427, data_time: 0.094, memory: 18270, loss_render_depth: 0.5199, loss_render_semantic: 0.8209, loss_sdf_entropy: 0.0007, loss_sdf_distortion: 0.0002, loss_lss_depth: 0.2187, loss: 1.5604, grad_norm: 2.7719
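As a quick sanity check against the numbers above, a small standalone sketch (not part of the repo) that pulls time and data_time out of these mmdet log lines and projects the 12-epoch wall-clock time; the 1759 iterations per epoch come from the "Epoch [1][x/1759]" field in the snippet:

```python
import re

# Estimate 12-epoch training time from mmdet-style log lines.
# Illustrative only: paste your own log lines into log_text.
log_text = """
2023-10-20 17:54:29,421 - mmdet - INFO - Epoch [1][100/1759] time: 4.512, data_time: 0.094
2023-10-20 17:58:09,742 - mmdet - INFO - Epoch [1][150/1759] time: 4.406, data_time: 0.094
2023-10-20 18:01:48,743 - mmdet - INFO - Epoch [1][200/1759] time: 4.381, data_time: 0.096
"""

times = [float(m) for m in re.findall(r"\btime: ([\d.]+)", log_text)]
data_times = [float(m) for m in re.findall(r"data_time: ([\d.]+)", log_text)]

iters_per_epoch = 1759   # from "Epoch [1][x/1759]" above
epochs = 12
avg_iter = sum(times) / len(times)

print(f"avg iter time: {avg_iter:.2f}s, avg data_time: {sum(data_times) / len(data_times):.2f}s")
print(f"projected {epochs}-epoch time: {avg_iter * iters_per_epoch * epochs / 3600:.1f} h")
# If data_time is a large fraction of time, the dataloader (CPU-bound) is the
# bottleneck, which is consistent with CPU configuration affecting training speed.
```

With the iteration times in this snippet the projection comes out around 26 h, in the same ballpark as the ~28 h quoted for 8xA100 above (the estimate ignores warm-up, evaluation, and checkpointing overhead).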