open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

While training a YOLOX model, GPU memory usage keeps increasing as the epochs go on. Is this normal, or does it only stabilize after reaching a certain level? #6294

Closed Breeze-Zero closed 3 years ago

Breeze-Zero commented 3 years ago

2021-10-16 04:08:08,409 - mmdet - INFO - workflow: [('train', 1)], max: 300 epochs
2021-10-16 04:09:08,629 - mmdet - INFO - Epoch [1][50/167] lr: 8.964e-06, eta: 16:44:02, time: 1.204, data_time: 0.255, memory: 4057, loss_cls: 1.1663, loss_bbox: 4.7203, loss_obj: 9.1746, loss: 15.0613
2021-10-16 04:09:59,332 - mmdet - INFO - Epoch [1][100/167] lr: 3.586e-05, eta: 15:24:02, time: 1.014, data_time: 0.130, memory: 4057, loss_cls: 1.2196, loss_bbox: 4.6813, loss_obj: 8.6757, loss: 14.5766
2021-10-16 04:10:48,322 - mmdet - INFO - Epoch [1][150/167] lr: 8.068e-05, eta: 14:47:18, time: 0.980, data_time: 0.113, memory: 4057, loss_cls: 1.2788, loss_bbox: 4.6396, loss_obj: 8.3994, loss: 14.3178
2021-10-16 04:12:15,034 - mmdet - INFO - Epoch [2][50/167] lr: 1.688e-04, eta: 14:58:52, time: 1.495, data_time: 0.433, memory: 7559, loss_cls: 1.4298, loss_bbox: 4.5041, loss_obj: 9.5837, loss: 15.5176
2021-10-16 04:13:18,022 - mmdet - INFO - Epoch [2][100/167] lr: 2.556e-04, eta: 15:25:44, time: 1.260, data_time: 0.234, memory: 7559, loss_cls: 1.5678, loss_bbox: 4.3766, loss_obj: 7.4383, loss: 13.3827
2021-10-16 04:14:25,028 - mmdet - INFO - Epoch [2][150/167] lr: 3.603e-04, eta: 15:54:19, time: 1.340, data_time: 0.343, memory: 7559, loss_cls: 1.7157, loss_bbox: 4.2518, loss_obj: 6.7057, loss: 12.6732
2021-10-16 04:16:02,032 - mmdet - INFO - Epoch [3][50/167] lr: 5.287e-04, eta: 16:00:58, time: 1.615, data_time: 0.558, memory: 14622, loss_cls: 1.8487, loss_bbox: 4.0784, loss_obj: 7.3269, loss: 13.2540
2021-10-16 04:17:10,319 - mmdet - INFO - Epoch [3][100/167] lr: 6.754e-04, eta: 16:19:39, time: 1.366, data_time: 0.351, memory: 14629, loss_cls: 1.8960, loss_bbox: 4.0136, loss_obj: 6.8301, loss: 12.7397
2021-10-16 04:18:18,131 - mmdet - INFO - Epoch [3][150/167] lr: 8.400e-04, eta: 16:33:25, time: 1.356, data_time: 0.314, memory: 14629, loss_cls: 1.9422, loss_bbox: 3.9494, loss_obj: 6.7869, loss: 12.6786
2021-10-16 04:19:54,423 - mmdet - INFO - Epoch [4][50/167] lr: 1.089e-03, eta: 16:27:05, time: 1.543, data_time: 0.510, memory: 14629, loss_cls: 1.9167, loss_bbox: 3.9542, loss_obj: 6.3084, loss: 12.1793
2021-10-16 04:20:59,832 - mmdet - INFO - Epoch [4][100/167] lr: 1.295e-03, eta: 16:33:50, time: 1.308, data_time: 0.289, memory: 14629, loss_cls: 1.8949, loss_bbox: 3.8927, loss_obj: 6.3064, loss: 12.0941
2021-10-16 04:22:05,231 - mmdet - INFO - Epoch [4][150/167] lr: 1.520e-03, eta: 16:39:22, time: 1.308, data_time: 0.299, memory: 14629, loss_cls: 1.9157, loss_bbox: 3.8816, loss_obj: 6.3355, loss: 12.1328
2021-10-16 04:23:41,933 - mmdet - INFO - Epoch [5][50/167] lr: 1.848e-03, eta: 16:36:28, time: 1.598, data_time: 0.634, memory: 14629, loss_cls: 1.9456, loss_bbox: 3.8068, loss_obj: 6.6211, loss: 12.3736
2021-10-16 04:24:47,233 - mmdet - INFO - Epoch [5][100/167] lr: 2.115e-03, eta: 16:40:33, time: 1.306, data_time: 0.312, memory: 14629, loss_cls: 1.9315, loss_bbox: 3.7877, loss_obj: 6.5263, loss: 12.2455
2021-10-16 04:25:55,635 - mmdet - INFO - Epoch [5][150/167] lr: 2.399e-03, eta: 16:47:08, time: 1.368, data_time: 0.412, memory: 14629, loss_cls: 1.9678, loss_bbox: 3.7382, loss_obj: 6.5590, loss: 12.2650
2021-10-16 04:27:34,733 - mmdet - INFO - Epoch [6][50/167] lr: 2.500e-03, eta: 16:46:06, time: 1.650, data_time: 0.497, memory: 14629, loss_cls: 1.9661, loss_bbox: 3.7307, loss_obj: 6.3497, loss: 12.0465
2021-10-16 04:28:43,656 - mmdet - INFO - Epoch [6][100/167] lr: 2.500e-03, eta: 16:51:44, time: 1.378, data_time: 0.286, memory: 14629, loss_cls: 1.9591, loss_bbox: 3.6616, loss_obj: 6.3190, loss: 11.9397
2021-10-16 04:29:47,221 - mmdet - INFO - Epoch [6][150/167] lr: 2.500e-03, eta: 16:52:09, time: 1.270, data_time: 0.193, memory: 14629, loss_cls: 1.9598, loss_bbox: 3.6410, loss_obj: 6.3279, loss: 11.9287
2021-10-16 04:31:26,832 - mmdet - INFO - Epoch [7][50/167] lr: 2.500e-03, eta: 16:50:13, time: 1.643, data_time: 0.569, memory: 14629, loss_cls: 2.0108, loss_bbox: 3.5723, loss_obj: 6.4863, loss: 12.0695
2021-10-16 04:32:35,733 - mmdet - INFO - Epoch [7][100/167] lr: 2.500e-03, eta: 16:54:28, time: 1.378, data_time: 0.357, memory: 14629, loss_cls: 2.0065, loss_bbox: 3.5490, loss_obj: 6.3861, loss: 11.9416
2021-10-16 04:33:41,320 - mmdet - INFO - Epoch [7][150/167] lr: 2.500e-03, eta: 16:55:53, time: 1.312, data_time: 0.249, memory: 14629, loss_cls: 1.9737, loss_bbox: 3.5437, loss_obj: 6.3433, loss: 11.8606
2021-10-16 04:35:21,624 - mmdet - INFO - Epoch [8][50/167] lr: 2.500e-03, eta: 16:51:37, time: 1.582, data_time: 0.570, memory: 14629, loss_cls: 1.9461, loss_bbox: 3.5545, loss_obj: 6.1208, loss: 11.6214
2021-10-16 04:36:26,831 - mmdet - INFO - Epoch [8][100/167] lr: 2.499e-03, eta: 16:52:38, time: 1.306, data_time: 0.343, memory: 14629, loss_cls: 1.9597, loss_bbox: 3.5297, loss_obj: 6.0977, loss: 11.5870
2021-10-16 04:37:33,030 - mmdet - INFO - Epoch [8][150/167] lr: 2.499e-03, eta: 16:54:03, time: 1.324, data_time: 0.322, memory: 14629, loss_cls: 1.9800, loss_bbox: 3.4863, loss_obj: 6.0436, loss: 11.5098
2021-10-16 04:39:18,131 - mmdet - INFO - Epoch [9][50/167] lr: 2.499e-03, eta: 16:54:00, time: 1.717, data_time: 0.459, memory: 19189, loss_cls: 1.9979, loss_bbox: 3.4214, loss_obj: 6.2494, loss: 11.6687
2021-10-16 04:40:28,330 - mmdet - INFO - Epoch [9][100/167] lr: 2.499e-03, eta: 16:57:20, time: 1.404, data_time: 0.188, memory: 19189, loss_cls: 1.9667, loss_bbox: 3.4510, loss_obj: 6.1090, loss: 11.5267
2021-10-16 04:41:38,932 - mmdet - INFO - Epoch [9][150/167] lr: 2.499e-03, eta: 17:00:36, time: 1.412, data_time: 0.171, memory: 19193, loss_cls: 1.9661, loss_bbox: 3.4188, loss_obj: 6.0944, loss: 11.4793
2021-10-16 04:43:21,234 - mmdet - INFO - Epoch [10][50/167] lr: 2.499e-03, eta: 16:57:38, time: 1.628, data_time: 0.623, memory: 19193, loss_cls: 1.9278, loss_bbox: 3.4606, loss_obj: 5.9001, loss: 11.2886
2021-10-16 04:44:29,032 - mmdet - INFO - Epoch [10][100/167] lr: 2.498e-03, eta: 16:59:03, time: 1.356, data_time: 0.384, memory: 19193, loss_cls: 1.9004, loss_bbox: 3.4508, loss_obj: 5.7690, loss: 11.1201
2021-10-16 04:45:35,032 - mmdet - INFO - Epoch [10][150/167] lr: 2.498e-03, eta: 16:59:27, time: 1.320, data_time: 0.305, memory: 19193, loss_cls: 1.9018, loss_bbox: 3.4413, loss_obj: 5.8201, loss: 11.1632
2021-10-16 04:45:54,570 - mmdet - INFO - Saving checkpoint at 10 epochs

RangiLyu commented 3 years ago

Yes, it is normal. When using SyncRandomSizeHook in YOLOX, the input size is randomly re-set during training. Because the input size changes, the memory cost varies as well. You can refer to our training log: https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_tiny_8x8_300e_coco/yolox_tiny_8x8_300e_coco_20210806_234250.log.json
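For context, here is a minimal sketch of the piece of configuration being referred to, with values as they appeared in `configs/yolox/yolox_s_8x8_300e_coco.py` in mmdetection 2.x around that time (treat the exact numbers and hook arguments as assumptions and check them against your own config):

```python
# Excerpt-style sketch of a YOLOX config in mmdetection 2.x.
# SyncRandomSizeHook periodically re-draws the training input size as a
# random multiple of 32 inside ratio_range and syncs it across workers,
# so with ratio_range=(14, 26) the side length varies between roughly
# 14*32=448 and 26*32=832 pixels.
img_scale = (640, 640)

custom_hooks = [
    # ... other YOLOX hooks (mode switch, norm sync, EMA) omitted ...
    dict(
        type='SyncRandomSizeHook',
        ratio_range=(14, 26),  # random size = ratio * 32
        img_scale=img_scale,
        priority=48),
]
```

Peak GPU memory therefore steps up whenever a larger size than any seen so far is sampled, and flattens out once the top of the range has been hit. If memory is tight, narrowing `ratio_range` (e.g. `(14, 20)`) or using a smaller `img_scale` should lower the ceiling; that is a workaround suggestion, not something prescribed in this thread. One more caveat, based on my reading of mmcv 1.x rather than on this thread: the `memory` value printed by the text logger comes from `torch.cuda.max_memory_allocated()`, i.e. a running peak over the whole run, which is another reason the number only ever increases before plateauing.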

lijoe123 commented 3 years ago

Hello! Do you know what 'time' and 'data_time' mean?

zhaoxin111 commented 3 years ago

Hello! Do you know what 'time' and 'data_time' mean?

time: the network forward time + the data loading time
data_time: the data loading time
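To make that concrete, here is a small, hypothetical snippet (the log line is copied from the question above; the parsing helper is mine, not part of mmdetection) showing how the two fields relate: `time` covers the whole iteration, so the pure model compute per iteration is roughly `time - data_time`.

```python
import re

# One log line from the training output quoted earlier in this issue.
line = ("2021-10-16 04:09:08,629 - mmdet - INFO - Epoch [1][50/167] "
        "lr: 8.964e-06, eta: 16:44:02, time: 1.204, data_time: 0.255, "
        "memory: 4057, loss: 15.0613")

# The \b avoids matching the "time:" inside "data_time:".
iter_time = float(re.search(r"\btime: ([\d.]+)", line).group(1))
data_time = float(re.search(r"data_time: ([\d.]+)", line).group(1))

print(f"data loading: {data_time:.3f} s ({data_time / iter_time:.0%} of the iteration)")
print(f"model compute: {iter_time - data_time:.3f} s")
# -> data loading: 0.255 s (21% of the iteration)
# -> model compute: 0.949 s
```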

lijoe123 commented 3 years ago

I ran YOLOX and saw the same behavior, but other models don't show it, so it's probably just how the YOLOX model works.

