hirotomusiker / CLRerNet

The official implementation of "CLRerNet: Improving Confidence of Lane Detection with LaneIoU"
Apache License 2.0

Testing: Can't open/read file, can't convert np.ndarray of type numpy.object_ #38

Closed tarkanozsen closed 6 months ago

tarkanozsen commented 6 months ago

I am getting the following error during testing. I clipped most of the warnings because they are repetitive, and I am attaching the full message in case its contents are needed. I'd greatly appreciate your help. Thanks in advance.

testingerror.txt
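Because the full log repeats the same OpenCV warning many times, a short script can summarize which files could not be read. This is a minimal sketch, assuming the log has been saved as plain text (for example the attached testingerror.txt) and that the warnings follow the `imread_('...'): can't open/read file` shape seen in this report. The helper names `missing_paths` and `count_by_subdir` are hypothetical and not part of CLRerNet.

```python
import re
from collections import Counter

# Matches OpenCV's "can't open/read file" warning and captures the quoted path.
# The optional underscore tolerates logs where "imread_" lost its trailing "_".
MISSING_FILE = re.compile(r"imread_?\('([^']+)'\): can't open/read file")


def missing_paths(log_text):
    """Return every path that OpenCV reported as unreadable, in log order."""
    return MISSING_FILE.findall(log_text)


def count_by_subdir(paths, data_root="dataset/culane"):
    """Count missing files per first-level directory under data_root."""
    prefix = data_root.rstrip("/") + "/"
    counts = Counter()
    for path in paths:
        # Strip the data root (if present) so we group by e.g. "laneseg_label_w16".
        relative = path[len(prefix):] if path.startswith(prefix) else path
        counts[relative.split("/", 1)[0]] += 1
    return counts


if __name__ == "__main__":
    # One warning copied from this report; a real run would read the log file.
    sample = (
        "[ WARN:0@17.789] global loadsave.cpp:248 findDecoder "
        "imread_('dataset/culane/laneseg_label_w16/driver_161_90frame/"
        "06032338_0992.MP4/04050.png'): can't open/read file: "
        "check file path/integrity"
    )
    for subdir, count in count_by_subdir(missing_paths(sample)).items():
        print(f"{subdir}: {count} missing file(s)")
```

If every missing path falls under a single directory, that suggests a whole part of the dataset is absent rather than a few corrupted files.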

docker@0d896633412d:/work$ python tools/train.py configs/clrernet/culane/clrernet_culane_dla34.py
/home/docker/.pyenv/versions/3.8.4/lib/python3.8/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  warnings.warn(
2024-05-20 20:28:26,979 - mmdet - INFO - Environment info:

sys.platform: linux
Python: 3.8.4 (default, May 16 2024, 18:17:37) [GCC 11.4.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 1.12.1+cu116
PyTorch compiling details: PyTorch built with:

TorchVision: 0.13.1+cu102
OpenCV: 4.9.0
MMCV: 1.7.0
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 11.6
MMDetection: 2.28.0+2b5632b

2024-05-20 20:28:28,106 - mmdet - INFO - Distributed training: False 2024-05-20 20:28:29,116 - mmdet - INFO - Config: model = dict( type='CLRerNet', backbone=dict(type='DLANet', dla='dla34', pretrained=True), neck=dict( type='CLRerNetFPN', in_channels=[128, 256, 512], out_channels=64, num_outs=3), bbox_head=dict( type='CLRerHead', anchor_generator=dict( type='CLRerNetAnchorGenerator', num_priors=192, num_points=72), img_w=800, img_h=320, prior_feat_channels=64, fc_hidden_dim=64, num_fc=2, refine_layers=3, sample_points=36, attention=dict(type='ROIGather'), loss_cls=dict( type='KorniaFocalLoss', alpha=0.25, gamma=2, loss_weight=2.0), loss_bbox=dict(type='SmoothL1Loss', reduction='none', loss_weight=0.2), loss_iou=dict( type='LaneIoULoss', lane_width=0.009375, loss_weight=4.0), loss_seg=dict( type='CLRNetSegLoss', loss_weight=1.0, num_classes=5, ignore_label=255, bg_weight=0.4)), train_cfg=dict( assigner=dict( type='DynamicTopkAssigner', max_topk=4, min_topk=1, cost_combination=1, cls_cost=dict(type='FocalCost', weight=1.0), reg_cost=dict(type='DistanceCost', weight=0.0), iou_dynamick=dict( type='LaneIoUCost', lane_width=0.009375, use_pred_start_end=False, use_giou=True), iou_cost=dict( type='LaneIoUCost', lane_width=0.0375, use_pred_start_end=True, use_giou=True))), test_cfg=dict( conf_threshold=0.41, use_nms=True, as_lanes=True, nms_thres=50, nms_topk=4, ori_img_w=1640, ori_img_h=590, cut_height=270)) dataset_type = 'CulaneDataset' data_root = 'dataset/culane' crop_bbox = [0, 270, 1640, 590] img_scale = (800, 320) img_norm_cfg = dict( mean=[0.0, 0.0, 0.0], std=[255.0, 255.0, 255.0], to_rgb=False) compose_cfg = dict(bboxes=False, keypoints=True, masks=True) train_al_pipeline = [ dict( type='Compose', params=dict(bboxes=False, keypoints=True, masks=True)), dict(type='Crop', x_min=0, x_max=1640, y_min=270, y_max=590, p=1), dict(type='Resize', height=320, width=800, p=1), dict(type='HorizontalFlip', p=0.5), dict(type='ChannelShuffle', p=0.1), dict( 
type='RandomBrightnessContrast', brightness_limit=0.04, contrast_limit=0.15, p=0.6), dict( type='HueSaturationValue', hue_shift_limit=(-10, 10), sat_shift_limit=(-10, 10), val_shift_limit=(-10, 10), p=0.7), dict( type='OneOf', transforms=[ dict(type='MotionBlur', blur_limit=5, p=1.0), dict(type='MedianBlur', blur_limit=5, p=1.0) ], p=0.2), dict( type='IAAAffine', scale=(0.8, 1.2), rotate=(-10.0, 10.0), translate_percent=0.1, p=0.7), dict(type='Resize', height=320, width=800, p=1) ] val_al_pipeline = [ dict( type='Compose', params=dict(bboxes=False, keypoints=True, masks=True)), dict(type='Crop', x_min=0, x_max=1640, y_min=270, y_max=590, p=1), dict(type='Resize', height=320, width=800, p=1) ] train_pipeline = [ dict( type='albumentation', pipelines=[ dict( type='Compose', params=dict(bboxes=False, keypoints=True, masks=True)), dict(type='Crop', x_min=0, x_max=1640, y_min=270, y_max=590, p=1), dict(type='Resize', height=320, width=800, p=1), dict(type='HorizontalFlip', p=0.5), dict(type='ChannelShuffle', p=0.1), dict( type='RandomBrightnessContrast', brightness_limit=0.04, contrast_limit=0.15, p=0.6), dict( type='HueSaturationValue', hue_shift_limit=(-10, 10), sat_shift_limit=(-10, 10), val_shift_limit=(-10, 10), p=0.7), dict( type='OneOf', transforms=[ dict(type='MotionBlur', blur_limit=5, p=1.0), dict(type='MedianBlur', blur_limit=5, p=1.0) ], p=0.2), dict( type='IAAAffine', scale=(0.8, 1.2), rotate=(-10.0, 10.0), translate_percent=0.1, p=0.7), dict(type='Resize', height=320, width=800, p=1) ]), dict( type='Normalize', mean=[0.0, 0.0, 0.0], std=[255.0, 255.0, 255.0], to_rgb=False), dict(type='DefaultFormatBundle'), dict( type='CollectCLRNet', keys=['img'], meta_keys=[ 'filename', 'sub_img_name', 'ori_shape', 'img_shape', 'img_norm_cfg', 'ori_shape', 'img_shape', 'gt_points', 'gt_masks', 'lanes' ]) ] val_pipeline = [ dict( type='albumentation', pipelines=[ dict( type='Compose', params=dict(bboxes=False, keypoints=True, masks=True)), dict(type='Crop', x_min=0, 
x_max=1640, y_min=270, y_max=590, p=1), dict(type='Resize', height=320, width=800, p=1) ]), dict( type='Normalize', mean=[0.0, 0.0, 0.0], std=[255.0, 255.0, 255.0], to_rgb=False), dict(type='DefaultFormatBundle'), dict( type='CollectCLRNet', keys=['img'], meta_keys=[ 'filename', 'sub_img_name', 'ori_shape', 'img_shape', 'img_norm_cfg' ]) ] data = dict( samples_per_gpu=24, workers_per_gpu=8, train=dict( type='CulaneDataset', data_root='dataset/culane', data_list='dataset/culane/list/train_gt.txt', diff_file='dataset/culane/list/train_diffs.npz', diff_thr=15, pipeline=[ dict( type='albumentation', pipelines=[ dict( type='Compose', params=dict(bboxes=False, keypoints=True, masks=True)), dict( type='Crop', x_min=0, x_max=1640, y_min=270, y_max=590, p=1), dict(type='Resize', height=320, width=800, p=1), dict(type='HorizontalFlip', p=0.5), dict(type='ChannelShuffle', p=0.1), dict( type='RandomBrightnessContrast', brightness_limit=0.04, contrast_limit=0.15, p=0.6), dict( type='HueSaturationValue', hue_shift_limit=(-10, 10), sat_shift_limit=(-10, 10), val_shift_limit=(-10, 10), p=0.7), dict( type='OneOf', transforms=[ dict(type='MotionBlur', blur_limit=5, p=1.0), dict(type='MedianBlur', blur_limit=5, p=1.0) ], p=0.2), dict( type='IAAAffine', scale=(0.8, 1.2), rotate=(-10.0, 10.0), translate_percent=0.1, p=0.7), dict(type='Resize', height=320, width=800, p=1) ]), dict( type='Normalize', mean=[0.0, 0.0, 0.0], std=[255.0, 255.0, 255.0], to_rgb=False), dict(type='DefaultFormatBundle'), dict( type='CollectCLRNet', keys=['img'], meta_keys=[ 'filename', 'sub_img_name', 'ori_shape', 'img_shape', 'img_norm_cfg', 'ori_shape', 'img_shape', 'gt_points', 'gt_masks', 'lanes' ]) ], test_mode=False), val=dict( type='CulaneDataset', data_root='dataset/culane', data_list='dataset/culane/list/test.txt', pipeline=[ dict( type='albumentation', pipelines=[ dict( type='Compose', params=dict(bboxes=False, keypoints=True, masks=True)), dict( type='Crop', x_min=0, x_max=1640, y_min=270, y_max=590, 
p=1), dict(type='Resize', height=320, width=800, p=1) ]), dict( type='Normalize', mean=[0.0, 0.0, 0.0], std=[255.0, 255.0, 255.0], to_rgb=False), dict(type='DefaultFormatBundle'), dict( type='CollectCLRNet', keys=['img'], meta_keys=[ 'filename', 'sub_img_name', 'ori_shape', 'img_shape', 'img_norm_cfg' ]) ], test_mode=True), test=dict( type='CulaneDataset', data_root='dataset/culane', data_list='dataset/culane/list/test.txt', pipeline=[ dict( type='albumentation', pipelines=[ dict( type='Compose', params=dict(bboxes=False, keypoints=True, masks=True)), dict( type='Crop', x_min=0, x_max=1640, y_min=270, y_max=590, p=1), dict(type='Resize', height=320, width=800, p=1) ]), dict( type='Normalize', mean=[0.0, 0.0, 0.0], std=[255.0, 255.0, 255.0], to_rgb=False), dict(type='DefaultFormatBundle'), dict( type='CollectCLRNet', keys=['img'], meta_keys=[ 'filename', 'sub_img_name', 'ori_shape', 'img_shape', 'img_norm_cfg' ]) ], test_mode=True)) checkpoint_config = dict(interval=15) log_config = dict( interval=100, hooks=[ dict(type='TextLoggerHook'), dict(type='TensorboardLoggerHookEpoch') ]) device_ids = '0' dist_params = dict(backend='nccl') log_level = 'INFO' load_from = None resume_from = None workflow = [('train', 1)] evaluation = dict(interval=3, metric='F1') custom_imports = dict( imports=[ 'libs.models', 'libs.datasets', 'libs.core.bbox', 'libs.core.anchor', 'libs.core.hook' ], allow_failed_imports=False) cfg_name = 'clrernet_culane_dla34.py' total_epochs = 15 optimizer = dict(type='AdamW', lr=0.0006) optimizer_config = dict(grad_clip=None) lr_config = dict(policy='CosineAnnealing', min_lr=0.0, by_epoch=False) work_dir = './work_dirs/clrernet_culane_dla34' gpu_ids = range(0, 1)

55698 data are loaded
/home/docker/mmdetection/mmdet/utils/compat_config.py:28: UserWarning: config is now expected to have a runner section, please set runner in your config.
  warnings.warn(
2024-05-20 20:28:32,435 - mmdet - INFO - Automatic scaling of learning rate (LR) has been disabled.
34680 data are loaded
2024-05-20 20:28:32,496 - mmdet - INFO - Start running, host: docker@0d896633412d, work_dir: /work/work_dirs/clrernet_culane_dla34
2024-05-20 20:28:32,497 - mmdet - INFO - Hooks will be executed in the following order:
before_run: (VERY_HIGH ) CosineAnnealingLrUpdaterHook (NORMAL ) CheckpointHook (LOW ) EvalHook (VERY_LOW ) TextLoggerHook (VERY_LOW ) TensorboardLoggerHookEpoch

before_train_epoch: (VERY_HIGH ) CosineAnnealingLrUpdaterHook (LOW ) IterTimerHook (LOW ) EvalHook (VERY_LOW ) TextLoggerHook (VERY_LOW ) TensorboardLoggerHookEpoch

before_train_iter: (VERY_HIGH ) CosineAnnealingLrUpdaterHook (LOW ) IterTimerHook (LOW ) EvalHook

after_train_iter: (ABOVE_NORMAL) OptimizerHook (NORMAL ) CheckpointHook (LOW ) IterTimerHook (LOW ) EvalHook (VERY_LOW ) TextLoggerHook (VERY_LOW ) TensorboardLoggerHookEpoch

after_train_epoch: (NORMAL ) CheckpointHook (LOW ) EvalHook (VERY_LOW ) TextLoggerHook (VERY_LOW ) TensorboardLoggerHookEpoch

before_val_epoch: (LOW ) IterTimerHook (VERY_LOW ) TextLoggerHook (VERY_LOW ) TensorboardLoggerHookEpoch

before_val_iter: (LOW ) IterTimerHook

after_val_iter: (LOW ) IterTimerHook

after_val_epoch: (VERY_LOW ) TextLoggerHook (VERY_LOW ) TensorboardLoggerHookEpoch

after_run: (VERY_LOW ) TextLoggerHook (VERY_LOW ) TensorboardLoggerHookEpoch

2024-05-20 20:28:32,498 - mmdet - INFO - workflow: [('train', 1)], max: 15 epochs
2024-05-20 20:28:32,499 - mmdet - INFO - Checkpoints will be saved to /work/work_dirs/clrernet_culane_dla34 by HardDiskBackend.

...

[ WARN:0@17.413] global loadsave.cpp:248 findDecoder imread_('dataset/culane/laneseg_label_w16/driver_161_90frame/06030825_0757.MP4/05040.png'): can't open/read file: check file path/integrity
[ WARN:0@17.462] global loadsave.cpp:248 findDecoder imread_('dataset/culane/laneseg_label_w16/driver_23_30frame/05161421_0580.MP4/04725.png'): can't open/read file: check file path/integrity
[ WARN:0@17.505] global loadsave.cpp:248 findDecoder imread_('dataset/culane/laneseg_label_w16/driver_23_30frame/05161817_0655.MP4/01815.png'): can't open/read file: check file path/integrity
[ WARN:0@17.560] global loadsave.cpp:248 findDecoder imread_('dataset/culane/laneseg_label_w16/driver_23_30frame/05161729_0639.MP4/04005.png'): can't open/read file: check file path/integrity
[ WARN:0@17.615] global loadsave.cpp:248 findDecoder imread_('dataset/culane/laneseg_label_w16/driver_23_30frame/05161259_0557.MP4/02685.png'): can't open/read file: check file path/integrity
[ WARN:0@17.670] global loadsave.cpp:248 findDecoder imread_('dataset/culane/laneseg_label_w16/driver_23_30frame/05161558_0609.MP4/02275.png'): can't open/read file: check file path/integrity
[ WARN:0@17.735] global loadsave.cpp:248 findDecoder imread_('dataset/culane/laneseg_label_w16/driver_182_30frame/06010601_0052.MP4/03060.png'): can't open/read file: check file path/integrity
[ WARN:0@17.789] global loadsave.cpp:248 findDecoder imread_('dataset/culane/laneseg_label_w16/driver_161_90frame/06032338_0992.MP4/04050.png'): can't open/read file: check file path/integrity
Traceback (most recent call last):
  File "tools/train.py", line 203, in <module>
    main()
  File "tools/train.py", line 191, in main
    train_detector(
  File "/home/docker/mmdetection/mmdet/apis/train.py", line 246, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/docker/.pyenv/versions/3.8.4/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 136, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/docker/.pyenv/versions/3.8.4/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 53, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/home/docker/.pyenv/versions/3.8.4/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 31, in run_iter
    outputs = self.model.train_step(data_batch, self.optimizer,
  File "/home/docker/.pyenv/versions/3.8.4/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 77, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/docker/mmdetection/mmdet/models/detectors/base.py", line 248, in train_step
    losses = self(**data)
  File "/home/docker/.pyenv/versions/3.8.4/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/docker/.pyenv/versions/3.8.4/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 119, in new_func
    return old_func(*args, **kwargs)
  File "/home/docker/mmdetection/mmdet/models/detectors/base.py", line 172, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/work/libs/models/detectors/clrernet.py", line 37, in forward_train
    losses = self.bbox_head.forward_train(x, img_metas)
  File "/work/libs/models/dense_heads/clrernet_head.py", line 369, in forward_train
    losses = self.loss(out_dict, img_metas)
  File "/work/libs/models/dense_heads/clrernet_head.py", line 350, in loss
    tgt_masks = torch.tensor(tgt_masks).long().to(device)  # (B, H, W)
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
docker@0d896633412d:/work$ sudo python tools/train.py configs/clrernet/culane/clrernet_culane_dla34.py
sudo: python: command not found
docker@0d896633412d:/work$

hirotomusiker commented 6 months ago

Thank you. The lane segmentation mask data are required for training. Could you download the following file from CULane and extract it into your culane folder? laneseg_label_w16.tar.gz

We will add instructions for this file to our DATASET.md page later.
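After extracting the archive, it may help to confirm that the mask files referenced by the training list are actually present before starting a long run. The following is a minimal sketch, not part of CLRerNet. It assumes that each line of `train_gt.txt` has the image path as its first field and the mask path (under `laneseg_label_w16`) as its second field, both relative to the data root. That layout is an assumption about the CULane list format and should be checked against your copy. `find_missing_masks` is a hypothetical helper name.

```python
from pathlib import Path


def find_missing_masks(data_root, list_file, limit=None):
    """Return mask paths named in list_file that do not exist under data_root.

    ASSUMPTION: each non-empty line looks like
    "/driver_x/clip.MP4/00000.jpg /laneseg_label_w16/driver_x/clip.MP4/00000.png 1 1 1 1",
    so the second whitespace-separated field is the mask path relative to data_root.
    """
    root = Path(data_root)
    missing = []
    with open(list_file, encoding="utf-8") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            mask_path = root / fields[1].lstrip("/")
            if not mask_path.is_file():
                missing.append(str(mask_path))
                if limit is not None and len(missing) >= limit:
                    break  # stop early; a few examples are enough to diagnose
    return missing


if __name__ == "__main__":
    # Paths taken from the config printed in this thread.
    list_file = Path("dataset/culane/list/train_gt.txt")
    if list_file.is_file():
        for path in find_missing_masks("dataset/culane", list_file, limit=10):
            print("missing:", path)
    else:
        print(f"list file not found: {list_file}")
```

An empty result means every mask named in the list was found. Any printed path points at a file that the data loader would likely fail to read.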

tarkanozsen commented 6 months ago

I appreciate it. The problem seemed to be resolved at first, but then training sessions started getting killed without warning. The first run went on for ~6 hours before it was killed, and the subsequent runs were killed almost immediately.

docker@0d896633412d:/work$ python tools/train.py configs/clrernet/culane/clrernet_culane_dla34.py
/home/docker/.pyenv/versions/3.8.4/lib/python3.8/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  warnings.warn(
2024-05-20 23:54:44,497 - mmdet - INFO - Environment info:

sys.platform: linux
Python: 3.8.4 (default, May 16 2024, 18:17:37) [GCC 11.4.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 1.12.1+cu116
PyTorch compiling details: PyTorch built with:

TorchVision: 0.13.1+cu102
OpenCV: 4.9.0
MMCV: 1.7.0
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 11.6
MMDetection: 2.28.0+2b5632b

2024-05-20 23:54:45,466 - mmdet - INFO - Distributed training: False
2024-05-20 23:54:46,439 - mmdet - INFO - Config:
...

55698 data are loaded
/home/docker/mmdetection/mmdet/utils/compat_config.py:28: UserWarning: config is now expected to have a runner section, please set runner in your config.
  warnings.warn(
2024-05-20 23:54:49,697 - mmdet - INFO - Automatic scaling of learning rate (LR) has been disabled.
34680 data are loaded
2024-05-20 23:54:49,759 - mmdet - INFO - Start running, host: docker@0d896633412d, work_dir: /work/work_dirs/clrernet_culane_dla34
2024-05-20 23:54:49,760 - mmdet - INFO - Hooks will be executed in the following order:
before_run: (VERY_HIGH ) CosineAnnealingLrUpdaterHook (NORMAL ) CheckpointHook (LOW ) EvalHook (VERY_LOW ) TextLoggerHook (VERY_LOW ) TensorboardLoggerHookEpoch

before_train_epoch: (VERY_HIGH ) CosineAnnealingLrUpdaterHook (LOW ) IterTimerHook (LOW ) EvalHook (VERY_LOW ) TextLoggerHook (VERY_LOW ) TensorboardLoggerHookEpoch

before_train_iter: (VERY_HIGH ) CosineAnnealingLrUpdaterHook (LOW ) IterTimerHook (LOW ) EvalHook

after_train_iter: (ABOVE_NORMAL) OptimizerHook (NORMAL ) CheckpointHook (LOW ) IterTimerHook (LOW ) EvalHook (VERY_LOW ) TextLoggerHook (VERY_LOW ) TensorboardLoggerHookEpoch

after_train_epoch: (NORMAL ) CheckpointHook (LOW ) EvalHook (VERY_LOW ) TextLoggerHook (VERY_LOW ) TensorboardLoggerHookEpoch

before_val_epoch: (LOW ) IterTimerHook (VERY_LOW ) TextLoggerHook (VERY_LOW ) TensorboardLoggerHookEpoch

before_val_iter: (LOW ) IterTimerHook

after_val_iter: (LOW ) IterTimerHook

after_val_epoch: (VERY_LOW ) TextLoggerHook (VERY_LOW ) TensorboardLoggerHookEpoch

after_run: (VERY_LOW ) TextLoggerHook (VERY_LOW ) TensorboardLoggerHookEpoch

2024-05-20 23:54:49,761 - mmdet - INFO - workflow: [('train', 1)], max: 15 epochs
2024-05-20 23:54:49,762 - mmdet - INFO - Checkpoints will be saved to /work/work_dirs/clrernet_culane_dla34 by HardDiskBackend.
2024-05-21 00:31:52,421 - mmdet - INFO - Epoch [1][100/2321] lr: 6.000e-04, eta: 8 days, 22:18:56, time: 22.225, data_time: 0.106, memory: 7316, loss_cls: 1.4044, loss_reg_xytl: 6.2046, loss_iou: 3.5071, loss_seg: 0.7584, loss: 11.8745
2024-05-21 01:09:04,275 - mmdet - INFO - Epoch [1][200/2321] lr: 6.000e-04, eta: 8 days, 22:08:57, time: 22.319, data_time: 0.047, memory: 7316, loss_cls: 0.6360, loss_reg_xytl: 1.5719, loss_iou: 2.2164, loss_seg: 0.5510, loss: 4.9754
2024-05-21 01:46:04,979 - mmdet - INFO - Epoch [1][300/2321] lr: 5.999e-04, eta: 8 days, 21:19:25, time: 22.207, data_time: 0.045, memory: 7316, loss_cls: 0.5779, loss_reg_xytl: 1.1578, loss_iou: 1.7554, loss_seg: 0.4824, loss: 3.9735
2024-05-21 02:23:02,096 - mmdet - INFO - Epoch [1][400/2321] lr: 5.998e-04, eta: 8 days, 20:31:00, time: 22.171, data_time: 0.063, memory: 7316, loss_cls: 0.5874, loss_reg_xytl: 0.9828, loss_iou: 1.4991, loss_seg: 0.4527, loss: 3.5221
2024-05-21 03:00:02,986 - mmdet - INFO - Epoch [1][500/2321] lr: 5.997e-04, eta: 8 days, 19:51:29, time: 22.209, data_time: 0.065, memory: 7316, loss_cls: 0.5648, loss_reg_xytl: 0.6678, loss_iou: 1.2848, loss_seg: 0.4042, loss: 2.9216
2024-05-21 03:36:55,092 - mmdet - INFO - Epoch [1][600/2321] lr: 5.996e-04, eta: 8 days, 19:04:27, time: 22.121, data_time: 0.045, memory: 7316, loss_cls: 0.5580, loss_reg_xytl: 0.6002, loss_iou: 1.1898, loss_seg: 0.3924, loss: 2.7404
2024-05-21 04:13:49,209 - mmdet - INFO - Epoch [1][700/2321] lr: 5.994e-04, eta: 8 days, 18:21:58, time: 22.141, data_time: 0.069, memory: 7316, loss_cls: 0.5544, loss_reg_xytl: 0.5401, loss_iou: 1.1163, loss_seg: 0.3682, loss: 2.5790
2024-05-21 04:50:47,832 - mmdet - INFO - Epoch [1][800/2321] lr: 5.992e-04, eta: 8 days, 17:44:04, time: 22.186, data_time: 0.044, memory: 7316, loss_cls: 0.5442, loss_reg_xytl: 0.5219, loss_iou: 1.0569, loss_seg: 0.3441, loss: 2.4671
2024-05-21 05:27:58,665 - mmdet - INFO - Epoch [1][900/2321] lr: 5.990e-04, eta: 8 days, 17:14:01, time: 22.308, data_time: 0.054, memory: 7316, loss_cls: 0.5317, loss_reg_xytl: 0.4912, loss_iou: 0.9891, loss_seg: 0.3369, loss: 2.3489
Killed

The rest got killed after this line:
2024-05-21 06:56:45,087 - mmdet - INFO - Checkpoints will be saved to /work/work_dirs/clrernet_culane_dla34 by HardDiskBackend.
Killed

hirotomusiker commented 6 months ago

This is most likely a CPU (host) memory issue. Please check the memory usage and stop unnecessary processes to free up available memory.
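A bare `Killed` with no Python traceback usually means the process received SIGKILL, which on Linux is commonly the out-of-memory killer. One way to check host memory before and during training is to read `/proc/meminfo`. The sketch below is a minimal, Linux-only example using the standard library. The helper names are hypothetical, and inside a Docker container `/proc/meminfo` typically reports the host's memory rather than any container limit, so treat the number as a rough indicator.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_gib(meminfo):
    """Return MemAvailable converted from kB to GiB, or None if it is absent."""
    kib = meminfo.get("MemAvailable")
    return None if kib is None else kib / (1024 ** 2)


def read_host_memory(path="/proc/meminfo"):
    """Read and parse the kernel's memory summary (Linux only)."""
    with open(path, encoding="ascii") as handle:
        return parse_meminfo(handle.read())


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        gib = available_gib(read_host_memory())
        print("MemAvailable (GiB):", "unknown" if gib is None else round(gib, 2))
    else:
        print("/proc/meminfo not found; this sketch only works on Linux")
```

If available memory is low before training starts, that would be consistent with the later runs being killed almost immediately. The config in this thread uses `workers_per_gpu=8`, so lowering that value is one thing to try, although the thread does not confirm that data loader workers were the cause.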

tarkanozsen commented 6 months ago

I appreciate the help.