Problems encountered while setting up the CULane demo for Lane detection #150

Open CHANdaFeng opened 1 year ago

CHANdaFeng commented 1 year ago

I'm setting up the CULane demo for Lane detection. When I run `python main_landet.py --test --config=configs/lane_detection/bezierlanenet/resnet34_culane_aug1b.py --checkpoint=resnet34_bezierlanenet_culane_aug1b_20211109.pt`, I get this error: NVIDIA GeForce RTX 3060 with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37. If you want to use the NVIDIA GeForce RTX 3060 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

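My environment: I installed PyTorch 1.6 following the official site. nvidia-smi reports CUDA 12.1, and nvcc -V is not found. Is something wrong with my environment? Here is my environment: (pad) cxf@cxf:~$ conda list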

voldemortX commented 1 year ago

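@CHANdaFeng You get this error because the PyTorch build you're using only supports architectures up to sm_75 (RTX 20-series GPUs). Check which PyTorch version is the minimum for 30-series cards. This has nothing to do with your CUDA version.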
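The compatibility check behind that message can be reproduced offline. Below is a minimal sketch (the helper names `parse_sm` and `has_compatible_cubin` are hypothetical) that applies the CUDA binary-compatibility rule: a cubin built for sm_XY runs on a GPU with the same major version X and a minor version of at least Y. For simplicity it skips PTX entries such as compute_37. The arch list is copied from the error above; the RTX 3060 is sm_86.

```python
import string


def parse_sm(arch):
    """Turn 'sm_86' into (8, 6); return None for PTX entries such as 'compute_37'."""
    if not arch.startswith("sm_"):
        return None
    digits = arch[3:].rstrip(string.ascii_letters)  # e.g. 'sm_90a' -> '90'
    return int(digits[:-1]), int(digits[-1])


def has_compatible_cubin(arch_list, capability):
    """True if some sm_XY in arch_list can run on a GPU with (major, minor) capability.

    Cubins are binary-compatible with GPUs of the same major version and an
    equal or higher minor version (e.g. sm_80 code runs on sm_86).
    """
    major, minor = capability
    for arch in arch_list:
        parsed = parse_sm(arch)
        if parsed is None:
            continue  # PTX (compute_XY) entries are ignored in this sketch
        sm_major, sm_minor = parsed
        if sm_major == major and sm_minor <= minor:
            return True
    return False


if __name__ == "__main__":
    # Arch list reported by the PyTorch 1.6 install in this issue.
    reported = ["sm_37", "sm_50", "sm_60", "sm_61", "sm_70", "sm_75", "compute_37"]
    print(has_compatible_cubin(reported, (8, 6)))  # False
```

In a working install, recent PyTorch releases expose both inputs as `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability(0)`. The fix is a PyTorch build whose arch list contains an sm_8x entry.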

CHANdaFeng commented 1 year ago

@voldemortX OK, thanks a lot! I'll look into it further.

CHANdaFeng commented 1 year ago

Hello, my current environment is CUDA 12.1, torch 1.10.1, mmcv-full 1.4.6.
When I run `python main_landet.py --train --config=configs/lane_detection/baseline/enet_culane.py`, I get: ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory. What's causing this? I checked, and libcudart.so.11.0 really isn't there. Shouldn't it be using 12.1?

voldemortX commented 1 year ago

@CHANdaFeng Is the mmcv you installed built for CUDA 12.1? mmcv provides a table showing which build matches each CUDA/PyTorch version.
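Before reinstalling anything, it can help to see which CUDA runtime libraries actually exist. The "CUDA Version" printed by nvidia-smi is the highest CUDA version the installed driver supports. It does not mean that a 12.1 runtime is installed. A request for libcudart.so.11.0 means some compiled binary in the environment was built against CUDA 11.x, plausibly mmcv-full's ops given the maintainer's question. The sketch below only lists `libcudart.so*` files. The helper names and the default search directories are assumptions, not part of pytorch-auto-drive or mmcv.

```python
import glob
import os


def default_search_dirs():
    """Hypothetical default search set: LD_LIBRARY_PATH, the active conda env, /usr/local/cuda*."""
    dirs = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]
    conda = os.environ.get("CONDA_PREFIX")
    if conda:
        dirs.append(os.path.join(conda, "lib"))
    dirs += glob.glob("/usr/local/cuda*/lib64")
    return dirs


def find_cudart(search_dirs=None):
    """Return every libcudart.so* file found directly inside search_dirs, sorted."""
    if search_dirs is None:
        search_dirs = default_search_dirs()
    found = set()
    for d in search_dirs:
        found.update(glob.glob(os.path.join(d, "libcudart.so*")))
    return sorted(found)


if __name__ == "__main__":
    for path in find_cudart():
        print(path)
```

If nothing matching `.so.11*` shows up, that is consistent with the ImportError. The remedy the maintainer points to is an mmcv build that matches the installed PyTorch/CUDA combination.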

CHANdaFeng commented 1 year ago

@voldemortX OK, I'll keep looking into it. Thanks!

CHANdaFeng commented 1 year ago

@voldemortX Hello~ When I run `python main_landet.py --train --config=configs/lane_detection/bezierlanenet/resnet18_culane_aug1b.py`,
it times out and fails with RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. What could be causing this? My current environment is:

CUDA 11.3.0, torch==1.11.0, mmcv==2.0.0, numpy==1.19

Full error output:

```
/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/cuda/__init__.py:82: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:112.)
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-11.3'
/home/cxf/pytorch-auto-drive/utils/models/lane_detection/laneatt.py:22: UserWarning: Can't complie line nms op for LaneATT. Set verbose=True for load in /utils/csrc/apis.py L9 for details.
main_landet.py:24: UserWarning: Unable to set a high enough file descriptor limit 8192 (your system may has a low hard limit 4096). If you encounter related problems in training, try reduce the number of workers by --workers, or switch into file_system mode at Line 8.
Loaded torchvision ImageNet pre-trained weights V1.
Not using distributed mode
cuda
Traceback (most recent call last):
  File "main_landet.py", line 75, in <module>
    runner = Runner(cfg=cfg)
  File "/home/cxf/pytorch-auto-drive/utils/runners/lane_det_trainer.py", line 17, in __init__
    super().__init__(cfg)
  File "/home/cxf/pytorch-auto-drive/utils/runners/base.py", line 117, in __init__
    net_without_ddp, self.device = self.get_device_and_move_model()
  File "/home/cxf/pytorch-auto-drive/utils/runners/base.py", line 159, in get_device_and_move_model
    self.model.to(device)
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/nn/modules/module.py", line 907, in to
    return self._apply(convert)
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/nn/modules/module.py", line 578, in _apply
    module._apply(fn)
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/nn/modules/module.py", line 578, in _apply
    module._apply(fn)
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/nn/modules/module.py", line 601, in _apply
    param_applied = fn(param)
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/nn/modules/module.py", line 905, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/cuda/__init__.py", line 216, in _lazy_init
    torch._C._cuda_init()
RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.
```
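The file-descriptor `UserWarning` that appears in these logs reports that 8192 open files were requested while the hard limit is 4096. The sketch below is not necessarily what `main_landet.py` does. It shows how a Python process can inspect and raise its own soft limit with the standard `resource` module (Unix only). The helper name is hypothetical.

```python
import resource


def raise_nofile_soft_limit(target):
    """Raise the soft RLIMIT_NOFILE toward target, capped at the hard limit.

    An unprivileged process may raise its soft limit only up to the hard
    limit, so with a hard limit of 4096 a request for 8192 ends at 4096.
    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to do
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print(raise_nofile_soft_limit(8192))
```

Raising the hard limit itself needs a system-level change, such as a root-configured limit. Otherwise, the warning's own suggestions (fewer `--workers`, or the file_system sharing mode) are the alternatives.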

voldemortX commented 1 year ago

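@CHANdaFeng I can't tell from this. First, the nms compilation most likely failed because the program didn't detect CUDA. The problem is probably in your environment. Try running some simple scripts first and narrow the problem down step by step.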
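Following the advice to start from a simple script, here is a minimal sketch that reports what PyTorch sees before launching full training. The helper name `cuda_sanity_report` is hypothetical. The torch calls it uses (`torch.version.cuda`, `torch.cuda.is_available`, `torch.cuda.get_device_name`, `torch.cuda.get_device_capability`) are standard PyTorch APIs.

```python
import importlib.util
import os


def cuda_sanity_report():
    """Collect basic facts about the PyTorch/CUDA setup without crashing."""
    report = {
        # Changing this after startup is one cause named in the error message.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "CUDA_HOME": os.environ.get("CUDA_HOME"),
        "torch_importable": importlib.util.find_spec("torch") is not None,
    }
    if not report["torch_importable"]:
        return report
    import torch

    report["torch_version"] = torch.__version__
    report["torch_built_with_cuda"] = torch.version.cuda  # None for CPU-only builds
    # is_available() is where "CUDA unknown error" shows up as a warning.
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        report["capability"] = torch.cuda.get_device_capability(0)
        x = torch.ones(2, 2, device="cuda")
        report["matmul_ok"] = bool((x @ x).sum().item() == 8.0)  # each entry is 2
    return report


if __name__ == "__main__":
    for key, value in cuda_sanity_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` is already False here, the problem lies below pytorch-auto-drive, in the driver, the PyTorch build, or the environment variables.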

CHANdaFeng commented 1 year ago

@voldemortX Hello, the nms op now compiles successfully. But when I launch `python main_landet.py --train --config=configs/lane_detection/bezierlanenet/resnet18_culane_aug1b.py`, I get the error below. What could be the reason? Thanks for your patient answers!

```
Successfully complied line nms for LaneATT.
main_landet.py:24: UserWarning: Unable to set a high enough file descriptor limit 8192 (your system may has a low hard limit 4096). If you encounter related problems in training, try reduce the number of workers by --workers, or switch into file_system mode at Line 8.
Loaded torchvision ImageNet pre-trained weights V1.
Not using distributed mode
cuda
Build from dict error in function or class: CULaneAsBezier
In Python: <class 'utils.datasets.lane_as_bezier.CULaneAsBezier'>
Traceback (most recent call last):
  File "main_landet.py", line 75, in <module>
    runner = Runner(cfg=cfg)
  File "/home/cxf/pytorch-auto-drive/utils/runners/lane_det_trainer.py", line 17, in __init__
    super().__init__(cfg)
  File "/home/cxf/pytorch-auto-drive/utils/runners/base.py", line 127, in __init__
    dataset = DATASETS.from_dict(cfg['dataset'],
  File "/home/cxf/pytorch-auto-drive/utils/registry.py", line 41, in from_dict
    raise e
  File "/home/cxf/pytorch-auto-drive/utils/registry.py", line 38, in from_dict
    return function_or_class(**dictparams)
  File "/home/cxf/pytorch-auto-drive/utils/datasets/lane_as_bezier.py", line 39, in __init__
    self._init_all()
  File "/home/cxf/pytorch-auto-drive/utils/datasets/lane_as_bezier.py", line 104, in _init_all
    self.beziers = self.loader_bezier()
  File "/home/cxf/pytorch-auto-drive/utils/datasets/lane_as_bezier.py", line 72, in loader_bezier
    with open(self.bezier_labels, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/cxf/bag/CULane/bezier_labels/train_3.json'
```

CHANdaFeng commented 1 year ago

I downloaded the dataset following the CULane Dataset instructions and made the corresponding changes, but I didn't see a bezier_labels/train_3.json file in the download?

CHANdaFeng commented 1 year ago

@voldemortX I've solved this problem~

CHANdaFeng commented 1 year ago

Hello @voldemortX, after training resnet18_culane_aug1b.py on CULane and getting a .pt model, I wanted to look at the test predictions. Running `python main_landet.py --test --config=configs/lane_detection/bezierlanenet/resnet18_culane_aug1b.py --mixed-precision` fails with an image-not-found error, even though the file actually exists under the corresponding dataset path. What could be the reason?

```
(pad) cxf@cxf:~/pytorch-auto-drive$ python main_landet.py --test --config=configs/lane_detection/bezierlanenet/resnet18_culane_aug1b.py --mixed-precision
Successfully complied line nms for LaneATT.
main_landet.py:24: UserWarning: Unable to set a high enough file descriptor limit 8192 (your system may has a low hard limit 4096). If you encounter related problems in training, try reduce the number of workers by --workers, or switch into file_system mode at Line 8.
Loaded torchvision ImageNet pre-trained weights V1.
cuda:0
0%| | 0/34680 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main_landet.py", line 76, in <module>
    runner.run()
  File "/home/cxf/pytorch-auto-drive/utils/runners/lane_det_tester.py", line 31, in run
    self.test_one_set(self.model, self.device, self.dataloader, self._cfg['mixed_precision'],
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/cxf/pytorch-auto-drive/utils/runners/lane_det_tester.py", line 47, in test_one_set
    for images, filenames in tqdm(loader):
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/tqdm/std.py", line 1180, in __iter__
    for obj in iterable:
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 570, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/cxf/pytorch-auto-drive/utils/datasets/lane_as_bezier.py", line 48, in __getitem__
    img = Image.open(self.images[index]).convert('RGB')
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/PIL/Image.py", line 3227, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '/home/cxf/bag/CULane/river_100_30frame/05251517_0433.MP4/00000.jpg'
```

voldemortX commented 1 year ago

@CHANdaFeng Does the file '/home/cxf/bag/CULane/river_100_30frame/05251517_0433.MP4/00000.jpg' actually exist?

CHANdaFeng commented 1 year ago

@voldemortX It seems it doesn't. I renamed the driver_100_30frame folder to the corresponding river_100_30frame, and that seems to work.

voldemortX commented 1 year ago

CHANdaFeng commented 1 year ago

@voldemortX OK, but something puzzles me: this is the CULane dataset downloaded from the official site. After my change, running `python main_landet.py --test --config=configs/lane_detection/bezierlanenet/resnet18_culane_aug1b.py --checkpoint=checkpoints/resnet18_bezierlanenet_culane-aug2/model.pt` stops halfway with FileNotFoundError: [Errno 2] No such file or directory: '/home/cxf/bag/CULane/river_193_90frame/06051317_0673.MP4/00180.jpg'

voldemortX commented 1 year ago

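@CHANdaFeng I suggest checking whether the dataset is corrupted, modified, or incomplete. If you downloaded it from the official site, there shouldn't be any folder named river.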

CHANdaFeng commented 1 year ago

@voldemortX I checked, and the data isn't corrupted. But shouldn't it be looking for driver_100_30frame? Or did I accidentally modify the code and delete the "d"?

voldemortX commented 1 year ago

@CHANdaFeng Check both the code and the txt files that list the paths. Do a global search.
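One way to act on that advice is to check every entry of a path list against the dataset root. Losing exactly one leading character ("driver" becoming "river") is consistent with code that strips a leading "/" from each entry receiving lines that have no leading "/". Whether pytorch-auto-drive does this is an assumption to confirm by searching the code. The sketch assumes CULane-style list lines whose first field is an image path such as `/driver_100_30frame/05251517_0433.MP4/00000.jpg`. The function name is hypothetical.

```python
import os


def check_list_file(root, list_path):
    """Return (missing, no_leading_slash) for a CULane-style list file.

    Each non-empty line is assumed to start with an image path relative to
    the dataset root, optionally followed by more whitespace-separated fields.
    """
    missing, no_leading_slash = [], []
    with open(list_path) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue
            rel = fields[0]
            if not rel.startswith("/"):
                no_leading_slash.append(rel)  # suspicious if code drops the first char
            # lstrip("/") so os.path.join does not discard `root`
            full = os.path.join(root, rel.lstrip("/"))
            if not os.path.isfile(full):
                missing.append(full)
    return missing, no_leading_slash
```

A plain `grep -rn "river_" list/ configs/ utils/` from the relevant directories is the quickest global search for where the truncated name comes from.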

CHANdaFeng commented 1 year ago

@voldemortX Hello, I'm running the Lane points (Image Folder) visualization, and the example seems to require label files. After training, all I get are txt files like 00000.lines.txt; I don't see any label files. What's the reason? The official example is `python tools/vis/lane_img_dir.py --image-path=PAD_test_images/lane_test_images/05171008_0748.MP4 --keypoint-path=PAD_test_images/lane_test_images/05171008_0748.MP4 --mask-path=PAD_test_images/lane_test_images/laneseg_label_w16/05171008_0748.MP4 --image-suffix=.jpg --keypoint-suffix=.lines.txt --mask-suffix=.png --save-path=PAD_test_images/lane_test_images/culane_res --config=`. After training on CULane with resnet18_culane_aug1b, I have no such label files.

voldemortX commented 1 year ago

@CHANdaFeng Have you downloaded the test data package PAD_test_images? Use the example to see exactly what input formats are expected.
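In the example command, `--keypoint-path` with `--keypoint-suffix=.lines.txt` reads the same kind of `.lines.txt` files that were produced after training. `--mask-path` points at `laneseg_label_w16`, which is CULane's segmentation annotation folder rather than something training generates. This sketch assumes the usual CULane layout: one lane per line, written as space-separated x y pairs. Under that assumption, a minimal reader for these keypoint files could look like this (function names are hypothetical):

```python
def parse_lines_txt(text):
    """Parse CULane-style .lines.txt content into a list of lanes.

    Each non-empty line is one lane: "x1 y1 x2 y2 ...". Returns a list of
    lanes, each a list of (x, y) float tuples.
    """
    lanes = []
    for line in text.splitlines():
        values = [float(v) for v in line.split()]
        if not values:
            continue
        if len(values) % 2:
            raise ValueError(f"odd number of coordinates: {line!r}")
        lanes.append(list(zip(values[0::2], values[1::2])))
    return lanes


def load_lines_txt(path):
    """Read and parse one prediction file, e.g. .../00000.lines.txt."""
    with open(path) as f:
        return parse_lines_txt(f.read())
```

Comparing the parsed points from your own output against the files shipped in PAD_test_images is a quick way to confirm the formats match.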

Durobert commented 1 year ago

voldemortX commented 1 year ago

@Durobert You can find them in datasets/CULane.md