voldemortX / pytorch-auto-drive

PytorchAutoDrive: Segmentation models (ERFNet, ENet, DeepLab, FCN...) and Lane detection models (SCNN, RESA, LSTR, LaneATT, BézierLaneNet...) based on PyTorch with fast training, visualization, benchmarking & deployment help
BSD 3-Clause "New" or "Revised" License
840 stars 138 forks

Problems setting up the demo for the CULane lane detection dataset #150

Open CHANdaFeng opened 1 year ago

CHANdaFeng commented 1 year ago

While setting up the demo for the CULane lane detection dataset, I launched python main_landet.py --test --config=configs/lane_detection/bezierlanenet/resnet34_culane_aug1b.py --checkpoint=resnet34_bezierlanenet_culane_aug1b_20211109.pt and got this error: NVIDIA GeForce RTX 3060 with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37. If you want to use the NVIDIA GeForce RTX 3060 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

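I installed PyTorch 1.6 following the official website. nvidia-smi reports CUDA 12.1, while nvcc -V reports that it is not available. Is something wrong with my environment? Here is the environment:
(pad) cxf@cxf:~$ conda list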

# packages in environment at /home/cxf/anaconda3/envs/pad:
#
# Name                    Version               Build                           Channel
_libgcc_mutex             0.1                   main
_openmp_mutex             5.1                   1_gnu
absl-py                   1.4.0                 pypi_0                          pypi
addict                    2.4.0                 pypi_0                          pypi
blas                      1.0                   mkl
ca-certificates           2023.05.30            h06a4308_0
cachetools                4.2.4                 pypi_0                          pypi
certifi                   2021.5.30             py36h06a4308_0
charset-normalizer        2.0.12                pypi_0                          pypi
cudatoolkit               10.2.89               hfd86e86_1
dataclasses               0.8                   pypi_0                          pypi
dill                      0.3.4                 pypi_0                          pypi
filetype                  1.0.8                 pypi_0                          pypi
freetype                  2.12.1                h4a9f257_0
future                    0.18.3                pypi_0                          pypi
google-auth               2.20.0                pypi_0                          pypi
google-auth-oauthlib      0.4.6                 pypi_0                          pypi
grpcio                    1.48.2                pypi_0                          pypi
idna                      3.4                   pypi_0                          pypi
imageio                   2.10.1                pypi_0                          pypi
importlib-metadata        4.8.3                 pypi_0                          pypi
importmagician            0.1.0                 pypi_0                          pypi
intel-openmp              2022.1.0              h9e868ea_3769
joblib                    1.1.1                 pypi_0                          pypi
jpeg                      9e                    h5eee18b_1
lcms2                     2.12                  h3be6417_0
ld_impl_linux-64          2.38                  h1181459_1
lerc                      3.0                   h295c915_0
libdeflate                1.17                  h5eee18b_0
libffi                    3.3                   he6710b0_2
libgcc-ng                 11.2.0                h1234567_1
libgomp                   11.2.0                h1234567_1
libpng                    1.6.39                h5eee18b_0
libstdcxx-ng              11.2.0                h1234567_1
libtiff                   4.5.0                 h6a678d5_2
libwebp-base              1.2.4                 h5eee18b_1
lz4-c                     1.9.4                 h6a678d5_0
markdown                  3.3.7                 pypi_0                          pypi
mkl                       2020.2                256
mkl-service               2.3.0                 py36he8ac12f_0
mkl_fft                   1.3.0                 py36h54f3939_0
mkl_random                1.1.1                 py36h0573a6f_0
mmcv-full                 1.3.5                 pypi_0                          pypi
multiprocess              0.70.12.2             pypi_0                          pypi
ncurses                   6.4                   h6a678d5_0
ninja                     1.11.1                pypi_0                          pypi
ninja-base                1.10.2                hd09550d_5
numpy                     1.19.2                py36h54aff64_0
numpy-base                1.19.2                py36hfa32c7d_0
oauthlib                  3.2.2                 pypi_0                          pypi
olefile                   0.46                  py36_0
opencv-python             4.5.4.58              pypi_0                          pypi
openjpeg                  2.4.0                 h3ad879b_0
openssl                   1.1.1t                h7f8727e_0
p-tqdm                    1.3.3                 pypi_0                          pypi
pathos                    0.2.8                 pypi_0                          pypi
pillow                    8.4.0                 pypi_0                          pypi
pip                       21.2.2                py36h06a4308_0
pox                       0.3.0                 pypi_0                          pypi
ppft                      1.6.6.4               pypi_0                          pypi
protobuf                  3.19.6                pypi_0                          pypi
pyasn1                    0.5.0                 pypi_0                          pypi
pyasn1-modules            0.3.0                 pypi_0                          pypi
python                    3.6.13                h12debd9_1
pytorch                   1.6.0                 py3.6_cuda10.2.89_cudnn7.6.5_0  pytorch
pyyaml                    6.0                   pypi_0                          pypi
readline                  8.2                   h5eee18b_0
requests                  2.27.1                pypi_0                          pypi
requests-oauthlib         1.3.1                 pypi_0                          pypi
rsa                       4.9                   pypi_0                          pypi
scikit-learn              0.23.2                pypi_0                          pypi
scipy                     1.5.4                 pypi_0                          pypi
setuptools                58.0.4                py36h06a4308_0
shapely                   1.8.0                 pypi_0                          pypi
six                       1.16.0                pyhd3eb1b0_1
sqlite                    3.41.2                h5eee18b_0
tensorboard               2.7.0                 pypi_0                          pypi
tensorboard-data-server   0.6.1                 pypi_0                          pypi
tensorboard-plugin-wit    1.8.1                 pypi_0                          pypi
thop                      0.0.31-2005241907     pypi_0                          pypi
threadpoolctl             3.1.0                 pypi_0                          pypi
timm                      0.4.5                 pypi_0                          pypi
tk                        8.6.12                h1ccaba5_0
torchvision               0.7.0                 py36_cu102                      pytorch
tqdm                      4.62.3                pypi_0                          pypi
typing-extensions         4.1.1                 pypi_0                          pypi
ujson                     4.2.0                 pypi_0                          pypi
urllib3                   1.26.16               pypi_0                          pypi
werkzeug                  2.0.3                 pypi_0                          pypi
wheel                     0.37.1                pyhd3eb1b0_0
xz                        5.4.2                 h5eee18b_0
yapf                      0.32.0                pypi_0                          pypi
zipp                      3.6.0                 pypi_0                          pypi
zlib                      1.2.13                h5eee18b_0
zstd                      1.5.5                 hc292b87_0

voldemortX commented 1 year ago

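@CHANdaFeng This error occurs because the PyTorch build you are using only supports up to sm_75 (RTX 20-series GPUs). Check which PyTorch version is the minimum required for 30-series cards. It has nothing to do with your CUDA version.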
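The compatibility rule behind this error can be sketched in a few lines. The helper names (`parse_sm_archs`, `is_supported`) are hypothetical, and the rule used here (a compiled `sm_XY` binary runs on a GPU with the same major capability and an equal or higher minor one) is a simplification of what PyTorch actually checks. On a working install, the two inputs could likely be read from `torch.cuda.get_device_capability()` and, in newer PyTorch releases, `torch.cuda.get_arch_list()`.

```python
import re


def parse_sm_archs(arch_text):
    """Extract (major, minor) pairs from names such as 'sm_37 sm_50 compute_37'.

    Only binary targets (sm_XY) are considered; PTX targets (compute_XY)
    are ignored in this simplified sketch.
    """
    return [divmod(int(digits), 10) for digits in re.findall(r"\bsm_(\d+)\b", arch_text)]


def is_supported(device_capability, archs):
    """Assumed rule: same major version, and the build's minor <= the device's minor."""
    dev_major, dev_minor = device_capability
    return any(major == dev_major and minor <= dev_minor for major, minor in archs)


# Architecture list copied from the error message in this thread.
ARCHS_FROM_ERROR = "sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37"
archs = parse_sm_archs(ARCHS_FROM_ERROR)

print(is_supported((8, 6), archs))  # RTX 3060 (sm_86) -> False
print(is_supported((7, 5), archs))  # an sm_75 card    -> True
```

Under this rule, no listed architecture has major version 8, so the RTX 3060 is rejected regardless of which CUDA driver version nvidia-smi reports.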

CHANdaFeng commented 1 year ago

@voldemortX OK, thank you very much! I will look into the cause further.

CHANdaFeng commented 1 year ago

Hello, my current environment is CUDA 12.1, torch 1.10.1, mmcv-full 1.4.6.
When I run python main_landet.py --train --config=configs/lane_detection/baseline/enet_culane.py I get this error: ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory. What is the cause? I checked, and there is indeed no libcudart.so.11.0 on my system. Shouldn't it be loading the 12.1 library instead?

voldemortX commented 1 year ago

@CHANdaFeng Is the mmcv you installed the build that matches CUDA 12.1? mmcv provides a table that maps its builds to each supported version.
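The missing library name itself hints at the mismatch: a compiled extension that asks for `libcudart.so.11.0` was most likely built against a CUDA 11.x runtime rather than 12.1. A minimal sketch of that reading follows; `required_cuda_major` and `matches_installed` are hypothetical helpers and not part of mmcv or PyTorch.

```python
import re


def required_cuda_major(error_message):
    """Return the CUDA major version named in a 'libcudart.so.X.Y' message, or None."""
    match = re.search(r"libcudart\.so\.(\d+)", error_message)
    return int(match.group(1)) if match else None


def matches_installed(error_message, installed_version):
    """Compare the requested runtime major version with an installed one such as '12.1'."""
    required = required_cuda_major(error_message)
    if required is None:
        return None  # the message does not mention libcudart at all
    return required == int(installed_version.split(".")[0])


ERROR = ("ImportError: libcudart.so.11.0: cannot open shared object file: "
         "No such file or directory")

print(required_cuda_major(ERROR))        # 11
print(matches_installed(ERROR, "12.1"))  # False
```

If the two majors differ, the usual remedy is to install a prebuilt package that was compiled for the same CUDA and PyTorch versions as the rest of the environment.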

CHANdaFeng commented 1 year ago

@voldemortX OK, I will look into it further. Thanks!

CHANdaFeng commented 1 year ago

@voldemortX hello~ When I run python main_landet.py --train --config=configs/lane_detection/bezierlanenet/resnet18_culane_aug1b.py
it stalls and then fails with RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. What could be the cause? My current environment is:

cuda 11.3.0, torch==1.11.0, mmcv==2.0.0, numpy=1.19

The full error output is as follows:
/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/cuda/__init__.py:82: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:112.)
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-11.3'
/home/cxf/pytorch-auto-drive/utils/models/lane_detection/laneatt.py:22: UserWarning: Can't complie line nms op for LaneATT. Set verbose=True for load in /utils/csrc/apis.py L9 for details.
main_landet.py:24: UserWarning: Unable to set a high enough file descriptor limit 8192 (your system may has a low hard limit 4096). If you encounter related problems in training, try reduce the number of workers by --workers, or switch into file_system mode at Line 8.
Loaded torchvision ImageNet pre-trained weights V1.
Not using distributed mode
cuda
Traceback (most recent call last):
  File "main_landet.py", line 75, in <module>
    runner = Runner(cfg=cfg)
  File "/home/cxf/pytorch-auto-drive/utils/runners/lane_det_trainer.py", line 17, in __init__
    super().__init__(cfg)
  File "/home/cxf/pytorch-auto-drive/utils/runners/base.py", line 117, in __init__
    net_without_ddp, self.device = self.get_device_and_move_model()
  File "/home/cxf/pytorch-auto-drive/utils/runners/base.py", line 159, in get_device_and_move_model
    self.model.to(device)
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/nn/modules/module.py", line 907, in to
    return self._apply(convert)
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/nn/modules/module.py", line 578, in _apply
    module._apply(fn)
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/nn/modules/module.py", line 578, in _apply
    module._apply(fn)
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/nn/modules/module.py", line 601, in _apply
    param_applied = fn(param)
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/nn/modules/module.py", line 905, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/cuda/__init__.py", line 216, in _lazy_init
    torch._C._cuda_init()
RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.

voldemortX commented 1 year ago

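@CHANdaFeng I can't tell from this. First of all, the NMS op failing to compile most likely means the program did not detect CUDA. Your environment is probably misconfigured. Try running a few simple scripts first and narrow the problem down step by step.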
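One way to follow this advice is to run a short list of checks in order and stop at the first one that fails. The sketch below is only an illustration: `run_checks` and `torch_checks` are hypothetical names, and the three GPU checks are an assumed starting set, not something prescribed by the repository.

```python
def run_checks(checks):
    """Run (name, callable) pairs in order; return the first failing name, or None."""
    for name, check in checks:
        try:
            ok = bool(check())
        except Exception as exc:  # report the error instead of crashing the whole run
            print(f"[FAIL] {name}: {exc!r}")
            return name
        print(f"[{'OK' if ok else 'FAIL'}] {name}")
        if not ok:
            return name
    return None


def torch_checks():
    """Assumed minimal GPU checks; torch is imported lazily so the helper above stays generic."""
    import torch

    return [
        ("CUDA is available", torch.cuda.is_available),
        ("a tensor can be moved to the GPU", lambda: torch.zeros(1).to("cuda").is_cuda),
        ("a small matmul runs on the GPU",
         lambda: (torch.ones(2, 2, device="cuda") @ torch.ones(2, 2, device="cuda")).sum().item() == 8.0),
    ]
```

Calling `run_checks(torch_checks())` inside the `pad` environment would then show whether the failure already appears in plain PyTorch, before any code from the repository is involved.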

CHANdaFeng commented 1 year ago

@voldemortX hello, the NMS op now compiles successfully. When I launch python main_landet.py --train --config=configs/lane_detection/bezierlanenet/resnet18_culane_aug1b.py I get the error below. What is the cause? Thank you for your patient help!
Successfully complied line nms for LaneATT.
main_landet.py:24: UserWarning: Unable to set a high enough file descriptor limit 8192 (your system may has a low hard limit 4096). If you encounter related problems in training, try reduce the number of workers by --workers, or switch into file_system mode at Line 8.
Loaded torchvision ImageNet pre-trained weights V1.
Not using distributed mode
cuda
Build from dict error in function or class: CULaneAsBezier
In Python: <class 'utils.datasets.lane_as_bezier.CULaneAsBezier'>
Traceback (most recent call last):
  File "main_landet.py", line 75, in <module>
    runner = Runner(cfg=cfg)
  File "/home/cxf/pytorch-auto-drive/utils/runners/lane_det_trainer.py", line 17, in __init__
    super().__init__(cfg)
  File "/home/cxf/pytorch-auto-drive/utils/runners/base.py", line 127, in __init__
    dataset = DATASETS.from_dict(cfg['dataset'],
  File "/home/cxf/pytorch-auto-drive/utils/registry.py", line 41, in from_dict
    raise e
  File "/home/cxf/pytorch-auto-drive/utils/registry.py", line 38, in from_dict
    return function_or_class(**dictparams)
  File "/home/cxf/pytorch-auto-drive/utils/datasets/lane_as_bezier.py", line 39, in __init__
    self._init_all()
  File "/home/cxf/pytorch-auto-drive/utils/datasets/lane_as_bezier.py", line 104, in _init_all
    self.beziers = self.loader_bezier()
  File "/home/cxf/pytorch-auto-drive/utils/datasets/lane_as_bezier.py", line 72, in loader_bezier
    with open(self.bezier_labels, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/cxf/bag/CULane/bezier_labels/train_3.json'

CHANdaFeng commented 1 year ago

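I downloaded the dataset as described in the CULane Dataset instructions and adjusted the paths, but I did not see a bezier_labels/train_3.json file in the download. Where does it come from?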

CHANdaFeng commented 1 year ago

@voldemortX I have solved this problem~

CHANdaFeng commented 1 year ago

hello @voldemortX, after training on CULane with resnet18_culane_aug1b.py and obtaining a .pt model, I wanted to look at the test predictions. Running python main_landet.py --test --config=configs/lane_detection/bezierlanenet/resnet18_culane_aug1b.py --mixed-precision fails because an image cannot be found, even though the file does exist at the corresponding path in the dataset. What could be the cause?
(pad) cxf@cxf:~/pytorch-auto-drive$ python main_landet.py --test --config=configs/lane_detection/bezierlanenet/resnet18_culane_aug1b.py --mixed-precision
Successfully complied line nms for LaneATT.
main_landet.py:24: UserWarning: Unable to set a high enough file descriptor limit 8192 (your system may has a low hard limit 4096). If you encounter related problems in training, try reduce the number of workers by --workers, or switch into file_system mode at Line 8.
Loaded torchvision ImageNet pre-trained weights V1.
cuda:0
0%| | 0/34680 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main_landet.py", line 76, in <module>
    runner.run()
  File "/home/cxf/pytorch-auto-drive/utils/runners/lane_det_tester.py", line 31, in run
    self.test_one_set(self.model, self.device, self.dataloader, self._cfg['mixed_precision'],
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/cxf/pytorch-auto-drive/utils/runners/lane_det_tester.py", line 47, in test_one_set
    for images, filenames in tqdm(loader):
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/tqdm/std.py", line 1180, in __iter__
    for obj in iterable:
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 570, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/cxf/pytorch-auto-drive/utils/datasets/lane_as_bezier.py", line 48, in __getitem__
    img = Image.open(self.images[index]).convert('RGB')
  File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/PIL/Image.py", line 3227, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '/home/cxf/bag/CULane/river_100_30frame/05251517_0433.MP4/00000.jpg'

voldemortX commented 1 year ago

@CHANdaFeng Does the file '/home/cxf/bag/CULane/river_100_30frame/05251517_0433.MP4/00000.jpg' actually exist?

CHANdaFeng commented 1 year ago

@voldemortX It seems it does not exist. I renamed the driver_100_30frame folder to river_100_30frame to match, and it seems to work now.

voldemortX commented 1 year ago

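> @voldemortX It seems it does not exist. I renamed the driver_100_30frame folder to river_100_30frame to match, and it seems to work now.

Then your folder names are probably not quite the same as the default ones.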

CHANdaFeng commented 1 year ago

@voldemortX OK, but I have a question. This dataset is CULane as downloaded from the official website. After my change, running python main_landet.py --test --config=configs/lane_detection/bezierlanenet/resnet18_culane_aug1b.py --checkpoint=checkpoints/resnet18_bezierlanenet_culane-aug2/model.pt stops partway through with FileNotFoundError: [Errno 2] No such file or directory: '/home/cxf/bag/CULane/river_193_90frame/06051317_0673.MP4/00180.jpg'

voldemortX commented 1 year ago

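@CHANdaFeng I suggest checking whether the dataset is corrupted, modified, or incomplete. If you downloaded it from the official website, there would be no folder named river.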
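A check of this kind can be scripted by walking a list file and reporting every entry that does not exist under the dataset root. This is a minimal sketch with a hypothetical helper name (`find_missing`); it assumes the list file holds one entry per line whose first whitespace-separated token is an image path relative to the dataset root, possibly with a leading slash.

```python
from pathlib import Path


def find_missing(dataset_root, list_file):
    """Return the relative paths from list_file that are not files under dataset_root."""
    root = Path(dataset_root)
    missing = []
    for line in Path(list_file).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Assumption: the first token is the image path; drop any leading '/'.
        relative = line.split()[0].lstrip("/")
        if not (root / relative).is_file():
            missing.append(relative)
    return missing
```

For example, `find_missing("/home/cxf/bag/CULane", "/home/cxf/bag/CULane/list/test.txt")` would return the entries that cannot be found, where the list file location is an assumption about the local layout. An empty result suggests the files themselves are intact and the problem lies in how the paths are built.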

CHANdaFeng commented 1 year ago

@voldemortX I checked, and the data is not corrupted. But shouldn't the code be looking for driver_100_30frame? Or did I accidentally modify the code and delete the d?

voldemortX commented 1 year ago

@CHANdaFeng Check both the code and the txt files that list the paths. Do a global search.
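A global search like this can also be scripted, which makes it easy to look for `river_` entries that are not part of `driver_`. The sketch below uses a hypothetical helper name (`search_tree`) and assumes that only `.py` and `.txt` files are of interest.

```python
import re
from pathlib import Path


def search_tree(root, pattern, suffixes=(".py", ".txt")):
    """Return (path, line_number, line) for every line under root matching pattern."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.suffix not in suffixes or not path.is_file():
            continue
        # errors="ignore" keeps the scan going if a file is not valid text.
        with path.open(errors="ignore") as handle:
            for line_number, line in enumerate(handle, 1):
                if regex.search(line):
                    hits.append((str(path), line_number, line.rstrip("\n")))
    return hits


# 'river_<digits>' that is NOT preceded by a 'd', i.e. a truncated 'driver_...' name.
TRUNCATED_DRIVER = r"(?<!d)river_\d+"
```

Running `search_tree` once on the repository and once on the dataset's list directory with `TRUNCATED_DRIVER` would show whether the shortened name comes from the code or from the path lists.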

CHANdaFeng commented 1 year ago

@voldemortX hello, when I run the Lane points (Image Folder) visualization, the example appears to require label files. After training, I only have txt files such as 00000.lines and no labels. What is the reason? The official example is python tools/vis/lane_img_dir.py --image-path=PAD_test_images/lane_test_images/05171008_0748.MP4 --keypoint-path=PAD_test_images/lane_test_images/05171008_0748.MP4 --mask-path=PAD_test_images/lane_test_images/laneseg_label_w16/05171008_0748.MP4 --image-suffix=.jpg --keypoint-suffix=.lines.txt --mask-suffix=.png --save-path=PAD_test_images/lane_test_images/culane_res --config= So far, after training on the CULane dataset with resnet18_culane_aug1b, I have no such label files.

voldemortX commented 1 year ago

> @voldemortX hello, when I run the Lane points (Image Folder) visualization, the example appears to require label files. After training, I only have txt files such as 00000.lines and no labels. What is the reason? The official example is python tools/vis/lane_img_dir.py --image-path=PAD_test_images/lane_test_images/05171008_0748.MP4 --keypoint-path=PAD_test_images/lane_test_images/05171008_0748.MP4 --mask-path=PAD_test_images/lane_test_images/laneseg_label_w16/05171008_0748.MP4 --image-suffix=.jpg --keypoint-suffix=.lines.txt --mask-suffix=.png --save-path=PAD_test_images/lane_test_images/culane_res --config= So far, after training on the CULane dataset with resnet18_culane_aug1b, I have no such label files.

It doesn't say here that labels are required, does it?

voldemortX commented 1 year ago

@CHANdaFeng Have you downloaded the test data package PAD_test_images? You can look at the examples there to see exactly which input formats are expected.
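For inspecting the keypoint files by hand, a small reader can help. This is a sketch under an assumption: that each `.lines.txt` file follows the CULane convention of one lane per line, written as alternating x and y values separated by spaces. `read_lane_file` is a hypothetical helper, not a function from the repository.

```python
def read_lane_file(path):
    """Parse a CULane-style lane file into a list of lanes, each a list of (x, y) points."""
    lanes = []
    with open(path) as handle:
        for line in handle:
            values = [float(token) for token in line.split()]
            if len(values) >= 2:
                # Even positions are x coordinates, odd positions are y coordinates.
                lanes.append(list(zip(values[0::2], values[1::2])))
    return lanes
```

Comparing the output for a file from PAD_test_images with the output for a file produced by testing is a quick way to confirm that both use the same layout.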

Durobert commented 1 year ago

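> I downloaded the dataset as described in the CULane Dataset instructions and adjusted the paths, but I did not see a bezier_labels/train_3.json file in the download. Where does it come from?

How did you solve this problem? I ran into it too.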

voldemortX commented 1 year ago

@Durobert You can find them in datasets/CULane.md