ricky-696 / AICUP_Baseline_BoT-SORT

BoT-SORT: Robust Associations Multi-Pedestrian Tracking
MIT License
162 stars 258 forks

About "NVIDIA driver not found" #6

Open blackline0911 opened 2 months ago

blackline0911 commented 2 months ago

Hello organizers and fellow participants: my computer has an AMD Radeon™ RX 6650 XT graphics card, so running the baseline ReID training command fails with the following error:

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

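I don't have a computer on hand with an NVIDIA card and enough memory, so is there any way to work around this? I am also running Linux under WSL (Windows Subsystem for Linux) on Windows 11. Even if I pip install faiss-cpu instead of faiss-gpu and plan to run on the CPU, I hit the same error. My earlier conda and pip package installation steps were the same as the baseline's. Here is my hardware configuration (two screenshots attached).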

The command is: python3 fast_reid/tools/train_net.py --config-file fast_reid/configs/AICUP/bagtricks_R50-ibn.yml MODEL.DEVICE "cuda:0"

The detailed output is as follows:

Command Line Args: Namespace(config_file='fast_reid/configs/AICUP/bagtricks_R50-ibn.yml', dist_url='tcp://127.0.0.1:50152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=['MODEL.DEVICE', 'cuda:0'], resume=False)
[05/04 23:09:40 fastreid]: Rank of current process: 0. World size: 1
[05/04 23:09:40 fastreid]: Environment info:

sys.platform         linux
Python               3.7.16 (default, Jan 17 2023, 22:20:44) [GCC 11.2.0]
numpy                1.21.6
fastreid             failed to import
FASTREID_ENV_MODULE
PyTorch              1.13.1+cu117 @/home/kevin/anaconda3/envs/bot/lib/python3.7/site-packages/torch
PyTorch debug build  False
GPU available        False
Pillow               9.5.0
torchvision          0.14.1+cu117 @/home/kevin/anaconda3/envs/bot/lib/python3.7/site-packages/torchvision
cv2                  4.9.0


PyTorch built with:

[05/04 23:09:40 fastreid]: Command line arguments: Namespace(config_file='fast_reid/configs/AICUP/bagtricks_R50-ibn.yml', dist_url='tcp://127.0.0.1:50152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=['MODEL.DEVICE', 'cuda:0'], resume=False)
[05/04 23:09:40 fastreid]: Contents of args.config_file=fast_reid/configs/AICUP/bagtricks_R50-ibn.yml:

_BASE_: ../Base-bagtricks.yml

INPUT:
  SIZE_TRAIN: [256, 256]
  SIZE_TEST: [256, 256]

MODEL:
  BACKBONE:
    WITH_IBN: True
  HEADS:
    POOL_LAYER: GeneralizedMeanPooling

  LOSSES:
    TRI:
      HARD_MINING: False
      MARGIN: 0.0

DATASETS:
  NAMES: ("AICUP",)
  TESTS: ("AICUP",)

SOLVER:
  BIAS_LR_FACTOR: 1.

  IMS_PER_BATCH: 256
  MAX_EPOCH: 60
  STEPS: [30, 50]
  WARMUP_ITERS: 2000

  CHECKPOINT_PERIOD: 1

TEST:
  EVAL_PERIOD: 60 # We didn't provide eval dataset
  IMS_PER_BATCH: 256

OUTPUT_DIR: logs/AICUP_115/bagtricks_R50-ibn

[05/04 23:09:40 fastreid]: Running with full config:

CUDNN_BENCHMARK: True
DATALOADER:
  NUM_INSTANCE: 4
  NUM_WORKERS: 8
  SAMPLER_TRAIN: NaiveIdentitySampler
  SET_WEIGHT: []
DATASETS:
  COMBINEALL: False
  NAMES: ('AICUP',)
  TESTS: ('AICUP',)
INPUT:
  AFFINE:
    ENABLED: False
  AUGMIX:
    ENABLED: False
    PROB: 0.0
  AUTOAUG:
    ENABLED: False
    PROB: 0.0
  CJ:
    BRIGHTNESS: 0.15
    CONTRAST: 0.15
    ENABLED: False
    HUE: 0.1
    PROB: 0.5
    SATURATION: 0.1
  CROP:
    ENABLED: False
    RATIO: [0.75, 1.3333333333333333]
    SCALE: [0.16, 1]
    SIZE: [224, 224]
  FLIP:
    ENABLED: True
    PROB: 0.5
  PADDING:
    ENABLED: True
    MODE: constant
    SIZE: 10
  REA:
    ENABLED: True
    PROB: 0.5
    VALUE: [123.675, 116.28, 103.53]
  RPT:
    ENABLED: False
    PROB: 0.5
  SIZE_TEST: [256, 256]
  SIZE_TRAIN: [256, 256]
KD:
  EMA:
    ENABLED: False
    MOMENTUM: 0.999
  MODEL_CONFIG: []
  MODEL_WEIGHTS: []
MODEL:
  BACKBONE:
    ATT_DROP_RATE: 0.0
    DEPTH: 50x
    DROP_PATH_RATIO: 0.1
    DROP_RATIO: 0.0
    FEAT_DIM: 2048
    LAST_STRIDE: 1
    NAME: build_resnet_backbone
    NORM: BN
    PRETRAIN: True
    PRETRAIN_PATH:
    SIE_COE: 3.0
    STRIDE_SIZE: (16, 16)
    WITH_IBN: True
    WITH_NL: False
    WITH_SE: False
  DEVICE: cuda:0
  FREEZE_LAYERS: []
  HEADS:
    CLS_LAYER: Linear
    EMBEDDING_DIM: 0
    MARGIN: 0.0
    NAME: EmbeddingHead
    NECK_FEAT: before
    NORM: BN
    NUM_CLASSES: 0
    POOL_LAYER: GeneralizedMeanPooling
    SCALE: 1
    WITH_BNNECK: True
  LOSSES:
    CE:
      ALPHA: 0.2
      EPSILON: 0.1
      SCALE: 1.0
    CIRCLE:
      GAMMA: 128
      MARGIN: 0.25
      SCALE: 1.0
    COSFACE:
      GAMMA: 128
      MARGIN: 0.25
      SCALE: 1.0
    FL:
      ALPHA: 0.25
      GAMMA: 2
      SCALE: 1.0
    NAME: ('CrossEntropyLoss', 'TripletLoss')
    TRI:
      HARD_MINING: False
      MARGIN: 0.0
      NORM_FEAT: False
      SCALE: 1.0
  META_ARCHITECTURE: Baseline
  PIXEL_MEAN: [123.675, 116.28, 103.53]
  PIXEL_STD: [58.395, 57.120000000000005, 57.375]
  QUEUE_SIZE: 8192
  WEIGHTS:
OUTPUT_DIR: logs/AICUP_115/bagtricks_R50-ibn
SOLVER:
  AMP:
    ENABLED: True
  BASE_LR: 0.00035
  BIAS_LR_FACTOR: 1.0
  CHECKPOINT_PERIOD: 1
  CLIP_GRADIENTS:
    CLIP_TYPE: norm
    CLIP_VALUE: 5.0
    ENABLED: False
    NORM_TYPE: 2.0
  DELAY_EPOCHS: 0
  ETA_MIN_LR: 1e-07
  FREEZE_ITERS: 0
  GAMMA: 0.1
  HEADS_LR_FACTOR: 1.0
  IMS_PER_BATCH: 256
  MAX_EPOCH: 60
  MOMENTUM: 0.9
  NESTEROV: False
  OPT: Adam
  SCHED: MultiStepLR
  STEPS: [30, 50]
  WARMUP_FACTOR: 0.1
  WARMUP_ITERS: 2000
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.0005
  WEIGHT_DECAY_BIAS: 0.0005
  WEIGHT_DECAY_NORM: 0.0005
TEST:
  AQE:
    ALPHA: 3.0
    ENABLED: False
    QE_K: 5
    QE_TIME: 1
  EVAL_PERIOD: 60
  FLIP:
    ENABLED: False
  IMS_PER_BATCH: 256
  METRIC: cosine
  PRECISE_BN:
    DATASET: Market1501
    ENABLED: False
    NUM_ITER: 300
  RERANK:
    ENABLED: False
    K1: 20
    K2: 6
    LAMBDA: 0.3
  ROC:
    ENABLED: False

[05/04 23:09:40 fastreid]: Full config saved to /mnt/c/Users/kevin/Desktop/ai_cup/AICUP_Baseline_BoT-SORT/logs/AICUP_115/bagtricks_R50-ibn/config.yaml
/home/kevin/anaconda3/envs/bot/lib/python3.7/site-packages/torchvision/transforms/transforms.py:330: UserWarning: Argument 'interpolation' of type int is deprecated since 0.13 and will be removed in 0.15. Please use InterpolationMode enum.
  "Argument 'interpolation' of type int is deprecated since 0.13 and will be removed in 0.15. "
Traceback (most recent call last):
  File "fast_reid/tools/train_net.py", line 60, in <module>
    args=(args,),
  File "./fast_reid/fastreid/engine/launch.py", line 71, in launch
    main_func(*args)
  File "fast_reid/tools/train_net.py", line 43, in main
    trainer = DefaultTrainer(cfg)
  File "./fast_reid/fastreid/engine/defaults.py", line 203, in __init__
    data_loader = self.build_train_loader(cfg)
  File "./fast_reid/fastreid/engine/defaults.py", line 402, in build_train_loader
    return build_reid_train_loader(cfg, combineall=cfg.DATASETS.COMBINEALL)
  File "./fast_reid/fastreid/config/config.py", line 265, in wrapped
    return orig_func(**explicit_args)
  File "./fast_reid/fastreid/data/build.py", line 98, in build_reid_train_loader
    pin_memory=True,
  File "./fast_reid/fastreid/data/data_utils.py", line 152, in __init__
    local_rank
  File "/home/kevin/anaconda3/envs/bot/lib/python3.7/site-packages/torch/cuda/streams.py", line 36, in __new__
    with torch.cuda.device(device):
  File "/home/kevin/anaconda3/envs/bot/lib/python3.7/site-packages/torch/cuda/__init__.py", line 287, in __enter__
    self.prev_idx = torch.cuda.current_device()
  File "/home/kevin/anaconda3/envs/bot/lib/python3.7/site-packages/torch/cuda/__init__.py", line 552, in current_device
    _lazy_init()
  File "/home/kevin/anaconda3/envs/bot/lib/python3.7/site-packages/torch/cuda/__init__.py", line 229, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
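
Editor's note: the traceback ends in `torch.cuda._lazy_init()`, reached from `fast_reid/fastreid/data/data_utils.py` while the training data loader is being built. That suggests CUDA is initialized during data loading as well, so changing `MODEL.DEVICE` alone may not avoid the error (an inference from the traceback, not confirmed against the repository code). Below is a minimal sketch of checking for a usable GPU before choosing the `MODEL.DEVICE` override. `pick_device` and `cuda_available` are hypothetical helper names, not part of fast_reid.

```python
def cuda_available() -> bool:
    """Return True only if PyTorch is installed and reports a usable GPU."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return False
    return torch.cuda.is_available()


def pick_device(gpu_ok: bool, index: int = 0) -> str:
    """Build the device string to pass as the MODEL.DEVICE override."""
    return f"cuda:{index}" if gpu_ok else "cpu"


if __name__ == "__main__":
    # Prints "cuda:0" when a GPU is usable, otherwise "cpu".
    print(pick_device(cuda_available()))
```

With the environment reported above (`GPU available False`), this check would select `cpu`.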

MuennL commented 2 months ago

"nvidia driver not found" means the NVIDIA driver is not installed. You can check the official documentation for the driver version that matches your GPU.
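
Editor's note: one way to check whether an NVIDIA driver is visible from the current environment is to look for the `nvidia-smi` tool that ships with the driver. This is a minimal sketch under that assumption; `nvidia_driver_summary` is a hypothetical helper name, and on a machine with only an AMD card it is expected to return `None`.

```python
import shutil
import subprocess
from typing import Optional


def nvidia_driver_summary() -> Optional[str]:
    """Return "GPU name, driver version" lines from nvidia-smi, or None."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None  # tool not on PATH: most likely no NVIDIA driver installed
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None  # tool exists but cannot talk to a driver
    return result.stdout.strip() or None


if __name__ == "__main__":
    print(nvidia_driver_summary() or "No NVIDIA driver detected")
```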

ricky-696 commented 1 month ago

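Bro~

You are using an AMD graphics card, so why would you install an NVIDIA driver?

I'm not familiar with AMD myself, but PyTorch has a build that supports AMD. I think you need to install the ROCm-related packages separately. This blog has a detailed walkthrough, so give it another try. Good luck~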

aquastripe commented 1 month ago

AMD's ROCm is the counterpart of NVIDIA's CUDA. Your graphics card happens to support ROCm: https://rocm.docs.amd.com/projects/install-on-windows/en/latest/reference/system-requirements.html See the official documentation for how to install PyTorch + ROCm: https://rocm.docs.amd.com/projects/install-on-linux/en/develop/how-to/3rd-party/pytorch-install.html

Note that PyTorch + ROCm currently does not support Windows, and that your card can use the ROCm runtime but does not support the full SDK.
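
Editor's note: the environment info earlier in this thread reports `PyTorch 1.13.1+cu117`, which is a CUDA build. A rough way to tell which backend an installed wheel targets is the local version tag after the `+`. The sketch below assumes the tag conventions used by the official PyTorch wheels (`cu…`, `rocm…`, `cpu`); `build_flavor` is a hypothetical helper name, and builds from other sources may carry no tag at all.

```python
def build_flavor(version: str) -> str:
    """Classify a PyTorch version string by its local tag (the part after '+')."""
    _, sep, local = version.partition("+")
    if not sep:
        return "unknown"  # no local tag, cannot tell from the string alone
    if local.startswith("cu"):
        return "cuda"
    if local.startswith("rocm"):
        return "rocm"
    if local.startswith("cpu"):
        return "cpu"
    return "unknown"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        print(torch.__version__, "->", build_flavor(torch.__version__))
        # On CUDA builds torch.version.cuda is set; on ROCm builds torch.version.hip is.
        print("torch.version.cuda:", torch.version.cuda)
        print("torch.version.hip:", getattr(torch.version, "hip", None))
```

On a ROCm build the GPU is still addressed through the `torch.cuda` API, so the `MODEL.DEVICE "cuda:0"` override in the training command would stay the same.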

Don't run it on the CPU. You can try Colab instead.