Error when use --mGPUs - Githubissues

dmortem commented 6 years ago

Hi, @jwyang Thanks a lot for your great codes.

I get an error when I run the script with --mGPUs, while it works fine with only one GPU. My command line is:

CUDA_VISIBLE_DEVICES=2,3 python trainval_net.py --dataset pascal_voc --net res101 --bs 2 --nw 4 --lr 0.001 --lr_decay_step 5000 --cuda --mGPUs

Is there a problem with the compilation? I use Python 3 and haven't modified make.sh. My GPU is a TITAN Xp. The error log is below:

Loaded dataset voc_2007_trainval for training
Set proposal method: gt
Appending horizontally-flipped training examples...
voc_2007_trainval gt roidb loaded from /home/cywu/DR_detection/faster-rcnn-pytorch/data/cache/voc_2007_trainval_gt_roidb.pkl
done
Preparing training data...
done
before filtering, there are 10022 images...
after filtering, there are 10022 images...
10022 roidb entries
Loading pretrained weights from data/pretrained_model/resnet101_caffe.pth
Traceback (most recent call last):
  File "trainval_net.py", line 320, in <module>
    rois_label = fasterRCNN(im_data, im_info, gt_boxes, num_boxes)
  File "/home/cywu/.conda/envs/pytorch040/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/cywu/.conda/envs/pytorch040/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 115, in forward
    return self.gather(outputs, self.output_device)
  File "/home/cywu/.conda/envs/pytorch040/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 127, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/home/cywu/.conda/envs/pytorch040/lib/python3.5/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
    return gather_map(outputs)
  File "/home/cywu/.conda/envs/pytorch040/lib/python3.5/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/home/cywu/.conda/envs/pytorch040/lib/python3.5/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
    return Gather.apply(target_device, dim, *outputs)
  File "/home/cywu/.conda/envs/pytorch040/lib/python3.5/site-packages/torch/nn/parallel/_functions.py", line 54, in forward
    ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
  File "/home/cywu/.conda/envs/pytorch040/lib/python3.5/site-packages/torch/nn/parallel/_functions.py", line 54, in <lambda>
    ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
RuntimeError: dimension specified as 0 but tensor has no dimensions
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f5af2bd1e10>>
Traceback (most recent call last):
  File "/home/cywu/.conda/envs/pytorch040/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 349, in __del__
  File "/home/cywu/.conda/envs/pytorch040/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 328, in _shutdown_workers
  File "/home/cywu/.conda/envs/pytorch040/lib/python3.5/multiprocessing/queues.py", line 337, in get
  File "<frozen importlib._bootstrap>", line 968, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 887, in _find_spec
TypeError: 'NoneType' object is not iterable

Also, when I run the code with one GPU, the GPU utilization is only 1%-2%, and every 100 iterations take about 27 s. Is that normal?
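
For reference, 27 s per 100 iterations is about 0.27 s per iteration. With GPU utilization at 1%-2%, one thing worth checking is whether the loop spends its time waiting for data rather than computing. The sketch below is one way to measure that split. `profile_loop`, `slow_loader` and `fast_step` are hypothetical stand-ins for illustration, not part of this repository.

```python
import time


def profile_loop(batches, step):
    """Return (seconds waiting for data, seconds spent in step)."""
    wait = compute = 0.0
    iterator = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(iterator)   # time spent here is data loading
        except StopIteration:
            break
        t1 = time.perf_counter()
        step(batch)                  # time spent here is the training step
        t2 = time.perf_counter()
        wait += t1 - t0
        compute += t2 - t1
    return wait, compute


def slow_loader(n):
    """Hypothetical loader that takes 20 ms to produce each batch."""
    for i in range(n):
        time.sleep(0.02)
        yield i


def fast_step(batch):
    """Hypothetical training step that finishes almost immediately."""
    return batch * 2


wait, compute = profile_loop(slow_loader(5), fast_step)
print("waiting for data: %.3f s, computing: %.3f s" % (wait, compute))
```

On a real CUDA run the step returns before the GPU has finished, so calling `torch.cuda.synchronize()` at the end of the step should give a fairer compute time. If the wait time dominates, the bottleneck is more likely in data loading than in the GPU code.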

Thanks a lot!!

jwyang commented 6 years ago

@dmortem How many GPUs are on your machine? It seems like there is something wrong with the dataloader.

dmortem commented 6 years ago

@jwyang 4 TITAN Xp in total.

jwyang commented 6 years ago

@dmortem OK, then try using 2 workers for batch size 2.

dmortem commented 6 years ago

@jwyang I modified the command line: CUDA_VISIBLE_DEVICES=2,3 python trainval_net.py --dataset pascal_voc --net res101 --bs 2 --nw 2 --lr 0.001 --lr_decay_step 5000 --cuda --mGPUs

But there is the same error as before.

jwyang commented 6 years ago

@dmortem It seems that fasterRCNN() returns a None value or some other unexpected output. Try debugging a bit to see how that happens.
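
One way to do that kind of debugging is to print the type and shape of every output just before the model's `forward` returns. The sketch below is an illustration only: `describe` and `FakeTensor` are hypothetical names, and the helper relies solely on the outputs exposing a `.shape` attribute, as PyTorch tensors do.

```python
def describe(obj, name="output"):
    """Return lines describing the type and shape of a possibly nested output."""
    if isinstance(obj, (tuple, list)):
        lines = []
        for i, item in enumerate(obj):
            lines.extend(describe(item, "%s[%d]" % (name, i)))
        return lines
    shape = getattr(obj, "shape", None)
    if shape is None:
        # None values and plain Python numbers end up here.
        return ["%s: %s (no shape)" % (name, type(obj).__name__)]
    dims = tuple(shape)
    note = "  <-- 0-dim, cannot be gathered along dim 0" if len(dims) == 0 else ""
    return ["%s: %s shape=%s%s" % (name, type(obj).__name__, dims, note)]


class FakeTensor:
    """Hypothetical stand-in that exposes only a .shape attribute."""

    def __init__(self, *shape):
        self.shape = shape


# A batch-shaped output, a scalar-like output, and a missing output.
outputs = (FakeTensor(2, 256, 5), FakeTensor(), None)
for line in describe(outputs):
    print(line)
```

Any entry flagged as 0-dim or as having no shape is a candidate for the output that the gather step cannot handle.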

dmortem commented 6 years ago

@jwyang Thank you! I solved this error with the solution from issue #226. However, the speed hasn't improved and the GPU utilization is 0%-2%. Is there a problem with PyTorch?

zpyovo commented 5 years ago

I have the same issue: it works fine with only one GPU. I tried the fix from issue #226, but I still get the error below:

ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
RuntimeError: dimension specified as 0 but tensor has no dimensions

My command line is:

CUDA_VISIBLE_DEVICES=0,1 python trainval_net.py --dataset pascal_voc --net res101 --bs 4 --nw 0 --lr 0.001 --lr_decay_step 5000 --cuda --mGPUs

Environment: PyTorch 0.4, CUDA 9, Python 3.6. Thanks!

dmortem commented 5 years ago

Hi, maybe you haven't added the code in the right place. Make sure you add these four lines at the end of the 'forward' function.
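
This thread never quotes the four lines from issue #226. The traceback does show the gather step failing because a per-GPU output has no dimensions, which is what a scalar loss looks like. A plausible reading, and it is only an assumption, is that the fix gives each scalar loss a leading dimension (in PyTorch, `torch.unsqueeze(loss, 0)`) before `forward` returns. The sketch below models only the shape arithmetic in plain Python so that it runs without PyTorch. `gather_dim0` and `unsqueeze0` are hypothetical names, not library functions.

```python
def gather_dim0(shapes):
    """Shape after concatenating per-GPU outputs along dim 0 (simplified model)."""
    for shape in shapes:
        if len(shape) == 0:
            # A 0-dim (scalar) output has no dim 0 to concatenate along.
            raise RuntimeError("dimension specified as 0 but tensor has no dimensions")
    first = shapes[0]
    for shape in shapes[1:]:
        if tuple(shape[1:]) != tuple(first[1:]):
            raise RuntimeError("trailing dimensions must match")
    total = sum(shape[0] for shape in shapes)
    return (total,) + tuple(first[1:])


def unsqueeze0(shape):
    """Shape after adding a leading dimension of size 1."""
    return (1,) + tuple(shape)


scalar_losses = [(), ()]   # one 0-dim loss per GPU
try:
    gather_dim0(scalar_losses)
except RuntimeError as err:
    print("gather failed:", err)

fixed = [unsqueeze0(shape) for shape in scalar_losses]
print("gathered shape:", gather_dim0(fixed))
```

Under this assumption the gathered loss holds one value per GPU, so the training loop would need to reduce it (for example with a mean) before calling backward.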

zpyovo commented 5 years ago

@dmortem I see! Thanks!

fangInFBI commented 5 years ago

Hi, I have the same error as you, and I tried adding these four lines, but it doesn't work.

$ CUDA_VISIBLE_DEVICES=1,2 python3 trainval_net.py \
    --dataset pascal_voc --net res101 \
    --bs 1 --nw 1 \
    --lr 0.0001 --lr_decay_step 10 \
    --cuda --mGPUs \
    --s 2 \
    --use_tfb --save_dir /home/testaccount/faster_rcnn/result
Called with args:
Namespace(batch_size=1, checkepoch=1, checkpoint=0, checkpoint_interval=10000, checksession=1, class_agnostic=False, cuda=True, dataset='pascal_voc', disp_interval=100, large_scale=False, lr=0.0001, lr_decay_gamma=0.1, lr_decay_step=10, mGPUs=True, max_epochs=20, net='res101', num_workers=1, optimizer='sgd', resume=False, save_dir='/home/testaccount/competition_workspace/crowd_counting/faster_rcnn/result', session=2, start_epoch=1, use_tfboard=True)
Using config:
{'ANCHOR_RATIOS': [0.5, 1, 2],
 'ANCHOR_SCALES': [8, 16, 32],
 'CROP_RESIZE_WITH_MAX_POOL': False,
 'CUDA': False,
 'DATA_DIR': '/home/testaccount/competition_workspace/crowd_counting/faster_rcnn/data',
 'DEDUP_BOXES': 0.0625,
 'EPS': 1e-14,
 'EXP_DIR': 'res101',
 'FEAT_STRIDE': [16],
 'GPU_ID': 0,
 'MATLAB': 'matlab',
 'MAX_NUM_GT_BOXES': 20,
 'MOBILENET': {'DEPTH_MULTIPLIER': 1.0,
               'FIXED_LAYERS': 5,
               'REGU_DEPTH': False,
               'WEIGHT_DECAY': 4e-05},
 'PIXEL_MEANS': array([[[102.9801, 115.9465, 122.7717]]]),
 'POOLING_MODE': 'align',
 'POOLING_SIZE': 7,
 'RESNET': {'FIXED_BLOCKS': 1, 'MAX_POOL': False},
 'RNG_SEED': 3,
 'ROOT_DIR': '/home/testaccount/competition_workspace/crowd_counting/faster_rcnn',
 'TEST': {'BBOX_REG': True,
          'HAS_RPN': True,
          'MAX_SIZE': 1000,
          'MODE': 'nms',
          'NMS': 0.3,
          'PROPOSAL_METHOD': 'gt',
          'RPN_MIN_SIZE': 16,
          'RPN_NMS_THRESH': 0.7,
          'RPN_POST_NMS_TOP_N': 300,
          'RPN_PRE_NMS_TOP_N': 6000,
          'RPN_TOP_N': 5000,
          'SCALES': [600],
          'SVM': False},
 'TRAIN': {'ASPECT_GROUPING': False,
           'BATCH_SIZE': 128,
           'BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'BBOX_NORMALIZE_MEANS': [0.0, 0.0, 0.0, 0.0],
           'BBOX_NORMALIZE_STDS': [0.1, 0.1, 0.2, 0.2],
           'BBOX_NORMALIZE_TARGETS': True,
           'BBOX_NORMALIZE_TARGETS_PRECOMPUTED': True,
           'BBOX_REG': True,
           'BBOX_THRESH': 0.5,
           'BG_THRESH_HI': 0.5,
           'BG_THRESH_LO': 0.0,
           'BIAS_DECAY': False,
           'BN_TRAIN': False,
           'DISPLAY': 20,
           'DOUBLE_BIAS': False,
           'FG_FRACTION': 0.25,
           'FG_THRESH': 0.5,
           'GAMMA': 0.1,
           'HAS_RPN': True,
           'IMS_PER_BATCH': 1,
           'LEARNING_RATE': 0.001,
           'MAX_SIZE': 1000,
           'MOMENTUM': 0.9,
           'PROPOSAL_METHOD': 'gt',
           'RPN_BATCHSIZE': 256,
           'RPN_BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'RPN_CLOBBER_POSITIVES': False,
           'RPN_FG_FRACTION': 0.5,
           'RPN_MIN_SIZE': 8,
           'RPN_NEGATIVE_OVERLAP': 0.3,
           'RPN_NMS_THRESH': 0.7,
           'RPN_POSITIVE_OVERLAP': 0.7,
           'RPN_POSITIVE_WEIGHT': -1.0,
           'RPN_POST_NMS_TOP_N': 2000,
           'RPN_PRE_NMS_TOP_N': 12000,
           'SCALES': [600],
           'SNAPSHOT_ITERS': 5000,
           'SNAPSHOT_KEPT': 3,
           'SNAPSHOT_PREFIX': 'res101_faster_rcnn',
           'STEPSIZE': [30000],
           'SUMMARY_INTERVAL': 180,
           'TRIM_HEIGHT': 600,
           'TRIM_WIDTH': 600,
           'TRUNCATED': False,
           'USE_ALL_GT': True,
           'USE_FLIPPED': True,
           'USE_GT': False,
           'WEIGHT_DECAY': 0.0001},
 'USE_GPU_NMS': True}
Loaded dataset `voc_2007_trainval` for training
Set proposal method: gt
Appending horizontally-flipped training examples...
voc_2007_trainval gt roidb loaded from /home/testaccount/competition_workspace/crowd_counting/faster_rcnn/data/cache/voc_2007_trainval_gt_roidb.pkl
done
Preparing training data...
done
before filtering, there are 18950 images...
after filtering, there are 18646 images...
18646 roidb entries
/usr/local/lib/python3.5/dist-packages/torch/cuda/__init__.py:116: UserWarning: Found GPU0 NVS 315 which is of cuda capability 2.1. PyTorch no longer supports this GPU because it is too old.
  warnings.warn(old_gpu_warn % (d, name, major, capability[1]))
Loading pretrained weights from data/pretrained_model/resnet101_caffe.pth
/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/data_parallel.py:24: UserWarning: There is an imbalance between your GPUs. You may want to exclude GPU 0 which has less than 75% of the memory or cores of GPU 1. You can do so by setting the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES environment variable.
  warnings.warn(imbalance_warn.format(device_ids[min_pos], device_ids[max_pos]))
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorMath.cu line=244 error=8 : invalid device function
Traceback (most recent call last):
  File "trainval_net.py", line 320, in <module>
    rois_label = fasterRCNN(im_data, im_info, gt_boxes, num_boxes)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/data_parallel.py", line 122, in forward
    replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/data_parallel.py", line 127, in replicate
    return replicate(module, device_ids)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/replicate.py", line 12, in replicate
    param_copies = Broadcast.apply(devices, *params)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/_functions.py", line 19, in forward
    outputs = comm.broadcast_coalesced(inputs, ctx.target_gpus)
  File "/usr/local/lib/python3.5/dist-packages/torch/cuda/comm.py", line 40, in broadcast_coalesced
    return torch._C._broadcast_coalesced(tensors, devices, buffer_size)
RuntimeError: cuda runtime error (8) : invalid device function at /pytorch/aten/src/THC/generic/THCTensorMath.cu:244
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7feedc547780>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 399, in __del__
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 378, in _shutdown_workers
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 345, in get
  File "<frozen importlib._bootstrap>", line 969, in _find_and_load
  File "<frozen importlib._bootstrap>", line 954, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 887, in _find_spec
TypeError: 'NoneType' object is not iterable

Can you tell me the exact location of the code?

devendraswamy commented 4 years ago

> @dmortem ok, then try use 2 workers for batch size=2.

Sir, if I use batch size 1 with 4 GPUs, how many workers should I give the model?
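
The thread leaves the worker question unanswered. One related point can be read off the traceback earlier in the thread, which shows `self.device_ids[:len(inputs)]`: DataParallel replicates the model onto only as many GPUs as there are input chunks. The sketch below models how a batch is split along dim 0, assuming the split follows `torch.chunk` semantics. `scatter_sizes` is a hypothetical helper, not a PyTorch function.

```python
import math


def scatter_sizes(batch_size, num_gpus):
    """Per-GPU chunk sizes when a batch is split along dim 0 (simplified model)."""
    chunk = math.ceil(batch_size / num_gpus)   # size of every chunk except the last
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


for batch_size, num_gpus in [(1, 4), (2, 2), (4, 2), (5, 4)]:
    sizes = scatter_sizes(batch_size, num_gpus)
    print("bs=%d on %d GPUs -> chunks %s (%d GPU(s) used)"
          % (batch_size, num_gpus, sizes, len(sizes)))
```

Under that assumption a batch size of 1 produces a single chunk, so only one of the four GPUs would receive work regardless of the number of workers.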

jwyang / faster-rcnn.pytorch

Error when use --mGPUs #346