Closed by dmortem 5 years ago
@dmortem How many GPUs are on your machine? It seems like there is something wrong with the dataloader.
@jwyang 4 TITAN Xp in total.
@dmortem OK, then try using 2 workers for batch size=2.
@jwyang I changed the command line to: CUDA_VISIBLE_DEVICES=2,3 python trainval_net.py --dataset pascal_voc --net res101 --bs 2 --nw 2 --lr 0.001 --lr_decay_step 5000 --cuda --mGPUs
But I get the same error as before.
@dmortem It seems that fasterRCNN() is returning None or some other unexpected output. Try debugging a bit to see how that happens.
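One way to follow the debugging suggestion above is to check each value that the model returns before it is used. The sketch below is only an illustration: `find_suspect_outputs` is a hypothetical helper (not part of the repository), and it only assumes that tensor-like objects expose a `dim()` method, as PyTorch tensors do.

```python
def find_suspect_outputs(outputs, names=None):
    """Return (name, problem) pairs for outputs that are None or 0-dim.

    `outputs` is any sequence of values returned by a model's forward pass.
    `names` is an optional list of labels; both are hypothetical inputs.
    """
    problems = []
    for i, out in enumerate(outputs):
        name = names[i] if names else "output[%d]" % i
        if out is None:
            problems.append((name, "is None"))
        # Duck-typed check: anything with a callable dim() is treated as a tensor.
        elif callable(getattr(out, "dim", None)) and out.dim() == 0:
            problems.append((name, "is a 0-dim (scalar) tensor"))
    return problems
```

Calling it on the tuple returned by the model and printing the result shows which entries are missing or have no dimensions, which is the kind of value that multi-GPU gathering can choke on.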
@jwyang Thank you! I solved this error with the solution from issue #226. However, the speed hasn't improved and GPU utilization is only 0%-2%. Is there a problem in PyTorch?
I have the same issue: it works fine with only one GPU. I tried the fix from issue #226, but I still get the error below:
ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs)) RuntimeError: dimension specified as 0 but tensor has no dimensions
My command line is 'CUDA_VISIBLE_DEVICES=0,1 python trainval_net.py --dataset pascal_voc --net res101 --bs 4 --nw 0 --lr 0.001 --lr_decay_step 5000 --cuda --mGPUs'
PyTorch 0.4, CUDA 9, Python 3.6
Thanks!
Hi, maybe you haven't added the code in the right place. Make sure to add those four lines at the end of the 'forward' function.
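The four lines from issue #226 are not reproduced in this thread, so the following is only a sketch of the general idea, under the assumption that the fix gives each scalar loss a leading dimension before it leaves 'forward'. With --mGPUs, the per-GPU outputs are gathered by concatenating them along dimension 0, and a 0-dim tensor has no dimension 0 to concatenate along. The loss values and variable names here are made up; the example needs PyTorch installed and runs on CPU.

```python
import torch

# Stand-ins for one loss value per GPU. A reduced loss is a 0-dim tensor.
loss_gpu0 = torch.tensor(0.7)
loss_gpu1 = torch.tensor(0.9)

# Concatenating 0-dim tensors along dim 0 fails.
try:
    torch.cat([loss_gpu0, loss_gpu1], dim=0)
    cat_failed = False
except RuntimeError:
    cat_failed = True

# Giving each loss a leading dimension makes it gatherable.
fixed = [torch.unsqueeze(loss, 0) for loss in (loss_gpu0, loss_gpu1)]
gathered = torch.cat(fixed, dim=0)   # shape (2,), one entry per GPU
mean_loss = gathered.mean()          # reduce back to a single scalar
```

If the losses are unsqueezed this way, whatever consumes them afterwards has to reduce them again (for example with `.mean()`), since each one now holds one entry per GPU.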
@dmortem I see! Thanks!
Hi, I have the same error as you. I tried adding these four lines, but it doesn't work.
$ CUDA_VISIBLE_DEVICES=1,2 python3 trainval_net.py \
    --dataset pascal_voc --net res101 \
    --bs 1 --nw 1 \
    --lr 0.0001 --lr_decay_step 10 \
    --cuda --mGPUs \
    --s 2 \
    --use_tfb --save_dir /home/testaccount/faster_rcnn/result
Called with args:
Namespace(batch_size=1, checkepoch=1, checkpoint=0, checkpoint_interval=10000, checksession=1, class_agnostic=False, cuda=True, dataset='pascal_voc', disp_interval=100, large_scale=False, lr=0.0001, lr_decay_gamma=0.1, lr_decay_step=10, mGPUs=True, max_epochs=20, net='res101', num_workers=1, optimizer='sgd', resume=False, save_dir='/home/testaccount/competition_workspace/crowd_counting/faster_rcnn/result', session=2, start_epoch=1, use_tfboard=True)
Using config:
{'ANCHOR_RATIOS': [0.5, 1, 2],
 'ANCHOR_SCALES': [8, 16, 32],
 'CROP_RESIZE_WITH_MAX_POOL': False,
 'CUDA': False,
 'DATA_DIR': '/home/testaccount/competition_workspace/crowd_counting/faster_rcnn/data',
 'DEDUP_BOXES': 0.0625,
 'EPS': 1e-14,
 'EXP_DIR': 'res101',
 'FEAT_STRIDE': [16],
 'GPU_ID': 0,
 'MATLAB': 'matlab',
 'MAX_NUM_GT_BOXES': 20,
 'MOBILENET': {'DEPTH_MULTIPLIER': 1.0, 'FIXED_LAYERS': 5, 'REGU_DEPTH': False, 'WEIGHT_DECAY': 4e-05},
 'PIXEL_MEANS': array([[[102.9801, 115.9465, 122.7717]]]),
 'POOLING_MODE': 'align',
 'POOLING_SIZE': 7,
 'RESNET': {'FIXED_BLOCKS': 1, 'MAX_POOL': False},
 'RNG_SEED': 3,
 'ROOT_DIR': '/home/testaccount/competition_workspace/crowd_counting/faster_rcnn',
 'TEST': {'BBOX_REG': True, 'HAS_RPN': True, 'MAX_SIZE': 1000, 'MODE': 'nms', 'NMS': 0.3, 'PROPOSAL_METHOD': 'gt', 'RPN_MIN_SIZE': 16, 'RPN_NMS_THRESH': 0.7, 'RPN_POST_NMS_TOP_N': 300, 'RPN_PRE_NMS_TOP_N': 6000, 'RPN_TOP_N': 5000, 'SCALES': [600], 'SVM': False},
 'TRAIN': {'ASPECT_GROUPING': False, 'BATCH_SIZE': 128, 'BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0], 'BBOX_NORMALIZE_MEANS': [0.0, 0.0, 0.0, 0.0], 'BBOX_NORMALIZE_STDS': [0.1, 0.1, 0.2, 0.2], 'BBOX_NORMALIZE_TARGETS': True, 'BBOX_NORMALIZE_TARGETS_PRECOMPUTED': True, 'BBOX_REG': True, 'BBOX_THRESH': 0.5, 'BG_THRESH_HI': 0.5, 'BG_THRESH_LO': 0.0, 'BIAS_DECAY': False, 'BN_TRAIN': False, 'DISPLAY': 20, 'DOUBLE_BIAS': False, 'FG_FRACTION': 0.25, 'FG_THRESH': 0.5, 'GAMMA': 0.1, 'HAS_RPN': True, 'IMS_PER_BATCH': 1, 'LEARNING_RATE': 0.001, 'MAX_SIZE': 1000, 'MOMENTUM': 0.9, 'PROPOSAL_METHOD': 'gt', 'RPN_BATCHSIZE': 256, 'RPN_BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0], 'RPN_CLOBBER_POSITIVES': False, 'RPN_FG_FRACTION': 0.5, 'RPN_MIN_SIZE': 8, 'RPN_NEGATIVE_OVERLAP': 0.3, 'RPN_NMS_THRESH': 0.7, 'RPN_POSITIVE_OVERLAP': 0.7, 'RPN_POSITIVE_WEIGHT': -1.0, 'RPN_POST_NMS_TOP_N': 2000, 'RPN_PRE_NMS_TOP_N': 12000, 'SCALES': [600], 'SNAPSHOT_ITERS': 5000, 'SNAPSHOT_KEPT': 3, 'SNAPSHOT_PREFIX': 'res101_faster_rcnn', 'STEPSIZE': [30000], 'SUMMARY_INTERVAL': 180, 'TRIM_HEIGHT': 600, 'TRIM_WIDTH': 600, 'TRUNCATED': False, 'USE_ALL_GT': True, 'USE_FLIPPED': True, 'USE_GT': False, 'WEIGHT_DECAY': 0.0001},
 'USE_GPU_NMS': True}
Loaded dataset `voc_2007_trainval` for training
Set proposal method: gt
Appending horizontally-flipped training examples...
voc_2007_trainval gt roidb loaded from /home/testaccount/competition_workspace/crowd_counting/faster_rcnn/data/cache/voc_2007_trainval_gt_roidb.pkl
done
Preparing training data...
done
before filtering, there are 18950 images...
after filtering, there are 18646 images...
18646 roidb entries
/usr/local/lib/python3.5/dist-packages/torch/cuda/__init__.py:116: UserWarning:
Found GPU0 NVS 315 which is of cuda capability 2.1.
PyTorch no longer supports this GPU because it is too old.
warnings.warn(old_gpu_warn % (d, name, major, capability[1]))
Loading pretrained weights from data/pretrained_model/resnet101_caffe.pth
/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/data_parallel.py:24: UserWarning:
There is an imbalance between your GPUs. You may want to exclude GPU 0 which
has less than 75% of the memory or cores of GPU 1. You can do so by setting
the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES
environment variable.
warnings.warn(imbalance_warn.format(device_ids[min_pos], device_ids[max_pos]))
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorMath.cu line=244 error=8 : invalid device function
Traceback (most recent call last):
File "trainval_net.py", line 320, in
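The log above warns that one of the visible devices is an NVS 315 with compute capability 2.1, which PyTorch reports as too old, and the imbalance warning itself suggests excluding that GPU through CUDA_VISIBLE_DEVICES. A minimal sketch of building such an environment follows. `visible_devices_env`, the GPU inventory passed to it, and the `(3, 0)` default threshold are all assumptions for illustration; the real indices and capabilities have to come from the machine itself. It also sets CUDA_DEVICE_ORDER=PCI_BUS_ID so that the indices follow PCI bus order rather than CUDA's default ordering.

```python
import os


def visible_devices_env(gpus, min_capability=(3, 0), base_env=None):
    """Build an environment that hides GPUs below `min_capability`.

    `gpus` is a hypothetical inventory: (index, name, (major, minor)) tuples,
    with indices in PCI bus order.
    """
    env = dict(os.environ if base_env is None else base_env)
    keep = [
        str(index)
        for index, _name, capability in gpus
        if tuple(capability) >= tuple(min_capability)
    ]
    # Number devices by PCI bus ID instead of CUDA's default ordering.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    env["CUDA_VISIBLE_DEVICES"] = ",".join(keep)
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` when launching the training script, so the old card is never handed to DataParallel.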
@dmortem OK, then try using 2 workers for batch size=2.
Sir, if I use batch size 1 with 4 GPUs, how many workers should I give the model?
Hi @jwyang, thanks a lot for your great code.
I get errors when I run the script with --mGPUs, while it works fine with only one GPU. My command line is 'CUDA_VISIBLE_DEVICES=2,3 python trainval_net.py --dataset pascal_voc --net res101 --bs 2 --nw 4 --lr 0.001 --lr_decay_step 5000 --cuda --mGPUs'. Is there a problem with the compilation? I use python3 and haven't modified make.sh. My GPU is TITAN Xp. The error log is below:
Also, when I run the code with one GPU, GPU utilization is only 1%-2%. It takes about 27s per 100 iterations. Is that normal?
Thanks a lot!!
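At 27s per 100 iterations, each iteration takes about 0.27s, and with utilization at 1%-2% it is worth checking whether that time is spent waiting for data rather than computing. The sketch below is a generic timing loop, not code from the repository: `profile_loop`, `batches`, and `step` are hypothetical names, where `batches` is any iterable (such as a data loader) and `step` is a function that runs one training iteration.

```python
import time


def profile_loop(batches, step, max_iters=100):
    """Split wall-clock time into data-loading time and compute time."""
    data_s = 0.0
    compute_s = 0.0
    iters = 0
    iterator = iter(batches)
    while iters < max_iters:
        t0 = time.perf_counter()
        try:
            batch = next(iterator)   # time spent waiting for the next batch
        except StopIteration:
            break
        t1 = time.perf_counter()
        step(batch)                  # time spent in the training step
        t2 = time.perf_counter()
        data_s += t1 - t0
        compute_s += t2 - t1
        iters += 1
    return {"iters": iters, "data_s": data_s, "compute_s": compute_s}
```

Because CUDA calls return before the GPU has finished, `step` should end with `torch.cuda.synchronize()` for the compute figure to be meaningful. If `data_s` dominates, the loader (worker count, disk, preprocessing) is the likely bottleneck rather than the GPU.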