princeton-vl / CornerNet


THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THC/THCTensorRandom.cu line=25 error=30 : unknown error #42

NJUSTghw closed this issue 5 years ago

NJUSTghw commented 6 years ago

loading all datasets...
using 4 threads
loading from cache file: ./cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=11.78s)
creating index...
index created!
loading from cache file: ./cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=8.33s)
creating index...
index created!
loading from cache file: ./cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=7.80s)
creating index...
index created!
loading from cache file: ./cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=11.06s)
creating index...
index created!
loading from cache file: ./cache/coco_minival2014.pkl
loading annotations into memory...
Done (t=0.38s)
creating index...
index created!
system config...
{'batch_size': 2, 'cache_dir': './cache', 'chunk_sizes': [4, 5, 5, 5, 5, 5, 5, 5, 5, 5], 'config_dir': './config', 'data_dir': './data', 'data_rng': <mtrand.RandomState object at 0x7f6494e26a68>, 'dataset': 'MSCOCO', 'decay_rate': 10, 'display': 5, 'learning_rate': 0.00025, 'max_iter': 500000, 'nnet_rng': <mtrand.RandomState object at 0x7f6494e26ab0>, 'opt_algo': 'adam', 'prefetch_size': 5, 'pretrain': None, 'result_dir': './results', 'sampling_function': 'kp_detection', 'snapshot': 5000, 'snapshot_name': 'CornerNet', 'stepsize': 450000, 'test_split': 'testdev', 'train_split': 'trainval', 'val_iter': 100, 'val_split': 'minival', 'weight_decay': False, 'weight_decay_rate': 1e-05, 'weight_decay_type': 'l2'}
db config...
{'ae_threshold': 0.5, 'border': 128, 'categories': 80, 'data_aug': True, 'gaussian_bump': True, 'gaussian_iou': 0.7, 'gaussian_radius': -1, 'input_size': [511, 511], 'lighting': True, 'max_per_image': 100, 'merge_bbox': False, 'nms_algorithm': 'exp_soft_nms', 'nms_kernel': 3, 'nms_threshold': 0.5, 'output_sizes': [[128, 128]], 'rand_color': True, 'rand_crop': True, 'rand_pushes': False, 'rand_samples': False, 'rand_scale_max': 1.4, 'rand_scale_min': 0.6, 'rand_scale_step': 0.1, 'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3]), 'special_crop': False, 'test_scales': [1], 'top_k': 100, 'weight_exp': 8}
len of db: 118287
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
(480, 640, 3) (480, 640, 3) (427, 640, 3) (434, 640, 3)
building model...
module_file: models.CornerNet
start prefetching data...
shuffling indices...
(480, 640, 3) (375, 500, 3) (480, 640, 3) (427, 640, 3) (333, 500, 3) (640, 523, 3) (424, 640, 3) (480, 640, 3) (375, 500, 3) (500, 334, 3) (480, 640, 3) (500, 375, 3) (416, 640, 3) (500, 375, 3) (425, 640, 3) (480, 640, 3) (375, 500, 3) (640, 427, 3) (640, 609, 3) (383, 640, 3) (428, 640, 3) (426, 640, 3) (427, 640, 3) (640, 427, 3) (640, 480, 3) (555, 640, 3) (480, 640, 3) (416, 640, 3) (375, 500, 3) (375, 500, 3)
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THC/THCTensorRandom.cu line=25 error=30 : unknown error

NJUSTghw commented 6 years ago

Has anyone run into the same issue? Please help me out.

NJUSTghw commented 5 years ago

loading all datasets...
using 4 threads
loading from cache file: ./cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=8.17s)
creating index...
index created!
loading from cache file: ./cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=7.94s)
creating index...
index created!
loading from cache file: ./cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=7.52s)
creating index...
index created!
loading from cache file: ./cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=10.59s)
creating index...
index created!
loading from cache file: ./cache/coco_minival2014.pkl
loading annotations into memory...
Done (t=0.24s)
creating index...
index created!
system config...
{'batch_size': 1, 'cache_dir': './cache', 'chunk_sizes': [1], 'config_dir': './config', 'data_dir': './data', 'data_rng': <mtrand.RandomState object at 0x7f562d9ec990>, 'dataset': 'MSCOCO', 'decay_rate': 10, 'display': 5, 'learning_rate': 0.00025, 'max_iter': 500000, 'nnet_rng': <mtrand.RandomState object at 0x7f562d9ec9d8>, 'opt_algo': 'adam', 'prefetch_size': 5, 'pretrain': None, 'result_dir': './results', 'sampling_function': 'kp_detection', 'snapshot': 5000, 'snapshot_name': 'CornerNet', 'stepsize': 450000, 'test_split': 'testdev', 'train_split': 'trainval', 'val_iter': 100, 'val_split': 'minival', 'weight_decay': False, 'weight_decay_rate': 1e-05, 'weight_decay_type': 'l2'}
db config...
{'ae_threshold': 0.5, 'border': 128, 'categories': 80, 'data_aug': True, 'gaussian_bump': True, 'gaussian_iou': 0.7, 'gaussian_radius': -1, 'input_size': [511, 511], 'lighting': True, 'max_per_image': 100, 'merge_bbox': False, 'nms_algorithm': 'exp_soft_nms', 'nms_kernel': 3, 'nms_threshold': 0.5, 'output_sizes': [[128, 128]], 'rand_color': True, 'rand_crop': True, 'rand_pushes': False, 'rand_samples': False, 'rand_scale_max': 1.4, 'rand_scale_min': 0.6, 'rand_scale_step': 0.1, 'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3]), 'special_crop': False, 'test_scales': [1], 'top_k': 100, 'weight_exp': 8}
len of db: 118287
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
building model...
module_file: models.CornerNet
shuffling indices...
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THC/THCTensorRandom.cu line=25 error=30 : unknown error
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/s106/anaconda3/envs/CornerNet/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/s106/anaconda3/envs/CornerNet/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "train.py", line 52, in pin_memory
    data["xs"] = [x.pin_memory() for x in data["xs"]]
  File "train.py", line 52, in <listcomp>
    data["xs"] = [x.pin_memory() for x in data["xs"]]
RuntimeError: cuda runtime error (30) : unknown error at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THC/THCTensorRandom.cu:25

I am not sure what the exact error is.
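One way to narrow this down might be to run the failing call outside of train.py. The traceback above shows the error surfacing in `x.pin_memory()`, so the sketch below tries the same kind of call on a tiny tensor and records how far it gets. The function name `cuda_smoke_test` and the report keys are made up for illustration; it only assumes that PyTorch may or may not be importable in the current environment, and it is not a confirmed diagnosis of this issue.

```python
def cuda_smoke_test():
    """Try a minimal version of the calls from the traceback and report the result.

    Hypothetical helper for illustration; not part of CornerNet.
    """
    report = {
        "torch_imported": False,
        "cuda_available": None,
        "device_count": None,
        "error": None,
    }
    try:
        import torch
    except Exception as exc:  # torch missing or broken install
        report["error"] = repr(exc)
        return report

    report["torch_imported"] = True
    try:
        report["cuda_available"] = torch.cuda.is_available()
        if report["cuda_available"]:
            report["device_count"] = torch.cuda.device_count()
            # pin_memory() is the call that raised in the traceback above.
            torch.zeros(1).pin_memory()
            # Moving a tensor to the GPU forces CUDA initialisation.
            torch.zeros(1).cuda()
    except RuntimeError as exc:
        # A failure here would suggest the problem is in the CUDA setup,
        # not in the CornerNet training code itself.
        report["error"] = repr(exc)
    return report


if __name__ == "__main__":
    print(cuda_smoke_test())
```

If this small check already fails with the same "unknown error", the cause is probably in the driver, CUDA, or PyTorch installation and not in the training script.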

Zhangyongtao123 commented 5 years ago

I hit this error too, but I don't know how to solve it.

Zhangyongtao123 commented 5 years ago

My guess is that you don't have enough GPUs, or that your GPUs don't have enough memory.
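To check the "not enough GPUs" part of this guess against the logs above, here is a rough sketch. It assumes (this is my reading, not something confirmed in this thread) that `chunk_sizes` holds one entry per GPU and that the entries should add up to `batch_size`. The function `check_chunk_config` and the value `num_gpus=1` are hypothetical; in a real run the GPU count could come from `torch.cuda.device_count()`.

```python
def check_chunk_config(batch_size, chunk_sizes, num_gpus):
    """Return a list of human-readable problems with a batch/chunk setup.

    Assumption (not confirmed in the thread): one chunk per GPU, and the
    chunks together make up exactly one batch.
    """
    problems = []
    total = sum(chunk_sizes)
    if total != batch_size:
        problems.append(
            f"sum(chunk_sizes)={total} does not equal batch_size={batch_size}"
        )
    if len(chunk_sizes) > num_gpus:
        problems.append(
            f"{len(chunk_sizes)} chunks configured but only {num_gpus} GPU(s) assumed"
        )
    return problems


# Values copied from the two "system config..." dumps in this thread.
# num_gpus=1 is a placeholder, since the thread does not say how many GPUs exist.
first_run = check_chunk_config(2, [4, 5, 5, 5, 5, 5, 5, 5, 5, 5], num_gpus=1)
second_run = check_chunk_config(1, [1], num_gpus=1)

for line in first_run:
    print("first run:", line)
print("second run problems:", second_run)
```

Under these assumptions the first log's settings look inconsistent, while the second log's settings (`batch_size` 1, `chunk_sizes` [1]) pass the check and still fail with the same error, so the chunk configuration alone may not explain it.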