xingyizhou / ExtremeNet

Bottom-up Object Detection by Grouping Extreme and Center Points

terminate called without an active exception Aborted (core dumped) #30

Open · MichaelCong opened this issue 4 years ago

MichaelCong commented 4 years ago

```
python train.py ExtremeNet
loading all datasets...
using 4 threads
loading from cache file: ./cache/coco_extreme_train2017.pkl
loading annotations into memory...
Done (t=12.73s)
creating index...
index created!
loading from cache file: ./cache/coco_extreme_train2017.pkl
loading annotations into memory...
Done (t=12.93s)
creating index...
index created!
loading from cache file: ./cache/coco_extreme_train2017.pkl
loading annotations into memory...
Done (t=10.87s)
creating index...
index created!
loading from cache file: ./cache/coco_extreme_train2017.pkl
loading annotations into memory...
Done (t=15.55s)
creating index...
index created!
system config...
{'batch_size': 24,
 'cache_dir': './cache',
 'chunk_sizes': [4, 5, 5, 5, 5],
 'config_dir': './config',
 'data_dir': './data',
 'data_rng': <mtrand.RandomState object at 0x7f87c7ffa480>,
 'dataset': 'MSCOCOExtreme',
 'decay_rate': 10,
 'display': 5,
 'learning_rate': 0.00025,
 'max_iter': 250000,
 'nnet_rng': <mtrand.RandomState object at 0x7f87c7ffa4c8>,
 'opt_algo': 'adam',
 'prefetch_size': 10,
 'pretrain': './cache/CornerNet_500000.pkl',
 'result_dir': './results',
 'sampling_function': 'kp_detection',
 'snapshot': 50000,
 'snapshot_name': 'ExtremeNet',
 'stepsize': 200000,
 'test_split': 'testdev',
 'train_split': 'train',
 'val_iter': 100,
 'val_split': 'val',
 'weight_decay': False,
 'weight_decay_rate': 1e-05,
 'weight_decay_type': 'l2'}
db config...
{'ae_threshold': 0.5,
 'aggr_weight': 0.1,
 'border': 128,
 'categories': 80,
 'center_thresh': 0.1,
 'data_aug': True,
 'gaussian_bump': True,
 'gaussian_iou': 0.7,
 'gaussian_radius': -1,
 'input_size': [511, 511],
 'lighting': True,
 'max_per_image': 100,
 'merge_bbox': False,
 'nms_algorithm': 'exp_soft_nms',
 'nms_kernel': 3,
 'nms_threshold': 0.5,
 'output_sizes': [[128, 128]],
 'rand_color': True,
 'rand_crop': True,
 'rand_pushes': False,
 'rand_samples': False,
 'rand_scale_max': 1.4,
 'rand_scale_min': 0.6,
 'rand_scale_step': 0.1,
 'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3]),
 'scores_thresh': 0.1,
 'special_crop': False,
 'suppres_ghost': True,
 'test_scales': [1],
 'top_k': 40,
 'weight_exp': 8}
len of db: 118287
start prefetching data...
shuffling indices...
start prefetching data...
start prefetching data...
shuffling indices...
shuffling indices...
building model...
module_file: models.ExtremeNet
start prefetching data...
shuffling indices...
total parameters: 198531504
loading from pretrained model
loading from ./cache/CornerNet_500000.pkl
setting learning rate to: 0.00025
training start...
```
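For reference, in this CornerNet-style training setup the printed `chunk_sizes` describes how each batch is split across GPUs: one entry per GPU, with the entries summing to `batch_size` (here 4 + 5 + 5 + 5 + 5 = 24), so the config above assumes five visible GPUs. A minimal illustrative check of that relationship (not part of the repository):

```python
# Consistency check for the values printed above (illustrative only,
# not part of the ExtremeNet code base).
batch_size  = 24                 # 'batch_size' from the system config
chunk_sizes = [4, 5, 5, 5, 5]    # per-GPU slice of each training batch

assert sum(chunk_sizes) == batch_size, "chunk_sizes must sum to batch_size"
print(f"batch of {batch_size} split across {len(chunk_sizes)} GPUs: {chunk_sizes}")
```

The run then dies at the very first training iteration with the traceback below.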
```
  0%|          | 0/250000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 225, in <module>
    train(training_dbs, None, args.start_iter, args.debug)
  File "train.py", line 159, in train
    training_loss = nnet.train(**training)
  File "/home/rencong/ExtremeNet/nnet/py_factory.py", line 81, in train
    loss = self.network(xs, ys)
  File "/home/rencong/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/rencong/ExtremeNet/models/py_utils/data_parallel.py", line 66, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
  File "/home/rencong/ExtremeNet/models/py_utils/data_parallel.py", line 77, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
  File "/home/rencong/ExtremeNet/models/py_utils/scatter_gather.py", line 30, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
  File "/home/rencong/ExtremeNet/models/py_utils/scatter_gather.py", line 25, in scatter
    return scatter_map(inputs)
  File "/home/rencong/ExtremeNet/models/py_utils/scatter_gather.py", line 18, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/home/rencong/ExtremeNet/models/py_utils/scatter_gather.py", line 20, in scatter_map
    return list(map(list, zip(*map(scatter_map, obj))))
  File "/home/rencong/ExtremeNet/models/py_utils/scatter_gather.py", line 15, in scatter_map
    return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
  File "/home/rencong/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 89, in forward
    outputs = comm.scatter(input, target_gpus, chunk_sizes, ctx.dim, streams)
  File "/home/rencong/anaconda3/lib/python3.6/site-packages/torch/cuda/comm.py", line 148, in scatter
    return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: CUDA error: invalid device ordinal (exchangeDevice at /opt/conda/conda-bld/pytorch_1550802451070/work/aten/src/ATen/cuda/detail/CUDAGuardImpl.h:28)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6d (0x7f8821feb69d in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x4f223c (0x7f881f16d23c in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #2: <unknown function> + 0x5fc38e (0x7f87fbb9638e in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #3: <unknown function> + 0x739e55 (0x7f87fbcd3e55 in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #4: at::TypeDefault::copy(at::Tensor const&, bool, c10::optional) const + 0x74 (0x7f87fbe4f204 in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #5: at::native::to(at::Tensor const&, at::TensorOptions const&, bool, bool) + 0xc6d (0x7f87fbc327fd in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #6: at::TypeDefault::to(at::Tensor const&, at::TensorOptions const&, bool, bool) const + 0x2c (0x7f87fbe0bcbc in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #7: torch::autograd::VariableType::to(at::Tensor const&, at::TensorOptions const&, bool, bool) const + 0x19c (0x7f87fe532e1c in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #8: torch::cuda::scatter(at::Tensor const&, c10::ArrayRef, c10::optional<std::vector<long, std::allocator > > const&, long, c10::optional<std::vector<c10::optional, std::allocator<c10::optional > > > const&) + 0x7a8 (0x7f881f183da8 in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #9: <unknown function> + 0x5124de (0x7f881f18d4de in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #10: <unknown function> + 0xfd760 (0x7f881ed78760 in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

frame #21: THPFunction_apply(_object*, _object*) + 0x6ad (0x7f881ef7482d in /home/rencong/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

terminate called without an active exception
Aborted (core dumped)
```
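The crash comes from the custom scatter in `models/py_utils/data_parallel.py`: it sends one chunk of the batch to each of GPU indices 0 through `len(chunk_sizes) - 1`, so `CUDA error: invalid device ordinal` indicates that PyTorch can see fewer GPUs than the five the config assumes. A quick diagnostic sketch under that assumption (not part of the repository):

```python
# Sketch of a quick check for the mismatch suggested by the traceback above
# (illustrative only, not part of the ExtremeNet code base).
import torch

chunk_sizes = [4, 5, 5, 5, 5]           # from the printed system config
n_expected  = len(chunk_sizes)          # scatter targets GPUs 0 .. n_expected - 1
n_visible   = torch.cuda.device_count() # GPUs PyTorch can actually see

print(f"config expects {n_expected} GPUs, PyTorch sees {n_visible}")
if n_visible < n_expected:
    print("scatter will address a non-existent device -> "
          "'CUDA error: invalid device ordinal'")
```

If the counts disagree, the usual workaround in CornerNet-style repos is to make them match: either expose enough GPUs (for example via `CUDA_VISIBLE_DEVICES`), or edit `batch_size` and `chunk_sizes` in the experiment config (`config/ExtremeNet.json` in this layout) so there is one chunk per available GPU, e.g. `"batch_size": 4` with `"chunk_sizes": [4]` for a single GPU.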
bageheyalu commented 4 years ago

Did you solve this issue?

ZHR1997 commented 4 years ago

Did you solve this issue?