princeton-vl / CornerNet-Lite

BSD 3-Clause "New" or "Revised" License

There was a problem during training #124

Closed: float4189 closed this issue 4 years ago

float4189 commented 5 years ago

I trained on my own dataset but got the following error:

shuffling indices... shuffling indices... shuffling indices... shuffling indices... shuffling indices... shuffling indices... shuffling indices... shuffling indices...
  0%|          | 0/500000 [00:09<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 252, in <module>
    main(None, ngpus_per_node, args)
  File "train.py", line 233, in main
    train(training_dbs, validation_db, system_config, model, args)
  File "train.py", line 165, in train
    training_loss = nnet.train(**training)
  File "/home/hyb/MyFile/MyCode/CornerNet-Lite1/core/nnet/py_factory.py", line 93, in train
    loss = self.network(xs, ys)
  File "/home/hyb/anaconda3/envs/hyb_CornerNet_Lite/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hyb/MyFile/MyCode/CornerNet-Lite1/core/models/py_utils/data_parallel.py", line 70, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/hyb/MyFile/MyCode/CornerNet-Lite1/core/models/py_utils/data_parallel.py", line 80, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/hyb/anaconda3/envs/hyb_CornerNet_Lite/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/home/hyb/anaconda3/envs/hyb_CornerNet_Lite/lib/python3.7/site-packages/torch/_utils.py", line 369, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/hyb/anaconda3/envs/hyb_CornerNet_Lite/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/home/hyb/anaconda3/envs/hyb_CornerNet_Lite/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hyb/MyFile/MyCode/CornerNet-Lite1/core/nnet/py_factory.py", line 20, in forward
    loss = self.loss(preds, ys, **kwargs)
  File "/home/hyb/anaconda3/envs/hyb_CornerNet_Lite/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hyb/MyFile/MyCode/CornerNet-Lite1/core/models/py_utils/losses.py", line 134, in forward
    focal_loss += self.focal_loss(tl_heats, gt_tl_heat, gt_tl_valid)
  File "/home/hyb/MyFile/MyCode/CornerNet-Lite1/core/models/py_utils/losses.py", line 57, in _focal_loss_mask
    pos_pred = pred[pos_inds]
IndexError: The shape of the mask [8, 8, 64, 64] at index 1 does not match the shape of the indexed tensor [8, 80, 64, 64] at index 1
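Reading the last line of the traceback: `pos_inds` is a boolean mask (apparently derived from the ground-truth corner heatmap) with shape [8, 8, 64, 64], while `pred` is the predicted heatmap with shape [8, 80, 64, 64]. Assuming CornerNet's (batch, classes, height, width) heatmap layout, the disagreement at index 1 is a class-count mismatch: 8 classes in the data, 80 heatmap channels in the model. The pure-Python sketch below uses a hypothetical helper, `first_mask_mismatch` (not part of CornerNet-Lite or PyTorch), to mimic the shape rule PyTorch applies to boolean-mask indexing and locate the offending dimension.

```python
from typing import Optional, Sequence


def first_mask_mismatch(mask_shape: Sequence[int],
                        tensor_shape: Sequence[int]) -> Optional[int]:
    """Return the first dimension where a boolean mask's shape disagrees with
    the tensor it indexes, or None if the mask is compatible.

    Rule being mimicked: each mask dimension must equal the corresponding
    leading dimension of the indexed tensor.
    """
    if len(mask_shape) > len(tensor_shape):
        # A mask with more dimensions than the tensor can never fit;
        # report the first dimension the tensor does not have.
        return len(tensor_shape)
    for dim, (m, t) in enumerate(zip(mask_shape, tensor_shape)):
        if m != t:
            return dim
    return None


# Shapes taken from the traceback above.
gt_mask_shape = (8, 8, 64, 64)   # (batch, dataset classes, height, width)
pred_shape = (8, 80, 64, 64)     # (batch, model heatmap channels, height, width)
print(first_mask_mismatch(gt_mask_shape, pred_shape))  # → 1
```

Index 1 is the class dimension, which is why the fix is about the number of classes rather than the image size or batch size.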

I look forward to your answer, thank you very much!

liuzhongling commented 4 years ago

@float4189 Have you solved the problem? I'm running into the same error.

float4189 commented 4 years ago

Yes, I solved this problem and am happy to help.

float4189 commented 4 years ago

Change the number of classes from COCO's 80 to the number of classes in your own dataset. Some of your configuration files probably still use the COCO value.
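To catch this mismatch before training starts, one option is to compare the dataset class count in the config against the model's heatmap channel count. A minimal sketch, assuming a CornerNet-Lite-style JSON config that stores the count under `config["db"]["categories"]` (verify the key against your own config file). `check_class_count` is a hypothetical helper, and the model's channel count has to be read by hand from your model definition.

```python
import json


def check_class_count(config_text: str, model_heatmap_channels: int) -> int:
    """Compare the dataset class count in a JSON config with the model's
    number of heatmap channels; raise ValueError if they differ.

    Assumption: the class count lives at config["db"]["categories"].
    """
    config = json.loads(config_text)
    categories = config["db"]["categories"]
    if categories != model_heatmap_channels:
        raise ValueError(
            f"config has {categories} categories but the model predicts "
            f"{model_heatmap_channels} heatmap channels; update whichever "
            f"one still uses the COCO value of 80"
        )
    return categories


# Hypothetical config for an 8-class dataset whose model was left at 80 channels.
example = '{"db": {"categories": 8}}'
try:
    check_class_count(example, model_heatmap_channels=80)
except ValueError as err:
    print(err)
```

Failing fast here gives a readable message instead of an IndexError raised deep inside the loss function on a DataParallel replica.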