I use PyTorch 1.2 to train the code, and there are some errors in yolo_layer.py

xuezhongcailian commented 4 years ago

Hi, can this repo be trained with PyTorch 1.2?

xuezhongcailian commented 4 years ago

The error I get is: "one of the variables needed for gradient computation has been modified by an inplace operation"

motokimura commented 4 years ago

Hi, I have never tried torch 1.2 in this repo. Can you try with torch 1.0.0 as written in requirements.txt?

Or maybe you can avoid that error by replacing in-place operations in yolo_layer.py.
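To illustrate what replacing an in-place operation means, here is a minimal sketch that is not taken from yolo_layer.py. The function names and the toy tensors are hypothetical, and which lines of yolo_layer.py actually trigger the error is not established in this thread. The sketch only shows the general pattern: an in-place update (`+=`, or any method ending in `_`) on a tensor that autograd has saved for the backward pass raises this error, while the out-of-place form does not.

```python
import torch


def inplace_version():
    """Reproduce the error on a toy graph (hypothetical example)."""
    x = torch.ones(3, requires_grad=True)
    y = x.exp()   # exp() saves its output y for the backward pass
    y += 1        # in-place add: overwrites the saved tensor
    try:
        y.sum().backward()
    except RuntimeError as err:
        return str(err)
    return None


def out_of_place_version():
    """Same computation, but the update creates a new tensor."""
    x = torch.ones(3, requires_grad=True)
    y = x.exp()
    y = y + 1     # out-of-place add: the saved tensor is left untouched
    y.sum().backward()
    return x.grad


if __name__ == "__main__":
    print(inplace_version())
    print(out_of_place_version())
```

The same idea applies to indexed assignments and to methods such as `add_` or `mul_`: building a new tensor instead of overwriting one usually avoids the error, at the cost of some extra memory.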

milliema commented 4 years ago

May I ask which line contains the in-place operation that needs to be modified? When I train the model on my own data, this error occurs: "IndexError: index 76 is out of bounds for dimension 3 with size 76"

motokimura commented 4 years ago

@milliema your error seems to be different from the one caused by in-place operations. Could you show me the whole error message? Otherwise I cannot say anything for sure.

milliema commented 4 years ago

Thanks for your quick reply! I've modified the code slightly to use it on my own dataset. The modifications are:
1) changed N_CLASSES in the cfg file;
2) changed the train/val data directories, following the COCO format.

Then, when I run train.py, the 1st error occurs as below:

Traceback (most recent call last):
  File "train_am.py", line 237, in <module>
    main()
  File "train_am.py", line 185, in main
    loss = model(imgs, targets)
  File "/home/ubuntu/miniconda3/envs/autom/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/ubuntu/HDD/project/PyTorch_Gaussian_YOLOv3/models/yolov3.py", line 154, in forward
    x, loss_dict = module(x, targets)
  File "/home/ubuntu/miniconda3/envs/autom/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/ubuntu/HDD/project/PyTorch_Gaussian_YOLOv3/models/yolo_layer.py", line 188, in forward
    obj_mask[b] = 1-pred_best_iou
  File "/home/ubuntu/miniconda3/envs/autom/lib/python3.6/site-packages/torch/tensor.py", line 325, in __rsub__
    return _C._VariableFunctions.rsub(self, other)
RuntimeError: Subtraction, the - operator, with a bool tensor is not supported. If you are trying to invert a mask, use the ~ or bitwise_not() operator instead.
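The suggestion in this error message can be sketched as follows. This is a hedged illustration rather than the repo's code: `invert_mask` is a hypothetical helper, and the toy `iou` values and the 0.7 threshold are made up for the example. It relies on the fact that comparison operators return bool tensors from PyTorch 1.2 onward, whereas older versions return uint8 tensors, for which `1 - mask` is the correct inversion.

```python
import torch


def invert_mask(mask):
    """Return the logical inverse of a 0/1 mask (hypothetical helper)."""
    if mask.dtype == torch.bool:
        # bool masks (PyTorch >= 1.2): ~ is a logical NOT
        return ~mask
    # uint8 or float 0/1 masks: ~ would be a bitwise NOT on uint8
    # (1 becomes 254), so subtract from 1 instead
    return 1 - mask


iou = torch.tensor([0.2, 0.9, 0.5])   # toy IoU values
pred_best_iou = iou > 0.7             # bool tensor on PyTorch >= 1.2
obj_mask = torch.ones(3)              # float mask, like a loss weight
obj_mask[:] = invert_mask(pred_best_iou).to(obj_mask.dtype)
print(obj_mask)
```

Checking the dtype keeps the same line working on both old and new PyTorch versions, which matters here because requirements.txt pins torch 1.0.0.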

Then I changed the code "obj_mask[b] = 1-pred_best_iou" in yolo_layer.py to "obj_mask[b] = ~pred_best_iou", and the 2nd error occurs as below:

Setting Arguments.. : Namespace(cfg='config/automotive_default.cfg', checkpoint=None, checkpoint_dir='checkpoints', checkpoint_interval=1000, debug=False, eval_interval=4000, n_cpu=0, tfboard_dir=None, use_cuda=True, weights_path='/media/ubuntu/HDD/project/PyTorch_Gaussian_YOLOv3/gaussian_yolov3_coco.pth')
train_am.py:57: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  cfg = yaml.load(f)
successfully loaded config file: {'MODEL': {'TYPE': 'YOLOv3', 'BACKBONE': 'darknet53', 'ANCHORS': [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], 'ANCH_MASK': [[6, 7, 8], [3, 4, 5], [0, 1, 2]], 'N_CLASSES': 2, 'GAUSSIAN': True}, 'TRAIN': {'LR': 0.001, 'MOMENTUM': 0.9, 'DECAY': 0.0005, 'BURN_IN': 1000, 'MAXITER': 500000, 'STEPS': '(400000, 450000)', 'BATCHSIZE': 4, 'SUBDIVISION': 16, 'IMGSIZE': 608, 'LOSSTYPE': 'l2', 'IGNORETHRE': 0.7, 'GRADIENT_CLIP': 2000.0}, 'AUGMENTATION': {'RANDRESIZE': True, 'JITTER': 0.3, 'RANDOM_PLACING': True, 'HUE': 0.1, 'SATURATION': 1.5, 'EXPOSURE': 1.5, 'LRFLIP': True, 'RANDOM_DISTORT': True}, 'TEST': {'CONFTHRE': 0.8, 'NMSTHRE': 0.45, 'IMGSIZE': 416}, 'NUM_GPUS': 1}
effective_batch_size = batch_size * iter_size = 4 * 16
Gaussian YOLOv3
Gaussian YOLOv3
Gaussian YOLOv3
loading darknet weights.... /media/ubuntu/HDD/project/PyTorch_Gaussian_YOLOv3/gaussian_yolov3_coco.pth
using cuda
loading annotations into memory...
Done (t=0.05s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
evaluating...
obj_mask torch.Size([4, 3, 76, 76]) 0 tensor(0) 0 0
obj_mask torch.Size([4, 3, 76, 76]) 1 tensor(0) 0 76
Traceback (most recent call last):
  File "train_am.py", line 237, in <module>
    main()
  File "train_am.py", line 185, in main
    loss = model(imgs, targets)
  File "/home/ubuntu/miniconda3/envs/autom/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/ubuntu/HDD/project/PyTorch_Gaussian_YOLOv3/models/yolov3.py", line 154, in forward
    x, loss_dict = module(x, targets)
  File "/home/ubuntu/miniconda3/envs/autom/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/ubuntu/HDD/project/PyTorch_Gaussian_YOLOv3/models/yolo_layer.py", line 200, in forward
    obj_mask[b, a, j, i] = 1
IndexError: index 76 is out of bounds for dimension 3 with size 76

Do you have any ideas about the error? Thanks for your help.

motokimura commented 4 years ago

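"IndexError: index 76 is out of bounds for dimension 3 with size 76"

Dimension 3 (the last axis of obj_mask) is the x-index on the feature map. The map is 76 cells wide, so valid indices run from 0 to 75; an index of 76 suggests that a box center lies on or beyond the right edge of the image. In your dataset, some of the boxes might be located (partially) outside of the image. Is it possible to clip those boxes to the image in your dataset?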
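A minimal sketch of such clipping is below. It assumes COCO-style boxes given as [x, y, width, height] in pixels with the origin at the top-left corner. The helper names `clip_coco_box` and `center_cell` are hypothetical and are not part of this repo, and the repo's own preprocessing (resizing, padding, augmentation) is not reproduced here.

```python
def clip_coco_box(box, img_w, img_h):
    """Clip a COCO [x, y, w, h] box to the image; return None if nothing is left."""
    x, y, w, h = box
    x1 = min(max(x, 0.0), img_w)
    y1 = min(max(y, 0.0), img_h)
    x2 = min(max(x + w, 0.0), img_w)
    y2 = min(max(y + h, 0.0), img_h)
    if x2 <= x1 or y2 <= y1:
        return None  # box lies entirely outside the image
    return [x1, y1, x2 - x1, y2 - y1]


def center_cell(box, img_w, grid_w):
    """Grid column that the box center falls into (illustrative only)."""
    x, _, w, _ = box
    return int((x + w / 2) * grid_w // img_w)


# A box that sticks out of a 608-pixel-wide image on the right side.
raw = [598, 10, 20, 20]
print(center_cell(raw, 608, 76))      # 76: out of range for a 76-wide map

clipped = clip_coco_box(raw, 608, 608)
print(clipped)                        # [598, 10, 10, 20]
print(center_cell(clipped, 608, 76))  # 75: last valid column
```

Because a clipped box with positive width has its center strictly inside the image, the resulting column index stays below the grid width. Boxes that fall entirely outside the image return None and would need to be dropped from the annotations.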
