mrlooi / rotated_maskrcnn

Rotated Mask R-CNN: From Bounding Boxes to Rotated Bounding Boxes
MIT License
347 stars · 62 forks

About vis_rpn_anchors #21

Open a5372935 opened 4 years ago

a5372935 commented 4 years ago

❓ Questions and Help

Which one should I care about, match_anchor or anchor_proposal?

Also, why does an image get two or more bboxes on the same target when I run prediction with inference_demo.py, and how can I make it output only one bbox?

mrlooi commented 4 years ago

anchor_proposal is used to generate initial proposals for the network, before the rroi layer refines the (rotated) bounding boxes

You would get two rotated bboxes on the same target if the IoU between them is less than the IoU threshold, so neither one suppresses the other. Try decreasing the ROI IoU threshold.
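
For illustration, here is a minimal sketch of how an IoU-threshold NMS step decides whether duplicate boxes survive. It uses axis-aligned boxes and placeholder names; the actual code works on rotated boxes with its own IoU ops, so treat this only as an illustration of the threshold logic:

```python
import torch

def iou(a, b):
    # a, b: [4] tensors in (x1, y1, x2, y2) format
    x1, y1 = torch.max(a[0], b[0]), torch.max(a[1], b[1])
    x2, y2 = torch.min(a[2], b[2]), torch.min(a[3], b[3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh):
    # Keep the highest-scoring box, then drop every remaining box whose IoU
    # with it is >= iou_thresh. Two boxes on the same object both survive
    # only when their mutual IoU stays below iou_thresh, which is why
    # lowering the threshold suppresses more duplicates.
    order = scores.argsort(descending=True).tolist()
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
    return keep
```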

a5372935 commented 4 years ago

Thanks. Then what does match_anchors mean?

a5372935 commented 4 years ago

So is the flow: first match anchors (match_anchor), then train the regression on them to produce anchor_proposal?

mrlooi commented 4 years ago

It's been a long time since I last looked at the code, but based on the naming, it probably means anchors with IoU > the RPN IoU threshold. These anchors are fed to the RPN regression layer.
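
Based on that description, a generic sketch of the matching step might look like the following; threshold names and values here are assumptions, not the repo's actual code:

```python
import torch

def match_anchors(iou_matrix, fg_iou_thresh=0.7, bg_iou_thresh=0.3):
    """Generic sketch of RPN anchor matching.

    iou_matrix: [num_anchors, num_gt] IoU between every anchor and every
    ground-truth box. Anchors whose best IoU exceeds fg_iou_thresh become
    the "matched" (positive) anchors fed to the RPN regression loss; anchors
    below bg_iou_thresh are background; everything in between is ignored.
    """
    best_iou, best_gt = iou_matrix.max(dim=1)
    labels = torch.full_like(best_iou, -1.0)   # -1 = ignore
    labels[best_iou < bg_iou_thresh] = 0.0     # background
    labels[best_iou >= fg_iou_thresh] = 1.0    # foreground / matched anchor
    return labels, best_gt
```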

a5372935 commented 4 years ago

I need help with my case. This is my output:

- My config: https://drive.google.com/open?id=1AhByUq5SHwmo8xIadWziPG5_UPRR5vU2
- My initial config: https://drive.google.com/open?id=1AhByUq5SHwmo8xIadWziPG5_UPRR5vU2
- My log: https://drive.google.com/open?id=1HQfS0Fhqf-ABMcQfg9OOGeyOLTcQcett
- My predicted image: https://drive.google.com/open?id=1lupmX2EsgxJ5GA33knmsRusINB8vJ3Do

My training loss is already very low, so why are the results still this bad? Is my parameter tuning bad, or is it just not enough training?

mrlooi commented 4 years ago

From the image, the target objects are really small. My guess is that there is a significant class imbalance, where there are a lot more invalid region proposals (rotated RPN proposals) than valid ones. A possible fix is to remove very large anchor sizes (e.g. 256) or really small ones (e.g. 20) that don't fit the objects in the dataset, and to start with a simpler model (R-50-FPN). It's generally good to reduce the total number of anchors to around 9-15.
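
As a quick sanity check on the anchor budget, you can count anchors per location as sizes × aspect ratios (times the number of angles, if the rotated anchors enumerate them explicitly). The values below are placeholders for a small-object dataset, not the repo's defaults:

```python
# Placeholder anchor settings (not the repo's defaults): very large (256) and
# very small (20) sizes that don't fit the objects have been dropped.
anchor_sizes = (32, 64, 128)
aspect_ratios = (0.5, 1.0, 2.0)

anchors_per_location = len(anchor_sizes) * len(aspect_ratios)
print(anchors_per_location)  # 9, within the suggested 9-15 range
# If rotated anchors also enumerate angles, multiply by len(angles) as well.
```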

a5372935 commented 4 years ago

let me try

a5372935 commented 4 years ago

@mrlooi Sometimes I get this error:

File "/home/lab602/桌面/rotated_maskrcnn-master/maskrcnn_benchmark/modeling/roi_heads/maskiou_head/roi_maskiou_feature_extractors.py", line 66, in forward
    mask_pool = self.max_pool2d(mask)
File "/home/lab602/anaconda3/envs/rotated/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
File "/home/lab602/anaconda3/envs/rotated/lib/python3.6/site-packages/torch/nn/modules/pooling.py", line 146, in forward
    self.return_indices)
File "/home/lab602/anaconda3/envs/rotated/lib/python3.6/site-packages/torch/_jit_internal.py", line 133, in fn
    return if_false(*args, **kwargs)
File "/home/lab602/anaconda3/envs/rotated/lib/python3.6/site-packages/torch/nn/functional.py", line 494, in _max_pool2d
    input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: invalid argument 2: non-empty 3D or 4D input tensor expected but got: [0 x 1 x 28 x 28] at /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/generic/SpatialDilatedMaxPooling.cu:37

why?

mrlooi commented 4 years ago

The error looks to originate from pooling.py. My guess is that the set of initial proposals was small or empty, and none of the proposals met the passing criterion (possibly IoU with the ground truth), so an empty tensor reached the pooling layer.
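
If you just want training to survive such iterations, one hypothetical band-aid is to guard the pooling call against an empty batch; the real fix is still getting proposals that pass (e.g. better anchors). Function and variable names below are placeholders:

```python
def safe_max_pool(mask, pool):
    """Hypothetical guard: skip pooling when no proposals survived, so the
    pool op never sees an empty [0, C, H, W] tensor like [0 x 1 x 28 x 28]."""
    if mask.shape[0] == 0:
        # No proposals passed the matching/IoU criteria for this batch;
        # return an empty tensor with the (assumed 2x downsampled) pooled size.
        return mask.new_zeros((0, mask.shape[1], mask.shape[2] // 2, mask.shape[3] // 2))
    return pool(mask)

# e.g. in the maskiou feature extractor:
#   mask_pool = safe_max_pool(mask, self.max_pool2d)
```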

a5372935 commented 4 years ago

@mrlooi Thank you, I understand. I also want to ask a few questions about RRPN Faster.


restore from pretrained_weighs in IMAGE_NET
2020-03-19 10:49:16.049380: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-03-19 10:49:16.199289: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-19 10:49:16.199738: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5564e3590510 executing computations on platform CUDA. Devices:
2020-03-19 10:49:16.199753: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce RTX 2080, Compute Capability 7.5
2020-03-19 10:49:16.221193: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3696000000 Hz
2020-03-19 10:49:16.223662: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5564e35fb270 executing computations on platform Host. Devices:
2020-03-19 10:49:16.223733: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
2020-03-19 10:49:16.224412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.8 pciBusID: 0000:01:00.0 totalMemory: 7.76GiB freeMemory: 6.34GiB
2020-03-19 10:49:16.224473: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-03-19 10:49:16.230070: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-19 10:49:16.230131: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-03-19 10:49:16.230159: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-03-19 10:49:16.230719: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6162 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
WARNING:tensorflow:From /home/lab602/anaconda3/envs/faster/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating: Use standard file APIs to check for files with this prefix.
restore model
WARNING:tensorflow:From train.py:170: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating: To construct input pipelines, use the tf.data module.
2020-03-19 10:49:22.027217: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally

When I train RRPN Faster, it gets stuck at this point. Is there anything I haven't changed?

mrlooi commented 4 years ago

Hmm, not sure why, but you've posted TensorFlow logs.

NimaDL commented 4 years ago

@mrlooi Thank you. How can I solve @a5372935's problem when the number of initial proposals is small/empty? I got the same error:

RuntimeError: invalid argument 2: non-empty 3D or 4D input tensor expected but got: [0 x 1 x 28 x 28] at /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/generic/SpatialDilatedMaxPooling.cu:37

mrlooi commented 4 years ago

I would recommend starting with good RPN anchors. Use the vis_rpn_anchors.py file to visualize the anchors for your dataset.

a5372935 commented 4 years ago

@mrlooi I forgot to ask: do the values in brackets after each loss refer to val_loss?

mrlooi commented 4 years ago

If I remember correctly, it's the loss for that minibatch.

Actually, I had another look at your log: https://drive.google.com/open?id=1HQfS0Fhqf-ABMcQfg9OOGeyOLTcQcett. The loss values in brackets are certainly way too high; the training was unstable and will not work.

a5372935 commented 4 years ago

Yes, the loss for that minibatch is really high, but I think the anchors from vis_rpn_anchors all look correct. Why is this?

mrlooi commented 4 years ago

Possibly due to version differences; I used torch 1.0 - 1.1. Or it could be a faulty dataset issue: the default pipeline does not handle missing, faulty, or empty ground truth very well.
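
If the dataset is the suspect, one hedged option is to pre-filter a COCO-style annotation file so that images with missing or empty ground truth never reach the training pipeline. File names and field checks below are placeholders for your own data:

```python
import json

with open("annotations/train.json") as f:
    coco = json.load(f)

# Keep only annotations with a non-empty segmentation and a positive-size bbox.
valid_anns = [a for a in coco["annotations"]
              if a.get("segmentation") and a.get("bbox")
              and a["bbox"][2] > 0 and a["bbox"][3] > 0]
valid_image_ids = {a["image_id"] for a in valid_anns}

# Drop images that are left without any valid ground truth.
coco["annotations"] = valid_anns
coco["images"] = [img for img in coco["images"] if img["id"] in valid_image_ids]

with open("annotations/train_filtered.json", "w") as f:
    json.dump(coco, f)
```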