a5372935 opened this issue 4 years ago
anchor_proposal is used to generate initial proposals for the network, before the rroi layer refines the (rotated) bounding boxes
You would get two rotated bboxes on the same target if their IoU is below the IoU threshold, so neither suppresses the other. Try decreasing the ROI IoU threshold.
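To see why the threshold matters: duplicate detections are removed by non-maximum suppression, and a lower-scoring box survives only if its IoU with every kept box stays below the threshold. A minimal sketch of the greedy logic on axis-aligned boxes (the rotated version uses polygon IoU, but the idea is the same; this is an illustration, not the repo's implementation):

```python
def nms(boxes, scores, iou_thresh=0.3):
    """Greedy NMS on axis-aligned boxes given as (x1, y1, x2, y2).

    A lower iou_thresh suppresses overlapping boxes more aggressively,
    merging duplicate detections on the same target into one.
    Returns the indices of the kept boxes.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)          # highest-scoring remaining box is kept
        keep.append(i)
        survivors = []
        for j in order:
            # intersection-over-union of box i with box j
            x1 = max(boxes[i][0], boxes[j][0])
            y1 = max(boxes[i][1], boxes[j][1])
            x2 = min(boxes[i][2], boxes[j][2])
            y2 = min(boxes[i][3], boxes[j][3])
            inter = max(0, x2 - x1) * max(0, y2 - y1)
            area_i = (boxes[i][2] - boxes[i][0]) * (boxes[i][3] - boxes[i][1])
            area_j = (boxes[j][2] - boxes[j][0]) * (boxes[j][3] - boxes[j][1])
            iou = inter / (area_i + area_j - inter)
            if iou <= iou_thresh:  # low overlap: box j survives this round
                survivors.append(j)
        order = survivors
    return keep
```

With a lower `iou_thresh`, two boxes on the same target are more likely to overlap above it, so only the higher-scoring one is kept.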
Thanks. Then what does match_anchors mean?
Is it that match_anchors runs first, and then the regression is trained on those matches to produce anchor_proposal?
It's been a long time since I last looked at the code, but based on the naming, it probably means the anchors with IoU > the RPN IoU threshold. These anchors are fed to the RPN regression layer.
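As a sketch of that matching rule (the thresholds shown are the common Faster R-CNN defaults and the names are illustrative, not the repo's exact code):

```python
def match_anchors(iou_matrix, pos_thresh=0.7, neg_thresh=0.3):
    """Label each anchor from its IoU against all ground-truth boxes.

    iou_matrix[a][g] is the IoU of anchor a with ground truth g.
    Returns 1 (positive: fed to the RPN regression layer),
    0 (background), or -1 (ignored in the loss) per anchor.
    """
    labels = []
    for row in iou_matrix:
        best = max(row)              # best overlap with any ground truth
        if best >= pos_thresh:
            labels.append(1)         # matched anchor -> regression target
        elif best < neg_thresh:
            labels.append(0)         # clearly background
        else:
            labels.append(-1)        # ambiguous, skipped during training
    return labels
```

So match_anchors decides which anchors become regression targets, and anchor_proposal is what the regression then produces from them.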
I need help with my case.
My output config: https://drive.google.com/open?id=1AhByUq5SHwmo8xIadWziPG5_UPRR5vU2
My initial config: https://drive.google.com/open?id=1AhByUq5SHwmo8xIadWziPG5_UPRR5vU2
My log: https://drive.google.com/open?id=1HQfS0Fhqf-ABMcQfg9OOGeyOLTcQcett
My predicted image: https://drive.google.com/open?id=1lupmX2EsgxJ5GA33knmsRusINB8vJ3Do
My training loss is already very low, so why are the results still so bad? Is my parameter tuning off, or have I not trained enough?
From the image, the target objects are really small. My guess is that there is a significant class imbalance: there are a lot more invalid region proposals (rotated RPN proposals) than valid ones. A possible fix is to remove very large anchor sizes (e.g. 256) or very small ones (e.g. 20) that don't fit the objects in the dataset, and to start with a simpler model (R-50-FPN). It's generally good to reduce the total number of anchors to around 9-15.
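For example, in a maskrcnn-benchmark style YAML config (key names may differ in this fork), dropping the extreme scales could look like:

```yaml
MODEL:
  RPN:
    # keep only the scales that roughly match object sizes in the dataset
    ANCHOR_SIZES: (32, 64, 128)
    ASPECT_RATIOS: (0.5, 1.0, 2.0)  # 3 sizes x 3 ratios = 9 anchors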
Let me try.
@mrlooi Sometimes I get:
```
File "/home/lab602/桌面/rotated_maskrcnn-master/maskrcnn_benchmark/modeling/roi_heads/maskiou_head/roi_maskiou_feature_extractors.py", line 66, in forward
    mask_pool = self.max_pool2d(mask)
  File "/home/lab602/anaconda3/envs/rotated/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lab602/anaconda3/envs/rotated/lib/python3.6/site-packages/torch/nn/modules/pooling.py", line 146, in forward
    self.return_indices)
  File "/home/lab602/anaconda3/envs/rotated/lib/python3.6/site-packages/torch/_jit_internal.py", line 133, in fn
    return if_false(*args, **kwargs)
  File "/home/lab602/anaconda3/envs/rotated/lib/python3.6/site-packages/torch/nn/functional.py", line 494, in _max_pool2d
    input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: invalid argument 2: non-empty 3D or 4D input tensor expected but got: [0 x 1 x 28 x 28] at /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/generic/SpatialDilatedMaxPooling.cu:37
```
Why does this happen?
The error looks to originate from pooling.py. My guess is that the set of initial proposals was small or empty: none of the proposals met the passing criterion (likely IoU with the ground truth), so an empty tensor reached the pooling layer.
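One defensive workaround, assuming the crash comes from calling max_pool2d on a batch with zero proposals (the `[0 x 1 x 28 x 28]` shape in the traceback), is to short-circuit the empty case before pooling. This is a sketch, not a patch from the repo:

```python
import torch
import torch.nn.functional as F


def safe_mask_pool(mask, kernel_size=2):
    """Guard against the '[0 x 1 x 28 x 28]' crash: older torch versions
    reject max_pool2d on zero-proposal batches, so return a correctly
    shaped empty tensor instead of pooling."""
    if mask.shape[0] == 0:
        # no proposals survived matching/sampling for this batch
        out_h = mask.shape[2] // kernel_size
        out_w = mask.shape[3] // kernel_size
        return mask.new_zeros((0, mask.shape[1], out_h, out_w))
    return F.max_pool2d(mask, kernel_size)
```

The real fix is upstream (better anchors, so proposals are not empty), but a guard like this keeps training from dying on the occasional empty batch.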
@mrlooi Thank you, I understand. I also want to ask a few questions about RRPN Faster.
```
restore from pretrained_weighs in IMAGE_NET
2020-03-19 10:49:16.049380: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-03-19 10:49:16.199289: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-19 10:49:16.199738: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5564e3590510 executing computations on platform CUDA. Devices:
2020-03-19 10:49:16.199753: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce RTX 2080, Compute Capability 7.5
2020-03-19 10:49:16.221193: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3696000000 Hz
2020-03-19 10:49:16.223662: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5564e35fb270 executing computations on platform Host. Devices:
2020-03-19 10:49:16.223733: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0):
2020-03-19 10:49:16.224412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.8
pciBusID: 0000:01:00.0
totalMemory: 7.76GiB freeMemory: 6.34GiB
2020-03-19 10:49:16.224473: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-03-19 10:49:16.230070: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-19 10:49:16.230131: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-03-19 10:49:16.230159: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-03-19 10:49:16.230719: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6162 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
WARNING:tensorflow:From /home/lab602/anaconda3/envs/faster/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
restore model
WARNING:tensorflow:From train.py:170: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the tf.data module.
2020-03-19 10:49:22.027217: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
```
When I train RRPN Faster it gets stuck here. Is there anything I haven't changed?
Hmm, not sure why, but you've posted TensorFlow logs; this repo runs on PyTorch.
@mrlooi Thank you. How can I solve @a5372935's problem when the number of initial proposals is small or empty? I got the same error:

```
RuntimeError: invalid argument 2: non-empty 3D or 4D input tensor expected but got: [0 x 1 x 28 x 28] at /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/generic/SpatialDilatedMaxPooling.cu:37
```
I would recommend starting with good RPN anchors. Use the vis_rpn_anchors.py script to visualize the anchors on your dataset.
@mrlooi I forgot to ask: does the value in brackets after each loss refer to val_loss?
If I remember correctly, it's the loss for that minibatch.
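For reference, training logs in the maskrcnn-benchmark family typically come from a smoothed meter that prints one statistic alongside a running one in brackets; exactly which statistic goes in the brackets varies by version. A simplified, illustrative sketch:

```python
from collections import deque


class SmoothedValue:
    """Tracks a metric over a sliding window, printing the recent
    median alongside the running global average (a simplified sketch
    of the style of meter used in this family of codebases)."""

    def __init__(self, window_size=20):
        self.window = deque(maxlen=window_size)  # recent values only
        self.total = 0.0                         # sum over all updates
        self.count = 0

    def update(self, value):
        self.window.append(value)
        self.total += value
        self.count += 1

    @property
    def median(self):
        vals = sorted(self.window)
        return vals[len(vals) // 2]

    @property
    def global_avg(self):
        return self.total / self.count

    def __str__(self):
        # e.g. "0.1234 (0.4567)" -- recent value, running stat in brackets
        return f"{self.median:.4f} ({self.global_avg:.4f})"
```

Either way, neither number is a validation loss; both are computed from training minibatches.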
Actually, I had another look at your log: https://drive.google.com/open?id=1HQfS0Fhqf-ABMcQfg9OOGeyOLTcQcett The loss values in brackets are way too high; the training was unstable and will not work.
Yes, the loss for that minibatch is really high, but the vis_rpn_anchors outputs all look correct to me. Why is this?
Possibly due to version differences; I used torch 1.0 - 1.1. Or it could be a faulty dataset: the default pipeline does not handle missing, faulty, or empty ground truth very well.
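If the dataset is the culprit, one pragmatic guard is to filter degenerate annotations before training. This is an illustrative helper, not part of the repo; it assumes a COCO-like layout where each record carries a "boxes" list of [x1, y1, x2, y2]:

```python
def filter_faulty_annotations(annotations, min_size=1.0):
    """Drop images whose ground truth is missing, empty, or degenerate,
    since the default pipeline handles these poorly.

    Keeps only boxes with width and height of at least min_size pixels,
    and removes records left with no valid boxes at all.
    """
    clean = []
    for ann in annotations:
        boxes = ann.get("boxes", [])
        valid = [b for b in boxes
                 if (b[2] - b[0]) >= min_size and (b[3] - b[1]) >= min_size]
        if valid:
            # copy the record with only the surviving boxes
            clean.append({**ann, "boxes": valid})
    return clean
```

Running a pass like this once over the dataset (before building the data loader) avoids feeding empty ground truth into the matching and loss code.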
❓ Questions and Help
Which of match_anchors and anchor_proposal should I care about?
Also, why does an image get two or more bboxes on the same target when predicting with inference_demo.py, and how can I make it output only one bbox?