The first change I made was to increase the number of conv layers.
This improved the performance a lot while keeping the parameter count low and the network reasonably fast, so I will be using this architecture from now on:
macc3d_0.25_r2_x4
r2 c0.25
conv k3 o64
conv k3 o64
pool
conv k3 o128
conv k3 d2 o128
conv k3 o128
pool
conv k3 o256
conv k3 d1 o256
conv k3 d3 o256
conv k3 d5 o256
conv k3 o256
macc x4
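For reference, a rough estimate of this network's size (a quick sketch only: it assumes a 3-channel input and ignores the `macc x4` output head and the biases):

```python
# Rough weight count for the conv layers of macc3d_0.25_r2_x4.
# Assumption: 3-channel input; the "macc x4" head and biases are ignored.
layers = [  # (in_channels, out_channels), all 3x3 kernels; dilation does not change the count
    (3, 64), (64, 64),
    (64, 128), (128, 128), (128, 128),
    (128, 256), (256, 256), (256, 256), (256, 256), (256, 256),
]
params = sum(3 * 3 * cin * cout for cin, cout in layers)
print(f"{params:,} weights ~= {params * 4 / 1e6:.1f} MB in float32")
# -> 3,061,440 weights ~= 12.2 MB
```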
This is the same network architecture, but with an increased number of channels (and therefore parameters) in each layer.
The performance seems superior, but training takes about 3x longer and detection runs about 3x slower. The number of parameters is also approximately 4x larger (about 50MB). I will not be using this network for now, but it is important to remember that adding more parameters improves the performance!
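The roughly 4x growth in parameters is what you would expect from widening the layers: a 3x3 conv has `3 * 3 * C_in * C_out` weights, so doubling both the input and output channel counts quadruples the layer. A tiny illustration (the widths here are hypothetical, just to show the scaling):

```python
# Doubling a conv layer's input and output channels quadruples its weight count.
# The widths below are hypothetical examples, not the exact widths of the larger net.
def conv3x3_params(cin, cout):
    return 3 * 3 * cin * cout

print(conv3x3_params(256, 256))  # 589,824
print(conv3x3_params(512, 512))  # 2,359,296 -> 4x as many
```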
Because the images I inspected gave me the feeling that the predicted coordinates were not very precise, I changed the diff weighting during training so that the coordinate diffs get the same weight as the positive probability diffs.
We see that we suddenly achieve performance similar to the network with more parameters, and visually the coordinates are much better.
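Just to illustrate what equal weighting of the two diff terms could look like (this is a sketch with assumed shapes and channel layout, not the actual loss code from the repo):

```python
import numpy as np

def weighted_l2_loss(pred, target, pos_mask, w_prob=1.0, w_coord=1.0):
    """Sketch of an equally weighted diff loss (assumed layout, not the repo's code).

    pred/target: (C, H, W) response maps, channel 0 = object probability,
    channels 1..C-1 = bounding box coordinates; pos_mask selects positive cells
    so coordinate diffs only count where an object is present.
    """
    prob_diff = pred[0] - target[0]
    coord_diff = (pred[1:] - target[1:]) * pos_mask
    return w_prob * np.sum(prob_diff ** 2) + w_coord * np.sum(coord_diff ** 2)
```

With `w_prob == w_coord` the coordinate errors contribute as strongly to the gradient as the probability errors, which is the change described above.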
I removed trucks from the car dataset and added flipped images to the training set. Very occluded and very truncated cars were also removed from the training set. This is the result.
The learning curve oscillates much less; however, there is a chance that the net is overfitting to the training set. Nevertheless, the detected bounding box coordinates are much more precise and consistent.
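A sketch of the kind of filtering and flip augmentation described above, assuming KITTI-style labels with `occluded` and `truncated` fields (the actual thresholds are not stated in this comment, so the values here are placeholders):

```python
# Placeholder thresholds - the exact cut-offs used are not given in the comment.
MAX_OCCLUSION = 1     # hypothetical: keep fully or only partly occluded cars
MAX_TRUNCATION = 0.5  # hypothetical truncation cut-off

def keep_label(label):
    # Drop trucks (and everything else that is not a car), and drop cars that
    # are too occluded or too truncated.
    return (label["type"] == "Car"
            and label["occluded"] <= MAX_OCCLUSION
            and label["truncated"] <= MAX_TRUNCATION)

def flip_example(image, boxes):
    # Horizontal flip augmentation: mirror the image (numpy array, H x W x C)
    # and mirror the x coordinates of the (x1, y1, x2, y2) boxes.
    h, w = image.shape[:2]
    flipped_boxes = [(w - x2, y1, w - x1, y2) for (x1, y1, x2, y2) in boxes]
    return image[:, ::-1], flipped_boxes
```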
hi, how long does it take to run 100000 iterations? @libornovax
@jeannotes It depends on your GPU. I think for me it was about 3 days on a Tesla K40.
On 20170318 I ran a test with the same network that was used for 2D bounding boxes, but the performance was poor, so I decided to make some changes and enlarge the network.
This is the second part of the training curve: